Symptom: dr-repo M003 had all 8 owning requirements (UNI-01..05,
PIL-01..03) marked Status: complete in .sf/REQUIREMENTS.md, but
the milestone row was still active because its only slice was a
post-migration skipped placeholder. After the previous fix routed
all-skipped milestones to pre-planning, SF ran roadmap-meeting +
plan-milestone and wrote 3 new slices on a milestone whose
contract-level work was already done — burned ~4 LLM turns on
plausible-looking but unwanted re-decomposition.
Root cause: deriveStateFromDb's milestone-completion gate consults
only slice statuses (and indirectly the milestone row's own status
field). It never reads REQUIREMENTS.md to check whether the
contract is already satisfied. The slice-based view masked the
real signal.
Fix:
- New parseRequirementsByMilestone(content) helper in files.js:
parses REQUIREMENTS.md, groups entries by their `Primary owning
milestone` field, returns Map<id, {complete, incomplete}>.
- handleAllSlicesDone now reads REQUIREMENTS.md before its
slice-based real-work check. If a milestone has at least one
owning requirement and zero of them are incomplete, route to
completing-milestone with nextAction naming the requirement count
(so the operator can see *why* the milestone is being closed
without manually opening REQUIREMENTS.md).
- Best-effort: REQUIREMENTS.md parse failure falls through to the
existing slice-based rule. Missing file likewise — no regression
for projects that don't keep a requirements file.
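A minimal sketch of the helper (the exact REQUIREMENTS.md layout —
"### <REQ-ID>" headings carrying Status: and Primary owning milestone:
fields — is an assumption for illustration; only the grouping contract
comes from this change):

    // files.js (sketch): bucket requirement entries by owning milestone.
    export function parseRequirementsByMilestone(content) {
      const byMilestone = new Map();
      // Assumed layout: one requirement per "### <REQ-ID>" heading.
      for (const entry of content.split(/^###\s+/m).slice(1)) {
        const milestone =
          /Primary owning milestone:\s*(\S+)/.exec(entry)?.[1];
        if (!milestone) continue;
        const bucket =
          byMilestone.get(milestone) ?? { complete: 0, incomplete: 0 };
        const status = /Status:\s*(\S+)/.exec(entry)?.[1] ?? "";
        if (status.toLowerCase() === "complete") bucket.complete += 1;
        else bucket.incomplete += 1;
        byMilestone.set(milestone, bucket);
      }
      return byMilestone;
    }

handleAllSlicesDone then closes the milestone when complete > 0 and
incomplete === 0 for its id.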
Resolves sf-mp74hftw-zud6ba filed via the headless feedback CLI.
End-to-end verified by re-running sf headless query on dr-repo
M003: now reports phase=completing-milestone with the right
requirement-count message.
Tests: 5 new cases — all complete + slice skipped → completing,
some active → pre-planning, zero owning requirements falls through,
missing file falls through, all complete + real slice work still
completes. Existing 4 all-skipped-replan cases still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T01: Added integration test auto-halt-self-feedback.test.mjs that proves:
- HaltWatchdog.check() creates a self-feedback DB entry with
kind=runaway-loop:idle-halt, severity=high, blocking=true
- Markdown projection (.sf/SELF-FEEDBACK.md) is regenerated
- Deduplication works (one entry per idle period)
- New heartbeat resets and creates a new entry for the next idle period
T02: Enhanced evidence string to include elapsedMs, iteration, and
thresholdMs explicitly (R003 actionable context requirement).
Tests: 36/36 pass across auto-halt-self-feedback,
auto-halt-watchdog-notify, and self-feedback-db suites.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
handleAllSlicesDone leaned on isStatusDone, which treats "complete",
"done", AND "skipped" alike — all three counted as "milestone work is
finished",
so a milestone whose only slice was skipped would advance to
phase=validating-milestone. That's wrong: a placeholder slice that
was skipped doesn't validate the milestone's success criteria, it
just clears the wedge.
Surfaced concretely in dr-repo M003 (Unified Dashboard + Pilot
Validation): I skipped the migration placeholder via the new
`sf headless skip-slice` CLI, and the next-dispatch reported
`validate-milestone M003` even though no real work had happened on
the milestone. The autonomous loop would then burn an LLM turn
running validate-milestone just to discover the obvious gap.
Fix: differentiate {complete, done} from {skipped} at the gate.
When zero slices carry real-work outcomes, route into the
pre-planning phase so the dispatcher's existing
discuss → research → plan ladder takes over. The PDD/vision is
already in the milestone row, so the planner has the purpose it
needs without operator hand-holding.
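The gate in miniature (slice shape and phase strings as described
above; everything else illustrative):

    // Sketch: only non-skipped terminal statuses count as real work.
    const REAL_WORK = new Set(["complete", "done"]);
    function routeAllSlicesDone(slices) {
      const hasRealWork = slices.some((s) => REAL_WORK.has(s.status));
      // All-skipped milestones re-enter planning; any real outcome
      // proceeds to validation as before.
      return hasRealWork ? "validating-milestone" : "pre-planning";
    }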
Verified end-to-end against dr-repo: `sf headless query` for M003
now reports phase=pre-planning and next dispatch
`roadmap-meeting M003` (the deep-planning entry rule fires first;
discuss/research/plan come after as artifacts land).
Tests: 4 cases — all-skipped → pre-planning, complete+skipped mix
→ validating, legacy "done" alias → validating, multiple skipped
→ pre-planning.
Resolves sf-mp73sk0m-63w88y (filed via headless feedback CLI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Memory injection telemetry:
- Move counter writes from auto-prompts.js to memory-store.js (where
getRelevantMemoriesRanked/getActiveMemoriesRanked actually fire).
- Track memory_inject_count and memory_inject_chars_total via
runtime_counters table for headless-query reporting.
State-db validation:
- handleAllSlicesDone now checks if any slice carries real work
(status=complete/done) before routing to validation.
- Milestones with all-skipped slices route to "reassess-roadmap"
instead of asking the operator to validate non-existent work.
SM client defense:
- Filter foreign-tenant memories from SM query responses even when
the server returns them (defense-in-depth).
Tests updated: memory-extraction-lifecycle, sf-db-migration,
headless-query-memory-injection, sm-client, memory-tenant-gate.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes sf-mp723nju-2cpeoc. When SM_ENABLED is on, memory retrieval from
Singularity Memory is now scoped to the current project's repoIdentity
tenant. Foreign-tenant memories are filtered client-side and the tenant
filter is sent server-side for SM servers that support it.
Key changes:
- schema v68: ADD COLUMN tenant TEXT on memories table (NULL = legacy)
- insertMemoryRow: persists tenant field on every new record
- backfillMemoryTenants / backfillMemoryTenantRows: idempotent migration
called on session_start when SM_ENABLED is set
- querySmMemories: resolves effectiveTenantId (opts.tenant > opts.tenantId
> SM_TENANT_ID); returns [] when no tenant resolved and crossTenant off
- SM_CROSS_TENANT_ENABLED=1 opt-in bypass with audit warning in console
- register-hooks session_start: calls backfillMemoryTenants when SM active
- 12 new tests in memory-tenant-gate.test.mjs; updated sm-client.test.ts
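The tenant gate at the top of querySmMemories, sketched (request
plumbing elided; env var names are the ones above):

    // Sketch: resolve the effective tenant before any SM query fires.
    async function querySmMemories(opts = {}) {
      const effectiveTenantId =
        opts.tenant ?? opts.tenantId ?? process.env.SM_TENANT_ID ?? null;
      const crossTenant = process.env.SM_CROSS_TENANT_ENABLED === "1";
      if (!effectiveTenantId && !crossTenant) return []; // fail closed
      if (!effectiveTenantId) {
        // opt-in bypass leaves an audit trail in the console
        console.warn("SM: cross-tenant query (SM_CROSS_TENANT_ENABLED=1)");
      }
      /* ... issue the SM request with the tenant filter attached ... */
    }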
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Project Memories section is rendered into every execute-task,
plan-slice, and research-slice prompt. At 10 memories × ~200 chars
each that's ~2K chars/turn injected into the context — real cost,
no operator-visible meter.
Adds two counters to runtime_counters (an already-existing key/value store):
memory_inject_chars_total — cumulative section size
memory_inject_count — number of injections
Written by buildProjectMemoriesSection() on every render. Both
writes sit inside a try/catch so a legacy DB without
runtime_counters silently skips rather than blocking prompt build.
`sf headless query` surfaces the cumulative + derived metrics as a
new top-level `memoryInjection` block:
{
  total_chars: 12480,
  count: 8,
  avg_chars: 1560,
  estimated_total_tokens: 3120
}
The block is omitted entirely when count is 0 (fresh project / no
prompts rendered yet) so it doesn't clutter the snapshot.
Operators can now correlate prompt size growth against autonomous
run cost without instrumenting the LLM call sites directly. The
estimated_total_tokens is chars/4 — a rough approximation since SF
doesn't tokenise the section, intentionally documented as such.
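The derived fields are plain arithmetic over the two counters (example
values from the block above):

    // Sketch: deriving the memoryInjection block from the raw counters.
    const total_chars = 12480, count = 8;
    const avg_chars = Math.round(total_chars / count);          // 1560
    const estimated_total_tokens = Math.round(total_chars / 4); // 3120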
Resolves sf-mp723yl9-rcxoeh filed via the headless feedback CLI.
Tests: 5 source-level invariants — type carries the section, query
reads counters by name, snapshot omits section on zero, write side
calls both counter functions, write is wrapped in try/catch with
documented failure-mode comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Even though querySmMemories pins tenantId in the request body sent
to the Singularity Memory server, SF used to accept whatever came
back without verifying. A misconfigured or compromised SM server
could echo memories from other tenants and SF would inject them
into the next execute-task prompt — cross-customer leak.
filterSmMemoriesToTenant() now re-checks every returned memory:
- same-tenant memories pass through
- foreign-tenant memories (whichever of memory.tenantId / memory.tenant
  is present differs from expectedTenantId) are dropped, with a
  one-line warning so the misconfigured-SM symptom is visible rather
  than silent
- memories with no tenant claim at all default to allow — matches
the local DB's "NULL tenant = legacy row" rule from schema v68
- SM_REQUIRE_TENANT_CLAIM=true flips the legacy rule to drop
(hard fail-closed mode for operators who want it)
Defensive guards against non-array inputs, missing expectedTenantId
(returns input unchanged so caller-side fail-open semantics are
preserved), and the dual tenantId/tenant field naming.
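The whole decision table, sketched (field and env var names from
above; returning [] for non-array input is a sketch choice):

    // Sketch: re-check every memory the SM server returns.
    export function filterSmMemoriesToTenant(memories, expectedTenantId) {
      if (!Array.isArray(memories)) return [];   // defensive guard
      if (!expectedTenantId) return memories;    // caller-side fail-open preserved
      const strict = process.env.SM_REQUIRE_TENANT_CLAIM === "true";
      return memories.filter((m) => {
        const claim = m?.tenantId ?? m?.tenant ?? null; // dual field naming
        if (claim === null) return !strict; // legacy "no claim": allow unless strict
        if (claim === expectedTenantId) return true;
        console.warn(`SM: dropped foreign-tenant memory (tenant=${claim})`);
        return false;
      });
    }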
Tests: 8 cases — same-tenant pass-through, foreign drop, legacy
allow, strict mode drop, tenantId/tenant alias, empty/non-array
defensiveness, missing-expected pass-through, warning emission.
Resolves the cross-project tenant-leak feedback row filed via the
new headless feedback CLI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously buildProjectMemoriesSection(`${sTitle} ${tTitle}`) fed the
cosine ranker nothing but two short titles — too sparse for re-ranking
to do meaningful work against the static pool.
buildMemoryRetrievalQuery() (new, exported for tests) enriches the
query with:
- slice.title + task.title (original signal)
- slice.goal text, front 600 chars (the WHY of the slice — usually
  names the memory-relevant context the title can't fit)
- top 20 changed files from git diff/status (the WHAT — what code is
  in play right now; lets cosine ranking promote memories whose
  content references those paths)
Fail-open at each source: DB closed → no goal; not a git repo →
no files; nullish title args don't poison the string. The call
site never has to handle errors.
Bounded so embedding token cost stays predictable: 600-char goal
cap, 20-file cap. Empty inputs collapse to "" so the consumer's
`if (!query.trim())` branch still picks the static fallback.
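Shape of the enrichment, sketched — readSliceGoal and listChangedFiles
are hypothetical stand-ins for the DB and git sources; the caps and
fail-open behavior are the ones above:

    // Sketch: three fail-open sources, each bounded, joined into one query.
    export function buildMemoryRetrievalQuery(sliceTitle, taskTitle) {
      const parts = [sliceTitle ?? "", taskTitle ?? ""]; // nullish args can't poison
      try {
        parts.push(readSliceGoal().slice(0, 600));       // 600-char goal cap
      } catch { /* DB closed: no goal */ }
      try {
        parts.push(listChangedFiles().slice(0, 20).join(" ")); // 20-file cap
      } catch { /* not a git repo: no files */ }
      return parts.filter(Boolean).join(" ").trim();     // empties collapse to ""
    }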
Tests: 5 cases — titles always present, non-git directory safe,
empty-input collapse, nullish-arg defensiveness, real git repo
surfaces changed file paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Enables CI and containerised deployments without writing secrets to disk.
Auth.json still takes precedence when present.
- readGatewayFromAuthJson now falls back to SF_LLM_GATEWAY_KEY env var
- SF_LLM_GATEWAY_URL env var also supported for endpoint override
- Added tests for env fallback, auth.json preference, and default URL
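The precedence, sketched (readAuthJsonGateway and DEFAULT_GATEWAY_URL
are illustrative names):

    // Sketch: auth.json wins when present; env vars fill the gaps.
    const fromFile = readAuthJsonGateway(); // hypothetical: null when file absent
    const key = fromFile?.key ?? process.env.SF_LLM_GATEWAY_KEY ?? null;
    const url = fromFile?.url ?? process.env.SF_LLM_GATEWAY_URL ?? DEFAULT_GATEWAY_URL;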
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Self-feedback triage routing was including paid opencode models even
when the operator policy prefers the free tier. Add
isOpenCodeProvider() + isFreeOpenCodeModelId() and filter the
candidate list before the router scores them.
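The filter step, sketched (candidates is the triage candidate list;
the real free-tier rule lives in isFreeOpenCodeModelId — the substring
test below is only a placeholder):

    // Sketch: drop paid opencode models before the router scores anything.
    const isOpenCodeProvider = (p) => p === "opencode" || p === "opencode-go";
    const isFreeOpenCodeModelId = (id) => /free/i.test(id); // placeholder predicate
    const routable = candidates.filter(
      (c) => !isOpenCodeProvider(c.provider) || isFreeOpenCodeModelId(c.modelId)
    );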
Also: cosmetic — quote style normalised by the formatter in
buildInlineFixPrompt strings and the spawn options object.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests were picking up the developer's real
~/.sf/agent/discovery-cache.json and seeing unexpected models in
output. Pin tests to a guaranteed-missing path via the new
_discoveryCacheFilePath option so the env they observe is solely
what the test constructs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surgical read/write access to ~/.sf/agent/auth.json without touching
the file directly. All mutations go through AuthStorage so file-lock
and chmod-600 invariants are always respected.
sf key set <provider> <api-key>    add/rotate stored key
sf key get <provider>              show masked key (last 4 chars)
sf key remove <provider> [--yes]   remove credential
sf key list                        list all providers + status
Rationale: SF's source of truth for credentials is auth.json at
runtime — env vars are only used during initial one-time provider
setup. Rotation needs an explicit, audit-friendly path, not implicit
env-driven re-reads. Keys are never echoed in full (last 4 chars
only); remove always prompts unless --yes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Activity JSONL logs use `type: "custom_message"` with `customType: "sf-auto"`
for assistant reasoning content. The old code only checked `role === "assistant"`,
so every transcript was empty → extraction silently skipped every unit.
Fix: recognise both legacy (`role === "assistant"`) and modern
(`custom_message` with `sf-*` prefix) entry shapes. Also reads the
standalone `text` field used by custom messages.
This is why memory_processed_units had 0 rows despite 34 activity logs.
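The widened predicate, sketched (field names from the entry shapes
above; the text fallback chain is illustrative):

    // Sketch: accept both legacy and modern activity-log entry shapes.
    function isAssistantEntry(entry) {
      if (entry.role === "assistant") return true;  // legacy shape
      return entry.type === "custom_message" &&
        typeof entry.customType === "string" &&
        entry.customType.startsWith("sf-");         // modern shape, e.g. "sf-auto"
    }
    const textOf = (entry) => entry.text ?? entry.content ?? ""; // standalone text first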
Tests: 186 files / 1994 tests pass.
Type check: clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The memory extraction system has infrastructure (DB tables, LLM prompts,
unit closeout wiring, embedding backfill) but zero processed units and
only self-feedback-resolution memories. This suggests extraction is
failing silently.
Add debugLog() calls throughout extractMemoriesFromUnit() so we can
observe:
- Skip reasons (mutex busy, rate limited, already processed, file too small)
- Start/done lifecycle per unit
- LLM call and parse outcomes
- Error messages on failure and retry
This makes the extraction pipeline observable via --debug or the
journal/debug log without changing behavior.
Tests: 185 files / 1993 tests pass.
Type check: clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds full coverage for the discovery-gating root cause that was
fixed in commit d70d8d3b1 (xiaomi x-api-key auth) and the
subsequent refreshSfManagedProviders + writeSdkDiscoveryCacheEntry
work in model-catalog-cache.js.
Diagnosis recap: kimi-coding, opencode, opencode-go were silent
in ~/.sf/agent/discovery-cache.json because the SDK's
model-discovery.js adapter registry marked them with
StaticDiscoveryAdapter (supportsDiscovery=false), so the SDK's
discoverModels() never attempted them. SF's own
scheduleModelCatalogRefresh DID fetch them but wrote only to the
per-repo runtime cache (basePath/.sf/model-catalog/) and only fired
on session_start — not during --discover. The fix is to mirror the
write to the SDK's discovery cache on both fetch-path AND cache-hit
path, and await it in cli.ts before listModels when --discover is set.
New test sections:
- parseDiscoveredModels: OpenAI {data}/{models} formats, Google
{models[].name} prefix stripping, name-as-id fallback, null on
bad input, OpenRouter pricing extraction
- refreshSfManagedProviders: xiaomi uses x-api-key (not Bearer),
opencode uses Bearer, no-key providers skipped, SDK discovery cache
written on BOTH network-fetch and cache-hit paths, kimi-coding +
opencode-go iterated when keys present
46 tests pass. No regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trailing instrumentation from the discovery investigation. The error
catch still swallows non-fatal failures during --discover, just no
longer prints to stderr.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier commit (44fcfb643) incorrectly disabled phrase on repo-root
because I thought the phrase retriever hung on full-workspace scope. After
clearing the corrupted cache (left by killing a mid-build vector process),
testing confirms:
- bm25 alone on repo root: works, 1m 50s cold, instant warm
- phrase alone on repo root: works after cache clear
- bm25+phrase on repo root: works after cache clear
- vector on scoped paths: works after cache build
The "hang" was from a corrupted/stale cache, not a sift bug.
.siftignore is properly excluding files (146K→2,660 indexed).
Revert chooseSiftRetrievers back to bm25,phrase for repo-root.
Tests: 184 files / 1974 tests pass.
Type check: clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Until now the discovery cache stored only model IDs (string[]). The
downstream isZeroCost(model?.cost) check evaluated against undefined
for any dynamically-discovered model, so OpenRouter's
zero-cost-but-not-:free
entries (owl-alpha, lyria-3-pro-preview, lyria-3-clip-preview,
openrouter/free) got silently blocked by the built-in provider policy.
Cache entry shape now: {id, cost?, contextWindow?} per model.
parseDiscoveredModels extracts pricing from OpenRouter's
/api/v1/models response (pricing.prompt/completion/input_cache_read/
input_cache_write → numeric cost.{input,output,cacheRead,cacheWrite}).
Other providers stay {id}-only — their /v1/models endpoints don't
ship pricing.
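The OpenRouter branch, sketched (pricing field names as listed above;
context_length as the window field is an assumption):

    // Sketch: map one OpenRouter /api/v1/models entry onto the cache shape.
    function toCacheEntry(model) {
      const entry = { id: model.id };
      const p = model.pricing;
      if (p) {
        entry.cost = {
          input: Number(p.prompt),
          output: Number(p.completion),
          cacheRead: Number(p.input_cache_read),
          cacheWrite: Number(p.input_cache_write),
        };
      }
      if (model.context_length) entry.contextWindow = model.context_length;
      return entry;
    }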
Migration: on first read of a legacy string[] cache, entries are
converted in-place to {id} objects and the file is rewritten. No cost
backfill (data wasn't there before), but the new readers handle them.
Cost wired into policy: isModelAllowedByBuiltInProviderPolicy calls
lookupDiscoveredModelCost("openrouter", modelId) as a fallback when
the static model registry has no cost data.
Plus: cli.ts --discover now eagerly refreshes SF-managed providers
(opencode, opencode-go, kimi-coding, xiaomi) that the SDK's adapter
doesn't cover — so they populate cache on first --discover instead
of waiting for a session-start lazy refresh.
Tests: 13 new across 5 groups (pricing extraction, round-trip, legacy
migration, policy gate happy/sad paths, Google provider compat).
Full suite: 184 files / 1971 tests, zero regressions.
Real-world result: openrouter/owl-alpha, google/lyria-3-pro-preview,
google/lyria-3-clip-preview, openrouter/free, plus any future
zero-cost models now pass the policy filter on the next discovery
refresh.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause: the sift binary's phrase retriever hangs indefinitely when
queried against the full repo-root scope (57K+ files). Earlier tests
mistook this for a general slowness, but isolated testing confirms:
- bm25 alone on repo root: works (1m 30s cold, instant warm)
- phrase alone on repo root: hangs forever
- bm25+phrase on repo root: hangs forever (phrase path blocks)
- all retrievers on scoped subdirs: work correctly
The earlier Rust panic was from a corrupted cache state left by killing
a mid-build vector process. After clearing the cache, bm25 alone works.
Fix: chooseSiftRetrievers now returns retrievers: "bm25" (not "bm25,phrase")
for repo-root scope. Scoped subdirs still get bm25+phrase+vector with
position-aware reranking.
Tests: updated 3 assertions in sift-retriever-scope.test.mjs.
Full suite: 183 files / 1958 tests pass.
Type check: clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three providers were missing from PROVIDER_CATALOG_CONFIG so their
model lists couldn't be auto-discovered. Their wire ids only existed
in packages/ai/src/models.generated.ts as hand-coded entries, meaning
new model variants from these providers required manual catalog edits.
Verified live endpoints respond to /v1/models with bearer auth:
- opencode → https://opencode.ai/zen/v1/models (6 free models)
- opencode-go → https://opencode.ai/zen/go/v1/models (15 models)
- minimax → https://api.minimax.io/v1/models (works)
Added entries:
opencode: baseUrl https://opencode.ai/zen, modelsPath /v1/models
opencode-go: baseUrl https://opencode.ai/zen/go, modelsPath /v1/models
minimax: baseUrl https://api.minimax.io, modelsPath /v1/models
(international endpoint; Chinese-network api.minimaxi.com
still handled separately in the SDK)
Auth keys already wired: OPENCODE_API_KEY, OPENCODE_GO_API_KEY (with
OPENCODE_API_KEY fallback), MINIMAX_API_KEY. No env-api-keys.ts changes.
Combined with 385e0b448 (dynamic canonicalIdFor resolver), new model
variants from these three providers will be auto-grouped in
.sf/model-performance.json without hand-editing CANONICAL_BY_ROUTE.
Live counts after fresh discovery will reveal experimental models
absent from the static catalog (e.g. opencode's "big-pickle", opencode-go's
deepseek-v4-pro, mimo-v2.5-pro, hy3-preview). The model-router
tolerates unconventional wire IDs — no naming constraints.
To populate cache: rm -rf ~/.sf/runtime/model-catalog/ + relaunch sf.
Tests: 13 new in provider-catalog-discovery.test.mjs (catalog shape,
modelsPath presence, DISCOVERABLE_PROVIDER_IDS inclusion). Full suite
183 files / 1940 tests pass, zero regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After 385e0b448 added the dynamic discovery-cache resolver to
canonicalIdFor, the 15 identity-strip aliases added in 089bf0cbe for
discovered providers became pure redundancy — the dynamic path
returns the same bare modelId from the discovery cache.
Removed (all canonical == bare modelId, all providers in discovery cache):
- minimax/MiniMax-M2.7, minimax/MiniMax-M2.7-highspeed
- mistral/codestral-latest, mistral/devstral-2512,
mistral/devstral-small-2507, mistral/mistral-large-latest,
mistral/mistral-medium-latest, mistral/mistral-small-latest
- zai/glm-4.5, zai/glm-4.5-air, zai/glm-4.6, zai/glm-4.7,
zai/glm-5, zai/glm-5-turbo, zai/glm-5.1
Kept (real aliases — canonical differs from wire id, NOT identity strips):
- kimi-coding/kimi-for-coding → kimi-k2.6 (Moonshot alias)
- mistral/devstral-medium-2507 → devstral-medium-latest (alias to latest)
- minimax/MiniMax-M2 family lowercase mappings (case-change aliases)
Also kept:
- zai/glm-4.5-flash, zai/glm-4.7-flash (not yet in discovery cache;
flash variants may launch before cache refresh — fast-path safety)
- kimi-coding/kimi-k2.6 + kimi-k2-thinking (kimi-coding cache only
has kimi-for-coding; these resolve via _ENTRY_BY_ROUTE fallback)
Tests: 15 new regression tests in canonical-id-dynamic.test.mjs verify
each removed entry STILL resolves correctly via dynamic discovery.
Total 21/21 in that file, plus 101 model-registry tests, plus 16
canonical-id-mapping tests — all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After commit 089bf0cbe added 23 hand-written aliases for production
route keys, the right structural fix is to also consult the dynamic
model-discovery cache (~/.sf/agent/discovery-cache.json). Otherwise
every new model variant from a discovered provider (ollama-cloud +39
models, openrouter +24, etc.) requires another round of hand-editing.
canonicalIdFor now resolves in this order:
1. CANONICAL_BY_ROUTE (static fast path, retains real aliases like
kimi-coding/kimi-for-coding → kimi-k2.6 where canonical differs)
2. _ENTRY_BY_ROUTE (existing static path)
3. canonicalIdFromDiscovery — reads ~/.sf/agent/discovery-cache.json,
finds (provider, modelId) pair, returns bare modelId
In-memory cache with 60s TTL (DISCOVERY_CACHE_TTL_MS) so the readFileSync
on the hot path becomes one disk read per minute at most. canonicalIdFor
is per-dispatch, not per-token, so the overhead is negligible.
Test hook __setDiscoveryCacheForTest lets vitest inject a cache without
touching the fs.
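The resolution order in miniature (both static tables and the TTL'd
cache reader are elided; the entry field name is illustrative):

    // Sketch: static tables first, discovery cache as the final fallback.
    export function canonicalIdFor(routeKey) {
      const alias = CANONICAL_BY_ROUTE[routeKey]; // 1. real aliases win over dynamic
      if (alias) return alias;
      const entry = _ENTRY_BY_ROUTE[routeKey];    // 2. existing static path
      if (entry) return entry.id;
      return canonicalIdFromDiscovery(routeKey);  // 3. cache hit → bare modelId, else null
    }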
Tests: 6 new in canonical-id-dynamic.test.mjs (dynamic hit, static-alias
wins over dynamic, cache miss → null, null cache graceful, missing-models
graceful, multiple models per provider). Combined with existing
canonical-id-mapping: 22/22 pass. Full suite 1912 pass, no regressions.
Sanity verified: canonicalIdFor("ollama-cloud/glm-5.1") → "glm-5.1"
(dynamic-only, not in static table); canonicalIdFor("unknown/never")
→ null.
Follow-up (in flight, separate agent): prune the static identity-strip
aliases from CANONICAL_BY_ROUTE for providers in the discovery cache
since they're now redundant with the dynamic resolver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Autonomous mode's model-fallback chain bypassed enabledModels — when zai
429'd, the chain happily fell through to mistral/codestral-latest even
though only minimax/*, kimi-coding/*, zai/*, ollama-cloud/* were allowed.
Of 52 dispatches in this repo's journal this session, 10 (~19%)
escaped the allowlist (mistral×2, opencode-go×3, google-gemini-cli×5).
enabledModels was honored by interactive cycling (settings-manager.ts)
and by self-feedback-drain.js for triage routing, but
auto-model-selection.js's fallback chain in selectAndApplyModel never
read it.
Now: isModelInEnabledList(provider, modelId, enabledModels) filters
each fallback candidate. Supports exact "provider/model" or
"provider/*" wildcard. Empty/undefined list = open behavior (no
regression for setups without an allowlist).
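The matcher, sketched from the rules above:

    // Sketch: allowlist check applied to each fallback candidate.
    export function isModelInEnabledList(provider, modelId, enabledModels) {
      if (!enabledModels?.length) return true;  // no allowlist = open behavior
      if (process.env.SF_BYPASS_ENABLED_MODELS === "1") return true; // escape hatch
      return enabledModels.some(
        (p) => p === `${provider}/${modelId}` || p === `${provider}/*`
      );
    }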
readEnabledModels reads ~/.sf/agent/settings.json once per chain;
swallows IO errors → undefined → no constraint (safe failure mode).
Escape hatch: SF_BYPASS_ENABLED_MODELS=1 disables the check for
emergency / misconfigured cases.
When ALL candidates are filtered out and the chain exhausts, throws
a clear error directing the operator to add to allowlist or unset.
Tests: 13 in enabled-models-fallback.test.mjs covering pattern matrix,
multi-candidate chain skipping, bypass env, and exhaustion path.
Full suite 1906 pass, no regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Of 52 dispatches in this repo's journal this session, 51 landed in
.sf/model-performance.json's _unmapped bucket — meaning the live-outcome
learner couldn't tell which provider/model succeeded or failed. Only
1 dispatch (google-gemini-cli/gemini-3-flash-preview) bucketed correctly.
Root cause was NOT just missing aliases — it was a lazy-load race:
- model-learner.js declared canonicalIdFor as a fire-and-forget dynamic
import side-effect at module bottom
- metrics.js called recordOutcome() synchronously after
`await import("./model-learner.js")` resolved — before the registry
injection promise settled
- Result: _canonicalIdForFn was null for the first dispatch every session.
Every session. Since the file shipped.
Why nobody noticed: _unmapped is a bucket, not an error. No throw, no
warning, no UI surface. Selection still worked because benchmark-selector
+ static hand-tuned scores carry the routing decision. Only the
feedback loop (recordOutcome → adjust scores) was silently severed.
Fix:
- model-learner.js: export `registryReady` promise instead of swallowing it
- metrics.js: await registryReady before recordOutcome()
- model-registry.ts: 23 new CANONICAL_BY_ROUTE entries covering the actual
production fallback chain — zai/glm-4.5{-air,-flash,5,5.1,5-turbo,4.6,4.7,4.7-flash},
mistral/codestral-latest + devstral-2512 + devstral-{small,medium}-* +
mistral-{large,medium,small}-latest, google-gemini-cli/gemini-{2.5-pro,3-flash-preview,3.1-pro-preview},
opencode-go/{glm-5,glm-5.1,mimo-v2-omni,mimo-v2-pro}
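The race fix in miniature (module internals elided; registryReady,
recordOutcome, and _canonicalIdForFn are the names from this change):

    // model-learner.js (sketch): export the injection promise instead
    // of firing and forgetting it at module bottom.
    let _canonicalIdForFn = null;
    export const registryReady = import("./model-registry.js")
      .then((mod) => { _canonicalIdForFn = mod.canonicalIdFor; });

    // metrics.js (sketch): settle the injection before recording.
    const learner = await import("./model-learner.js");
    await learner.registryReady; // previously unawaited — first dispatch raced it
    learner.recordOutcome(outcome);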
Also adds opt-in backfillModelPerformanceFromJournal(basePath) to
reclassify the existing 51 _unmapped records from past journal events.
Never auto-runs; backs up the old file before overwriting.
Tests: 16 in canonical-id-mapping.test.mjs covering pattern matching,
non-mappable cases, bare canonical-id passthrough, and the backfill
path. Full suite 1906 pass, no regressions.
Known follow-up: CANONICAL_BY_ROUTE uses mixed casing (MiniMax-M2.7 vs
minimax-m2) — should be standardized lowercase in a future pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The swarm dispatch path is default in headless (ea8a3d935) but the
journal didn't tag events with which dispatch path was used. Result:
grep "swarm" .sf/journal/*.jsonl returned zero hits across this repo,
~/code/dr-repo, ~/code/centralcloud/dr — even where swarm IS running.
Cross-repo telemetry was blind to swarm adoption.
Now both swarm dispatch sites emit a journal event per call:
runUnitViaSwarm (auto/run-unit.js):
- success: outcome from worker checkpoint or "continue", via "autonomous-unit"
- no-reply: outcome "no-reply" with error field
- throw: outcome "error" with error field
runSingleAgentViaSwarm (subagent/index.js):
- success: outcome "agent-reply", via "subagent-extension", agentName
- no-reply / catch: same outcome scheme as run-unit
Event shape:
{
  ts, eventType: "swarm-dispatch",
  data: { unitType, unitId, targetAgent, workMode, toolCallCount,
          outcome, via, agentName?, error? }
}
All six emitJournalEvent calls wrapped in try/catch — journal write
failure must not break dispatch (mirrors crash-recovery.js pattern).
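The wrapping pattern at one emit site, sketched (emitJournalEvent's
exact signature is an assumption; locals like unitType come from the
surrounding handler; shown for the no-reply branch):

    // Sketch: journal writes are best-effort — dispatch survives a failed write.
    try {
      emitJournalEvent(basePath, {
        ts: Date.now(),
        eventType: "swarm-dispatch",
        data: { unitType, unitId, targetAgent, workMode, toolCallCount,
                outcome: "no-reply", via: "autonomous-unit", error: err?.message },
      });
    } catch { /* mirrors crash-recovery.js: swallow, never block dispatch */ }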
Tests: 68 new assertions across the two files (5 + 4 test groups
covering happy path, no-reply, throw). Full suite 1872 pass, no
regressions.
Once landed everywhere this enables:
- grep swarm-dispatch .sf/journal/*.jsonl shows adoption
- ~/.sf/agent/upstream-feedback.jsonl rolls up swarm vs legacy ratio
- "is this repo using swarms?" becomes a one-line query
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, sift warmup only ran during sf init/auto-start, which meant
repos launched via sf headless or entered mid-session never got their
index built. The first sift_search/codebase_search call would then block
for minutes while the cold cache was built.
Now autoLoop() calls ensureSiftIndexWarmup() at loop entry. The warmup
runs detached (background process) and is skipped if already running or
if a recent marker exists. This ensures every repo SF operates on gets
indexed regardless of entry path.
- Best-effort: wrapped in try/catch so warmup failures never block the loop
- Lazy import to avoid circular dependencies
- Debug-logged for observability
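The loop-entry call, sketched (module path and projectRoot argument
are assumptions; the lazy-import + try/catch shape is the one in the
bullets above):

    // autoLoop entry (sketch): lazy import dodges the circular dependency,
    // try/catch keeps warmup strictly best-effort.
    try {
      const { ensureSiftIndexWarmup } = await import("./code-intelligence.js");
      await ensureSiftIndexWarmup(projectRoot); // no-ops when already running/recent
    } catch (err) {
      debugLog(`sift warmup skipped: ${err?.message}`); // observable, never blocks
    }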
Tests: 179 files / 1863 tests pass.
Type check: clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 2 (216b1d43f) wrote "# generated from .sf/sf.db ..." as line 1 of
.sf/self-feedback.jsonl. readJsonl tolerated it via try/catch around
JSON.parse, but the doctor's stricter JSONL syntax check flagged it as
"invalid jsonl syntax: line 1: Unexpected token '#'".
Replace the # comment with a JSON-valid meta marker:
{"_meta":"generated from .sf/sf.db","_warning":"do not edit directly; use the resolve_issue tool or sf headless triage --apply"}
readJsonl now skips entries carrying `_meta` so downstream consumers
don't see the marker as a self-feedback record. Tests updated to match
the new marker shape.
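The skip rule, sketched (the real readJsonl takes a file path; raw
stands in for its contents):

    // readJsonl (sketch): parse tolerantly, then drop the meta marker.
    function readJsonl(raw) {
      return raw.split("\n").filter(Boolean)
        .map((l) => { try { return JSON.parse(l); } catch { return null; } })
        .filter((e) => e && typeof e === "object" && !("_meta" in e));
    }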
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the DB-first planning state migration (proposal f3571475d,
Phase 1 ec65b4d88 covered VALIDATION.md). Same approach for self-feedback:
DB is canonical; .sf/self-feedback.jsonl and .sf/SELF-FEEDBACK.md are
projections regenerated from DB.
Solves a real pain: 4 self-feedback entries were stuck visible in
sf headless triage --list because the resolution path (markResolved)
read JSONL while the entries lived only in the DB after autonomous
mode wrote them through the structured ledger. Hand-edited fixes were
bound to go stale under the divergent-stores design.
markResolved (self-feedback.js:870-940): success branch now calls
regenerateSelfFeedbackJsonl + regenerateSelfFeedbackMarkdown after the
DB write (resolveSelfFeedbackEntry), replacing the
appendResolutionToJsonl + regenerate-markdown sequence. Legacy in-place
JSONL rewrite path retained only for !isForgeRepo (upstream log).
New helpers:
- regenerateSelfFeedbackJsonl(basePath): writes JSONL from DB via
listSelfFeedbackEntries(); first line is "# generated from .sf/sf.db
— do not edit directly; use the resolve_issue tool" (readJsonl
already tolerates non-JSON lines via try/catch in JSON.parse, no
parser change needed)
- backfillSelfFeedbackJsonl(basePath): calls importLegacyJsonlToDb
then regenerateSelfFeedbackJsonl; idempotent and exact-byte stable
on repeated calls
Bootstrap (register-hooks.js): backfillSelfFeedbackJsonl runs on every
session start before compactSelfFeedbackMarkdown. No-op when DB
unavailable.
DB schema unchanged: acceptanceCriteria lives in full_json column and
is surfaced via rowToSelfFeedback's ...parsed spread; markResolved's
AC-file-touch verification works without change.
Tests: 6 new in self-feedback-db.test.mjs (DB-only entry resolves
without JSONL, both projections reflect resolution, backfill idempotent
+ byte-stable, generated-header present, 4 flagged entries resolve
cleanly via the new path). 28 tests in the file pass; full suite
179 files / 1863 tests pass, no regressions.
Live verification: backfillSelfFeedbackJsonl ran against production
.sf/sf.db; all 50 DB entries now in JSONL including the 4 previously
stuck entries — resolve_issue calls for them now succeed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three improvements to sift diagnostics:
1. --verbose flag: When SF_SIFT_LOG_LEVEL=debug|trace, sift search
calls now include --verbose for richer stderr output from the Rust
binary. Applied to sift_search, codebase_search, and warmup paths.
2. Vector-index progress poller: During searches that include the
'vector' retriever, a 30-second interval polls the global sift cache
(~/.cache/sift/search/artifacts/indexes/*/sectors/) and writes
progress lines to the log file:
[2026-05-15T11:00:00Z] vector-index progress: 32 sectors (80 MB total)
This lets an operator tail the log during long cold-cache embedding
builds instead of staring at a silent process.
3. estimateVectorIndexProgress / countVectorSectors helpers count sector
files across all index directories and report total count + size.
Tests: 179 files / 1858 tests pass.
Type check: clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chooseSiftRetrievers returned reranking: 'rerank' which is not a valid
sift CLI value. Valid values are: none, position-aware, llm, jina, gemma.
This caused vector searches to fail with 'invalid value for --reranking'.
Fix: use 'position-aware' for scoped subdir searches. This is the
structural reranking that pairs with the vector retriever strategy.
Tests: 9/9 in sift-retriever-scope.test.mjs updated and passing.
Full suite: 178 files / 1845 tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds operator/agent visibility into sift's indexing + retrieval stages.
The 30-min cold full-repo vector indexing test went silent for the full
budget because SF's wrappers never enabled sift's tracing layer; CPU and
disk activity were the only externally visible signals.
resolveSiftLogging(projectRoot) (code-intelligence.js:897) returns
{ env: { RUST_LOG: level }, logPath } honoring SF_SIFT_LOG_LEVEL
(default "info"; "off"/"none"/"" disables). Default destination:
${projectRoot}/.sf/runtime/sift/last-search.log, truncated per call so
it always reflects the most recent invocation.
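The resolver's contract, sketched (the { env, logPath } shape and the
path are from this change; returning null when disabled is a sketch
choice):

    import path from "node:path";

    // Sketch: map SF_SIFT_LOG_LEVEL onto RUST_LOG plus a per-call log path.
    export function resolveSiftLogging(projectRoot) {
      const level = (process.env.SF_SIFT_LOG_LEVEL ?? "info").toLowerCase();
      if (["off", "none", ""].includes(level)) return null; // logging disabled
      return {
        env: { RUST_LOG: level },
        logPath: path.join(projectRoot, ".sf", "runtime", "sift", "last-search.log"),
      };
    }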
Wired into three spawn sites:
- ensureSiftIndexWarmup (code-intelligence.js): detached child's stderr
fd opened with openSync(logPath, "a") and passed as stdio[2]
- runSift (tools/sift-search-tool.js): execFile env merges logEnv,
stderr appended to logPath in the execFile callback
- codebase_search execute (subagent/index.js): proc.stderr.on("data")
tees to logPath via fs.appendFileSync alongside the existing in-memory
buffer for tool output
When a sift result is empty or times out, the tool reply now includes
"(stage diagnostic: .sf/runtime/sift/last-search.log)" so the agent
sees immediately where to look.
Tests: 11 new in sift-logging.test.mjs — env resolution matrix, log-file
truncate/write contract, hint-string format on timeout/no-output/disabled.
Full suite 1857/1857, no regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>