Commit graph

4537 commits

Author SHA1 Message Date
Mikael Hugo
fdc4650016 feat(self-feedback-drain): filter free opencode models from triage routing
Self-feedback triage routing was including paid opencode models even
when the operator policy prefers the free tier. Add
isOpenCodeProvider() + isFreeOpenCodeModelId() and filter the
candidate list before the router scores them.

Also: cosmetic — quote style normalised by the formatter on
buildInlineFixPrompt strings and spawn options object.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:24 +02:00
Mikael Hugo
3a14fe86a7 test(list-models): isolate from developer's discovery-cache
Tests were picking up the developer's real
~/.sf/agent/discovery-cache.json and seeing unexpected models in
output. Pin tests to a guaranteed-missing path via the new
_discoveryCacheFilePath option so the env they observe is solely
what the test constructs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:11 +02:00
Mikael Hugo
d8f56e6704 feat(cli): add sf key subcommand for auth.json management
Surgical read/write access to ~/.sf/agent/auth.json without touching
the file directly. All mutations go through AuthStorage so file-lock
and chmod-600 invariants are always respected.

  sf key set    <provider> <api-key>   add/rotate stored key
  sf key get    <provider>             show masked key (last 4 chars)
  sf key remove <provider> [--yes]     remove credential
  sf key list                          list all providers + status

Rationale: SF's source of truth for credentials is auth.json at
runtime — env vars are only used during initial one-time provider
setup. Rotation needs an explicit, audit-friendly path, not implicit
env-driven re-reads. Keys are never echoed in full (last 4 chars
only); remove always prompts unless --yes.
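
The masking rule can be sketched as follows. This is a minimal
illustration, not the shipped code: the helper name maskApiKey and the
"*" padding are assumptions, and the real `sf key get` output format
may differ.

```javascript
// Hypothetical helper: reveal only the last 4 characters of a stored key.
// The name and the "*" padding character are illustrative assumptions.
function maskApiKey(key) {
  if (typeof key !== "string" || key.length === 0) return "";
  // Keys of 4 characters or fewer are fully masked so nothing leaks.
  if (key.length <= 4) return "*".repeat(key.length);
  return "*".repeat(key.length - 4) + key.slice(-4);
}

console.log(maskApiKey("sk-test-1234abcd"));
```

Fully masking very short keys is a design choice of this sketch; the
commit message does not say how such keys are handled.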

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:04 +02:00
Mikael Hugo
351bfad41d fix(memory): extractTranscriptFromActivity now reads custom_message entries
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Activity JSONL logs use `type: "custom_message"` with `customType: "sf-auto"`
for assistant reasoning content. The old code only checked `role === "assistant"`,
so every transcript was empty → extraction silently skipped every unit.

Fix: recognise both legacy (`role === "assistant"`) and modern
(`custom_message` with `sf-*` prefix) entry shapes. Also reads the
standalone `text` field used by custom messages.
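
A rough sketch of the two-shape check, for illustration only. The
function names isTranscriptEntry and extractTranscript, and the legacy
`content` field, are assumptions; only `role`, `type`, `customType`
and `text` are named in this message.

```javascript
// Assumed entry shapes (the `content` field on legacy entries is a guess):
//   legacy: { role: "assistant", content: "..." }
//   modern: { type: "custom_message", customType: "sf-auto", text: "..." }
function isTranscriptEntry(entry) {
  if (!entry || typeof entry !== "object") return false;
  if (entry.role === "assistant") return true;
  return (
    entry.type === "custom_message" &&
    typeof entry.customType === "string" &&
    entry.customType.startsWith("sf-")
  );
}

// Hypothetical extractor: joins the text of every recognised entry.
function extractTranscript(jsonlText) {
  const parts = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the whole log
    }
    if (!isTranscriptEntry(entry)) continue;
    // Prefer the standalone `text` field, fall back to the assumed `content`.
    const text = typeof entry.text === "string" ? entry.text : entry.content;
    if (typeof text === "string" && text.length > 0) parts.push(text);
  }
  return parts.join("\n");
}
```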

This is why memory_processed_units had 0 rows despite 34 activity logs.

Tests: 186 files / 1994 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:13:26 +02:00
Mikael Hugo
7ba469cff1 feat(memory): add debug logging to memory extraction pipeline
The memory extraction system has infrastructure (DB tables, LLM prompts,
unit closeout wiring, embedding backfill) but zero processed units and
only self-feedback-resolution memories. This suggests extraction is
failing silently.

Add debugLog() calls throughout extractMemoriesFromUnit() so we can
observe:
- Skip reasons (mutex busy, rate limited, already processed, file too small)
- Start/done lifecycle per unit
- LLM call and parse outcomes
- Error messages on failure and retry

This makes the extraction pipeline observable via --debug or the
journal/debug log without changing behavior.

Tests: 185 files / 1993 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:09:36 +02:00
Mikael Hugo
ba4b2d46d9 sf snapshot: uncommitted changes after 43m inactivity
2026-05-15 15:53:19 +02:00
Mikael Hugo
0b19afebf6 test(providers): expand discovery test matrix to 46 cases
Adds full coverage for the discovery-gating root cause that was
fixed in commits d70d8d3b1 (xiaomi x-api-key auth) and the
subsequent refreshSfManagedProviders + writeSdkDiscoveryCacheEntry
work in model-catalog-cache.js.

Diagnosis recap: kimi-coding, opencode, opencode-go were silent
in ~/.sf/agent/discovery-cache.json because the SDK's
model-discovery.js adapter registry marked them with
StaticDiscoveryAdapter (supportsDiscovery=false), so the SDK's
discoverModels() never attempted them. SF's own
scheduleModelCatalogRefresh DID fetch them but wrote only to the
per-repo runtime cache (basePath/.sf/model-catalog/) and only fired
on session_start — not during --discover. The fix is to mirror the
write to the SDK's discovery cache on both fetch-path AND cache-hit
path, and await it in cli.ts before listModels when --discover is set.

New test sections:
- parseDiscoveredModels: OpenAI {data}/{models} formats, Google
  {models[].name} prefix stripping, name-as-id fallback, null on
  bad input, OpenRouter pricing extraction
- refreshSfManagedProviders: xiaomi uses x-api-key (not Bearer),
  opencode uses Bearer, no-key providers skipped, SDK discovery cache
  written on BOTH network-fetch and cache-hit paths, kimi-coding +
  opencode-go iterated when keys present

46 tests pass. No regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:09:38 +02:00
Mikael Hugo
67c088410c chore(discovery): silence debug stderr from refresh path
Trailing instrumentation from the discovery investigation. The error
catch still swallows non-fatal failures during --discover, just no
longer prints to stderr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:03:56 +02:00
Mikael Hugo
fe28a48d81 fix(sift): revert to bm25,phrase for repo-root — hang was corrupted cache
The earlier commit (44fcfb643) incorrectly disabled phrase on repo-root
because I thought phrase retriever hung on full-workspace scope. After
clearing the corrupted cache (left by killing a mid-build vector process),
testing confirms:

- bm25 alone on repo root: works, 1m 50s cold, instant warm
- phrase alone on repo root: works after cache clear
- bm25+phrase on repo root: works after cache clear
- vector on scoped paths: works after cache build

The "hang" was from a corrupted/stale cache, not a sift bug.
.siftignore is properly excluding files (146K→2,660 indexed).

Revert chooseSiftRetrievers back to bm25,phrase for repo-root.

Tests: 184 files / 1974 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 14:59:45 +02:00
Mikael Hugo
b88b66c651 feat(auto): fan out swarm research units
2026-05-15 14:54:27 +02:00
Mikael Hugo
c8854ca896 feat(discovery): cache stores pricing — unblocks zero-cost-but-not-:free models
Today's discovery cache stored only model IDs (string[]). Downstream
isZeroCost(model?.cost) check evaluated against undefined for any
dynamically-discovered model, so OpenRouter's zero-cost-but-not-:free
entries (owl-alpha, lyria-3-pro-preview, lyria-3-clip-preview,
openrouter/free) got silently blocked by the built-in provider policy.

Cache entry shape now: {id, cost?, contextWindow?} per model.
parseDiscoveredModels extracts pricing from OpenRouter's
/api/v1/models response (pricing.prompt/completion/input_cache_read/
input_cache_write → numeric cost.{input,output,cacheRead,cacheWrite}).
Other providers stay {id}-only — their /v1/models endpoints don't
ship pricing.
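
The field mapping above could look roughly like this. It is a sketch
under assumptions: the helper name costFromOpenRouterPricing is
hypothetical, values are converted with Number() as-is, and any unit
scaling the real parseDiscoveredModels applies is not shown.

```javascript
// Hypothetical helper mirroring the mapping named in the commit message:
//   pricing.prompt            -> cost.input
//   pricing.completion        -> cost.output
//   pricing.input_cache_read  -> cost.cacheRead
//   pricing.input_cache_write -> cost.cacheWrite
function costFromOpenRouterPricing(pricing) {
  if (!pricing || typeof pricing !== "object") return undefined;
  const toNumber = (value) => {
    // Treat absent values as "no data" rather than as a price of zero.
    if (value === undefined || value === null || value === "") return undefined;
    const n = Number(value);
    return Number.isFinite(n) ? n : undefined;
  };
  const cost = {
    input: toNumber(pricing.prompt),
    output: toNumber(pricing.completion),
    cacheRead: toNumber(pricing.input_cache_read),
    cacheWrite: toNumber(pricing.input_cache_write),
  };
  for (const key of Object.keys(cost)) {
    if (cost[key] === undefined) delete cost[key];
  }
  return Object.keys(cost).length > 0 ? cost : undefined;
}
```

Keeping "missing" distinct from "zero" matters here, because a zero
cost is exactly what the policy gate later looks for.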

Migration: on first read of a legacy string[] cache, entries are
converted in-place to {id} objects and the file is rewritten. No cost
backfill (data wasn't there before), but the new readers handle them.
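
A minimal sketch of the legacy conversion, assuming a hypothetical
helper named migrateCacheModels. File reading and rewriting are left
out; the `migrated` flag only signals that a rewrite would be needed.

```javascript
// Hypothetical migration step: legacy caches hold string[] model ids,
// the new shape holds {id, cost?, contextWindow?} objects.
function migrateCacheModels(models) {
  if (!Array.isArray(models)) return { models: [], migrated: false };
  let migrated = false;
  const out = [];
  for (const model of models) {
    if (typeof model === "string") {
      out.push({ id: model }); // no cost backfill: the data was never stored
      migrated = true;
    } else if (model && typeof model.id === "string") {
      out.push(model); // already in the new shape
    }
  }
  return { models: out, migrated };
}
```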

Cost wired into policy: isModelAllowedByBuiltInProviderPolicy calls
lookupDiscoveredModelCost("openrouter", modelId) as a fallback when
the static model registry has no cost data.

Plus: cli.ts --discover now eagerly refreshes SF-managed providers
(opencode, opencode-go, kimi-coding, xiaomi) that the SDK's adapter
doesn't cover — so they populate cache on first --discover instead
of waiting for a session-start lazy refresh.

Tests: 13 new across 5 groups (pricing extraction, round-trip, legacy
migration, policy gate happy/sad paths, Google provider compat).
Full suite: 184 files / 1971 tests, zero regressions.

Real-world result: openrouter/owl-alpha, google/lyria-3-pro-preview,
google/lyria-3-clip-preview, openrouter/free, plus any future
zero-cost models now pass the policy filter on the next discovery
refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:51:00 +02:00
Mikael Hugo
d70d8d3b10 fix(providers): use x-api-key for xiaomi discovery
2026-05-15 14:43:09 +02:00
Mikael Hugo
09ea553b6d fix(auto): initialize notification store during bootstrap
2026-05-15 14:42:02 +02:00
Mikael Hugo
0a332f4cba fix(headless): normalize auto alias to autonomous
2026-05-15 14:32:00 +02:00
Mikael Hugo
44fcfb643c fix(sift): use bm25 only for repo-root — phrase retriever hangs on full scope
Root cause: the sift binary's phrase retriever hangs indefinitely when
queried against the full repo-root scope (57K+ files). Earlier tests
mistook this for a general slowness, but isolated testing confirms:

- bm25 alone on repo root: works (1m 30s cold, instant warm)
- phrase alone on repo root: hangs forever
- bm25+phrase on repo root: hangs forever (phrase path blocks)
- all retrievers on scoped subdirs: work correctly

The earlier Rust panic was from a corrupted cache state left by killing
a mid-build vector process. After clearing the cache, bm25 alone works.

Fix: chooseSiftRetrievers now returns retrievers: "bm25" (not "bm25,phrase")
for repo-root scope. Scoped subdirs still get bm25+phrase+vector with
position-aware reranking.

Tests: updated 3 assertions in sift-retriever-scope.test.mjs.
Full suite: 183 files / 1958 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 14:28:23 +02:00
Mikael Hugo
1b5348e28e feat(providers): live discovery for opencode, opencode-go, minimax
Three providers were missing from PROVIDER_CATALOG_CONFIG so their
model lists couldn't be auto-discovered. Their wire ids only existed
in packages/ai/src/models.generated.ts as hand-coded entries, meaning
new model variants from these providers required manual catalog edits.

Verified live endpoints respond to /v1/models with bearer auth:
- opencode      → https://opencode.ai/zen/v1/models      (6 free models)
- opencode-go   → https://opencode.ai/zen/go/v1/models   (15 models)
- minimax       → https://api.minimax.io/v1/models       (works)

Added entries:
  opencode:     baseUrl https://opencode.ai/zen, modelsPath /v1/models
  opencode-go:  baseUrl https://opencode.ai/zen/go, modelsPath /v1/models
  minimax:      baseUrl https://api.minimax.io, modelsPath /v1/models
                (international endpoint; Chinese-network api.minimaxi.com
                still handled separately in the SDK)

Auth keys already wired: OPENCODE_API_KEY, OPENCODE_GO_API_KEY (with
OPENCODE_API_KEY fallback), MINIMAX_API_KEY. No env-api-keys.ts changes.

Combined with 385e0b448 (dynamic canonicalIdFor resolver), new model
variants from these three providers will be auto-grouped in
.sf/model-performance.json without hand-editing CANONICAL_BY_ROUTE.

Live counts after fresh discovery will reveal experimental models
absent from static catalog (e.g. opencode's "big-pickle", opencode-go's
deepseek-v4-pro, mimo-v2.5-pro, hy3-preview). The model-router
tolerates unconventional wire IDs — no naming constraints.

To populate cache: rm -rf ~/.sf/runtime/model-catalog/ + relaunch sf.

Tests: 13 new in provider-catalog-discovery.test.mjs (catalog shape,
modelsPath presence, DISCOVERABLE_PROVIDER_IDS inclusion). Full suite
183 files / 1940 tests pass, zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:19:08 +02:00
Mikael Hugo
db3525b933 chore(model-registry): prune 15 redundant identity-strip aliases
After 385e0b448 added the dynamic discovery-cache resolver to
canonicalIdFor, the 15 identity-strip aliases added in 089bf0cbe for
discovered providers became pure redundancy — the dynamic path
returns the same bare modelId from the discovery cache.

Removed (all canonical == bare modelId, all providers in discovery cache):
- minimax/MiniMax-M2.7, minimax/MiniMax-M2.7-highspeed
- mistral/codestral-latest, mistral/devstral-2512,
  mistral/devstral-small-2507, mistral/mistral-large-latest,
  mistral/mistral-medium-latest, mistral/mistral-small-latest
- zai/glm-4.5, zai/glm-4.5-air, zai/glm-4.6, zai/glm-4.7,
  zai/glm-5, zai/glm-5-turbo, zai/glm-5.1

Kept (real aliases — canonical differs from wire id, NOT identity strips):
- kimi-coding/kimi-for-coding → kimi-k2.6 (Moonshot alias)
- mistral/devstral-medium-2507 → devstral-medium-latest (alias to latest)
- minimax/MiniMax-M2 family lowercase mappings (case-change aliases)

Also kept:
- zai/glm-4.5-flash, zai/glm-4.7-flash (not yet in discovery cache;
  flash variants may launch before cache refresh — fast-path safety)
- kimi-coding/kimi-k2.6 + kimi-k2-thinking (kimi-coding cache only
  has kimi-for-coding; these resolve via _ENTRY_BY_ROUTE fallback)

Tests: 15 new regression tests in canonical-id-dynamic.test.mjs verify
each removed entry STILL resolves correctly via dynamic discovery.
Total 21/21 in that file, plus 101 model-registry tests, plus 16
canonical-id-mapping tests — all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:17:06 +02:00
Mikael Hugo
385e0b4480 feat(model-learner): canonicalIdFor consults discovery cache as fallback
After commit 089bf0cbe added 23 hand-written aliases for production
route keys, the right structural fix is to also consult the dynamic
model-discovery cache (~/.sf/agent/discovery-cache.json). Otherwise
every new model variant from a discovered provider (ollama-cloud +39
models, openrouter +24, etc.) requires another round of hand-editing.

canonicalIdFor now resolves in this order:
  1. CANONICAL_BY_ROUTE (static fast path, retains real aliases like
     kimi-coding/kimi-for-coding → kimi-k2.6 where canonical differs)
  2. _ENTRY_BY_ROUTE (existing static path)
  3. canonicalIdFromDiscovery — reads ~/.sf/agent/discovery-cache.json,
     finds (provider, modelId) pair, returns bare modelId

In-memory cache with 60s TTL (DISCOVERY_CACHE_TTL_MS) so the readFileSync
on the hot path becomes one disk read per minute at most. canonicalIdFor
is per-dispatch, not per-token, so the overhead is negligible.
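
The resolution order and TTL can be sketched like this. Everything
beyond the names quoted above is an assumption: the factory
createCanonicalIdResolver is hypothetical, both static tables are
modelled as Maps from route to canonical id, and the discovery cache
is assumed to look like { [provider]: { models: [...] } }.

```javascript
const DISCOVERY_CACHE_TTL_MS = 60_000;

// Hypothetical factory. loadDiscoveryCache stands in for the readFileSync
// of the discovery cache; `now` is injectable so tests can control time.
function createCanonicalIdResolver({ canonicalByRoute, entryByRoute, loadDiscoveryCache, now = Date.now }) {
  let cached = null;
  let loadedAt = -Infinity;

  function discoveryCache() {
    // Reload at most once per TTL window.
    if (now() - loadedAt >= DISCOVERY_CACHE_TTL_MS) {
      try {
        cached = loadDiscoveryCache();
      } catch {
        cached = null; // unreadable cache behaves like a missing one
      }
      loadedAt = now();
    }
    return cached;
  }

  return function canonicalIdFor(route) {
    // 1. Static aliases win, including ones whose canonical id differs.
    if (canonicalByRoute.has(route)) return canonicalByRoute.get(route);
    // 2. Existing static registry entries.
    if (entryByRoute.has(route)) return entryByRoute.get(route);
    // 3. Dynamic fallback: the bare model id if the pair was discovered.
    const slash = route.indexOf("/");
    if (slash < 0) return null;
    const provider = route.slice(0, slash);
    const modelId = route.slice(slash + 1);
    const models = discoveryCache()?.[provider]?.models;
    if (!Array.isArray(models)) return null;
    const found = models.some((m) => (typeof m === "string" ? m : m?.id) === modelId);
    return found ? modelId : null;
  };
}
```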

Test hook __setDiscoveryCacheForTest lets vitest inject a cache without
touching the fs.

Tests: 6 new in canonical-id-dynamic.test.mjs (dynamic hit, static-alias
wins over dynamic, cache miss → null, null cache graceful, missing-models
graceful, multiple models per provider). Combined with existing
canonical-id-mapping: 22/22 pass. Full suite 1912 pass, no regressions.

Sanity verified: canonicalIdFor("ollama-cloud/glm-5.1") → "glm-5.1"
(dynamic-only, not in static table); canonicalIdFor("unknown/never")
→ null.

Follow-up (in flight, separate agent): prune the static identity-strip
aliases from CANONICAL_BY_ROUTE for providers in the discovery cache
since they're now redundant with the dynamic resolver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:14:04 +02:00
Mikael Hugo
2a58f4ebec feat(model-routing): restrict autonomous fallback to the enabledModels allowlist
Autonomous mode's model-fallback chain bypassed enabledModels — when zai
429'd, the chain happily fell through to mistral/codestral-latest even
though only minimax/*, kimi-coding/*, zai/*, ollama-cloud/* were allowed.
Of 52 dispatches in this repo's journal this session, 10 (~19%)
escaped the allowlist (mistral×2, opencode-go×3, google-gemini-cli×5).

enabledModels was honored by interactive cycling (settings-manager.ts)
and by self-feedback-drain.js for triage routing, but
auto-model-selection.js's fallback chain in selectAndApplyModel never
read it.

Now: isModelInEnabledList(provider, modelId, enabledModels) filters
each fallback candidate. Supports exact "provider/model" or
"provider/*" wildcard. Empty/undefined list = open behavior (no
regression for setups without an allowlist).
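
A sketch of a matcher with the same signature, to illustrate the
pattern rules. The body is an assumption written from the description
above, not a copy of the code in auto-model-selection.js.

```javascript
// Illustrative matcher: exact "provider/model" or "provider/*" wildcard.
// An empty or undefined allowlist means "no constraint".
function isModelInEnabledList(provider, modelId, enabledModels) {
  if (!Array.isArray(enabledModels) || enabledModels.length === 0) return true;
  const route = `${provider}/${modelId}`;
  const wildcard = `${provider}/*`;
  return enabledModels.some((pattern) => pattern === route || pattern === wildcard);
}

// Filtering a fallback chain with the allowlist from the incident above.
const allowlist = ["minimax/*", "kimi-coding/*", "zai/*", "ollama-cloud/*"];
const chain = [
  { provider: "zai", modelId: "glm-4.6" },
  { provider: "mistral", modelId: "codestral-latest" },
];
const allowed = chain.filter((c) => isModelInEnabledList(c.provider, c.modelId, allowlist));
console.log(allowed.map((c) => `${c.provider}/${c.modelId}`));
```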

readEnabledModels reads ~/.sf/agent/settings.json once per chain;
swallows IO errors → undefined → no constraint (safe failure mode).

Escape hatch: SF_BYPASS_ENABLED_MODELS=1 disables the check for
emergency / misconfigured cases.

When ALL candidates are filtered out and the chain exhausts, throws
a clear error directing the operator to add to allowlist or unset.

Tests: 13 in enabled-models-fallback.test.mjs covering pattern matrix,
multi-candidate chain skipping, bypass env, and exhaustion path.
Full suite 1906 pass, no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:02:58 +02:00
Mikael Hugo
089bf0cbeb fix(model-learner): resolve canonical-id lazy-load race + 23 wire-id aliases
Of 52 dispatches in this repo's journal this session, 51 landed in
.sf/model-performance.json's _unmapped bucket — meaning the live-outcome
learner couldn't tell which provider/model succeeded or failed. Only
1 dispatch (google-gemini-cli/gemini-3-flash-preview) bucketed correctly.

Root cause was NOT just missing aliases — it was a lazy-load race:
- model-learner.js declared canonicalIdFor as a fire-and-forget dynamic
  import side-effect at module bottom
- metrics.js called recordOutcome() synchronously after
  `await import("./model-learner.js")` resolved — before the registry
  injection promise settled
- Result: _canonicalIdForFn was null for the first dispatch every session.
  Every session. Since the file shipped.

Why nobody noticed: _unmapped is a bucket, not an error. No throw, no
warning, no UI surface. Selection still worked because benchmark-selector
+ static hand-tuned scores carry the routing decision. Only the
feedback loop (recordOutcome → adjust scores) was silently severed.

Fix:
- model-learner.js: export `registryReady` promise instead of swallowing it
- metrics.js: await registryReady before recordOutcome()
- model-registry.ts: 23 new CANONICAL_BY_ROUTE entries covering the actual
  production fallback chain: zai/glm-{4.5,4.5-air,4.5-flash,5,5.1,5-turbo,4.6,4.7,4.7-flash},
  mistral/codestral-latest + devstral-2512 + devstral-{small,medium}-* +
  mistral-{large,medium,small}-latest, google-gemini-cli/gemini-{2.5-pro,3-flash-preview,3.1-pro-preview},
  opencode-go/{glm-5,glm-5.1,mimo-v2-omni,mimo-v2-pro}

Also adds opt-in backfillModelPerformanceFromJournal(basePath) to
reclassify the existing 51 _unmapped records from past journal events.
Never auto-runs; backs up the old file before overwriting.

Tests: 16 in canonical-id-mapping.test.mjs covering pattern matching,
non-mappable cases, bare canonical-id passthrough, and the backfill
path. Full suite 1906 pass, no regressions.

Known follow-up: CANONICAL_BY_ROUTE uses mixed casing (MiniMax-M2.7 vs
minimax-m2) — should be standardized lowercase in a future pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:02:58 +02:00
Mikael Hugo
5f92320c7d fix(auto): timeout silent swarm turns despite heartbeats
2026-05-15 13:55:04 +02:00
Mikael Hugo
85f6650852 fix(auto): keep solver checkpoint pass out of swarm
2026-05-15 13:35:20 +02:00
Mikael Hugo
bd3fbda9cb feat(journal): swarm-dispatch event per dispatch — cross-repo telemetry
The swarm dispatch path is default in headless (ea8a3d935) but the
journal didn't tag events with which dispatch path was used. Result:
grep "swarm" .sf/journal/*.jsonl returned zero hits across this repo,
~/code/dr-repo, ~/code/centralcloud/dr — even where swarm IS running.
Cross-repo telemetry was blind to swarm adoption.

Now both swarm dispatch sites emit a journal event per call:

runUnitViaSwarm (auto/run-unit.js):
- success: outcome from worker checkpoint or "continue", via "autonomous-unit"
- no-reply: outcome "no-reply" with error field
- throw:   outcome "error" with error field

runSingleAgentViaSwarm (subagent/index.js):
- success: outcome "agent-reply", via "subagent-extension", agentName
- no-reply / catch: same outcome scheme as run-unit

Event shape:
{
  ts, eventType: "swarm-dispatch",
  data: { unitType, unitId, targetAgent, workMode, toolCallCount,
          outcome, via, agentName?, error? }
}

All six emitJournalEvent calls wrapped in try/catch — journal write
failure must not break dispatch (mirrors crash-recovery.js pattern).
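
The event shape and the guarded emit can be sketched as below. The
helper names buildSwarmDispatchEvent and emitSafely are hypothetical,
the ISO timestamp format for `ts` is an assumption, and
emitJournalEvent is passed in as a parameter because its real
signature is not shown here.

```javascript
// Hypothetical builder for the event shape listed above.
function buildSwarmDispatchEvent(data, now = () => new Date().toISOString()) {
  return { ts: now(), eventType: "swarm-dispatch", data };
}

// Guarded emit: a journal write failure is swallowed so dispatch continues.
// Returns true when the event was written, false when the write threw.
function emitSafely(emitJournalEvent, event) {
  try {
    emitJournalEvent(event);
    return true;
  } catch {
    return false;
  }
}
```

Returning a boolean is a choice of this sketch; it lets a caller
debug-log dropped events without re-throwing.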

Tests: 68 new assertions across the two files (5 + 4 test groups
covering happy path, no-reply, throw). Full suite 1872 pass, no
regressions.

Once landed everywhere this enables:
- grep swarm-dispatch .sf/journal/*.jsonl shows adoption
- ~/.sf/agent/upstream-feedback.jsonl rolls up swarm vs legacy ratio
- "is this repo using swarms?" becomes a one-line query

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 13:22:28 +02:00
Mikael Hugo
c42c13b882 feat(auto): trigger sift index warmup at start of every autonomous loop
Previously, sift warmup only ran during sf init/auto-start, which meant
repos launched via sf headless or entered mid-session never got their
index built. The first sift_search/codebase_search call would then block
for minutes while the cold cache was built.

Now autoLoop() calls ensureSiftIndexWarmup() at loop entry. The warmup
runs detached (background process) and is skipped if already running or
if a recent marker exists. This ensures every repo SF operates on gets
indexed regardless of entry path.

- Best-effort: wrapped in try/catch so warmup failures never block the loop
- Lazy import to avoid circular dependencies
- Debug-logged for observability

Tests: 179 files / 1863 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 13:17:44 +02:00
Mikael Hugo
8b4123cccc fix(self-feedback): JSONL header is JSON-valid meta marker, not # comment
Phase 2 (216b1d43f) wrote "# generated from .sf/sf.db ..." as line 1 of
.sf/self-feedback.jsonl. readJsonl tolerated it via try/catch around
JSON.parse, but the doctor's stricter JSONL syntax check flagged it as
"invalid jsonl syntax: line 1: Unexpected token '#'".

Replace the # comment with a JSON-valid meta marker:
  {"_meta":"generated from .sf/sf.db","_warning":"do not edit directly; use the resolve_issue tool or sf headless triage --apply"}

readJsonl now skips entries carrying `_meta` so downstream consumers
don't see the marker as a self-feedback record. Tests updated to match
the new marker shape.
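
A sketch of the skip rule, assuming a hypothetical reader named
readJsonlRecords that takes the file contents as a string. The real
readJsonl may differ in signature and error handling.

```javascript
// Hypothetical reader: parse JSONL, tolerate non-JSON lines, and hide
// any entry that carries a `_meta` key from downstream consumers.
function readJsonlRecords(text) {
  const records = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // e.g. a legacy "# generated from ..." comment line
    }
    if (entry && typeof entry === "object" && "_meta" in entry) continue;
    records.push(entry);
  }
  return records;
}
```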

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:39:16 +02:00
Mikael Hugo
216b1d43f1 feat(self-feedback): DB-first migration — JSONL + Markdown as render targets
Phase 2 of the DB-first planning state migration (proposal f3571475d,
Phase 1 ec65b4d88 covered VALIDATION.md). Same approach for self-feedback:
DB is canonical; .sf/self-feedback.jsonl and .sf/SELF-FEEDBACK.md are
projections regenerated from DB.

Solves a real pain: 4 self-feedback entries were stuck visible in
sf headless triage --list because the resolution path (markResolved)
read JSONL while the entries lived only in DB after autonomous wrote
them through the structured ledger. Hand-edited fixes were bound to go
stale under the divergent-stores design.

markResolved (self-feedback.js:870-940): success branch now calls
regenerateSelfFeedbackJsonl + regenerateSelfFeedbackMarkdown after the
DB write (resolveSelfFeedbackEntry), replacing the
appendResolutionToJsonl + regenerate-markdown sequence. Legacy in-place
JSONL rewrite path retained only for !isForgeRepo (upstream log).

New helpers:
- regenerateSelfFeedbackJsonl(basePath): writes JSONL from DB via
  listSelfFeedbackEntries(); first line is "# generated from .sf/sf.db
  — do not edit directly; use the resolve_issue tool" (readJsonl
  already tolerates non-JSON lines via try/catch in JSON.parse, no
  parser change needed)
- backfillSelfFeedbackJsonl(basePath): calls importLegacyJsonlToDb
  then regenerateSelfFeedbackJsonl; idempotent and exact-byte stable
  on repeated calls
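
One way to get the byte-stable regeneration described above is a pure render function with a fixed ordering. This is a sketch under assumptions: renderSelfFeedbackJsonl, the string `id` field, and sorting by `id` are all hypothetical; the real order comes from listSelfFeedbackEntries() and is not shown here.

```javascript
// Hypothetical sketch: render entries to JSONL text. Sorting by `id` and
// always ending with a single newline keeps repeated renders byte-identical
// for the same set of entries (ASSUMPTION: each entry has a string `id`).
function renderSelfFeedbackJsonl(entries, headerLine) {
  const sorted = [...entries].sort((a, b) =>
    a.id < b.id ? -1 : a.id > b.id ? 1 : 0,
  );
  const lines = [headerLine, ...sorted.map((entry) => JSON.stringify(entry))];
  return lines.join("\n") + "\n";
}
```

Rendering the same entries twice, in any input order, yields the same string, so a file written from it does not change between calls.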

Bootstrap (register-hooks.js): backfillSelfFeedbackJsonl runs on every
session start before compactSelfFeedbackMarkdown. No-op when DB
unavailable.

DB schema unchanged: acceptanceCriteria lives in full_json column and
is surfaced via rowToSelfFeedback's ...parsed spread; markResolved's
AC-file-touch verification works without change.

Tests: 6 new in self-feedback-db.test.mjs (DB-only entry resolves
without JSONL, both projections reflect resolution, backfill idempotent
+ byte-stable, generated-header present, 4 flagged entries resolve
cleanly via the new path). 28 tests in the file pass; full suite
179 files / 1863 tests pass, no regressions.

Live verification: backfillSelfFeedbackJsonl ran against production
.sf/sf.db; all 50 DB entries now in JSONL including the 4 previously
stuck entries — resolve_issue calls for them now succeed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 12:29:39 +02:00
Mikael Hugo
7c78994612 fix(auto): pause on out-of-scope task changes 2026-05-15 12:17:20 +02:00
Mikael Hugo
32362a83bc feat(sift): add --verbose flag and vector-index progress logging
Adds three improvements to sift diagnostics:

1. --verbose flag: When SF_SIFT_LOG_LEVEL=debug|trace, sift search
   calls now include --verbose for richer stderr output from the Rust
   binary. Applied to sift_search, codebase_search, and warmup paths.

2. Vector-index progress poller: During searches that include the
   'vector' retriever, a 30-second interval polls the global sift cache
   (~/.cache/sift/search/artifacts/indexes/*/sectors/) and writes
   progress lines to the log file:
     [2026-05-15T11:00:00Z] vector-index progress: 32 sectors (80 MB total)
   This lets an operator tail the log during long cold-cache embedding
   builds instead of staring at a silent process.

3. estimateVectorIndexProgress / countVectorSectors helpers count sector
   files across all index directories and report total count + size.
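
The sector counting in items 2 and 3 could look roughly like the sketch below. The directory layout follows the path quoted above; the function bodies, the return shape, and formatProgressLine are assumptions, not the shipped helpers.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical sketch: count files under <indexesDir>/*/sectors/ and sum
// their sizes. Missing directories are treated as "nothing built yet".
function countVectorSectors(indexesDir) {
  let count = 0;
  let bytes = 0;
  let indexDirs;
  try {
    indexDirs = fs.readdirSync(indexesDir, { withFileTypes: true });
  } catch {
    return { count, bytes }; // cache directory not created yet
  }
  for (const dir of indexDirs) {
    if (!dir.isDirectory()) continue;
    const sectorsDir = path.join(indexesDir, dir.name, "sectors");
    let files;
    try {
      files = fs.readdirSync(sectorsDir, { withFileTypes: true });
    } catch {
      continue; // this index has no sectors directory
    }
    for (const file of files) {
      if (!file.isFile()) continue;
      count += 1;
      bytes += fs.statSync(path.join(sectorsDir, file.name)).size;
    }
  }
  return { count, bytes };
}

// Formats a line in the shape shown in the commit message.
function formatProgressLine(now, { count, bytes }) {
  const stamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  const mb = Math.round(bytes / (1024 * 1024));
  return `[${stamp}] vector-index progress: ${count} sectors (${mb} MB total)`;
}
```

A poller would call countVectorSectors from a setInterval callback and append the formatted line to the log file, clearing the interval when the search process exits.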

Tests: 179 files / 1858 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 11:23:54 +02:00
Mikael Hugo
9b42404149 fix(sift): change reranking from invalid 'rerank' to 'position-aware'
chooseSiftRetrievers returned reranking: 'rerank' which is not a valid
sift CLI value. Valid values are: none, position-aware, llm, jina, gemma.
This caused vector searches to fail with 'invalid value for --reranking'.

Fix: use 'position-aware' for scoped subdir searches. This is the
structural reranking that pairs with the vector retriever strategy.
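
A guard like the following would have caught the invalid value before the CLI was spawned. It is a sketch only: assertValidReranking is a hypothetical helper, and the set is a snapshot of the values listed in this message rather than something read from the sift binary.

```javascript
// Values copied from the commit message above (snapshot, not authoritative).
const VALID_RERANKING = new Set(["none", "position-aware", "llm", "jina", "gemma"]);

// Hypothetical guard: reject a reranking value the CLI would refuse.
function assertValidReranking(value) {
  if (!VALID_RERANKING.has(value)) {
    throw new Error(`invalid value for --reranking: ${value}`);
  }
  return value;
}
```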

Tests: 9/9 in sift-retriever-scope.test.mjs updated and passing.
Full suite: 178 files / 1845 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 11:06:33 +02:00
Mikael Hugo
5e478d6506 fix(auto): avoid duplicate swarm checkpoints 2026-05-15 11:01:08 +02:00
Mikael Hugo
7a4a62e244 fix(auto): cap checkpoint repairs before retries 2026-05-15 10:58:02 +02:00
Mikael Hugo
604ebbf824 feat(sift): structured stderr logging — last-search.log + RUST_LOG=info
Adds operator/agent visibility into sift's indexing + retrieval stages.
The 30-min cold full-repo vector indexing test went silent for the full
budget because SF's wrappers never enabled sift's tracing layer; CPU and
disk activity were the only externally visible signals.

resolveSiftLogging(projectRoot) (code-intelligence.js:897) returns
{ env: { RUST_LOG: level }, logPath } honoring SF_SIFT_LOG_LEVEL
(default "info"; "off"/"none"/"" disables). Default destination:
${projectRoot}/.sf/runtime/sift/last-search.log, truncated per call so
it always reflects the most recent invocation.

Wired into three spawn sites:
- ensureSiftIndexWarmup (code-intelligence.js): detached child's stderr
  fd opened with openSync(logPath, "a") and passed as stdio[2]
- runSift (tools/sift-search-tool.js): execFile env merges logEnv,
  stderr appended to logPath in the execFile callback
- codebase_search execute (subagent/index.js): proc.stderr.on("data")
  tees to logPath via fs.appendFileSync alongside the existing in-memory
  buffer for tool output
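
The env resolution rules above can be sketched as a small pure function. The injected env parameter and the disabled-case return value ({ env: {}, logPath: null }) are assumptions made for illustration; only the level handling and the default path come from this message.

```javascript
// Hypothetical sketch of the resolution rules: unset means "info";
// "off", "none", or an empty string disables logging.
function resolveSiftLogging(projectRoot, env = process.env) {
  const raw = env.SF_SIFT_LOG_LEVEL;
  const level = raw === undefined ? "info" : raw.trim().toLowerCase();
  if (level === "" || level === "off" || level === "none") {
    return { env: {}, logPath: null }; // ASSUMPTION: shape when disabled
  }
  return {
    env: { RUST_LOG: level },
    logPath: `${projectRoot}/.sf/runtime/sift/last-search.log`,
  };
}
```

A spawn site would then merge the returned env into the child's environment and truncate logPath before each invocation.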

When a sift result is empty or times out, the tool reply now includes
"(stage diagnostic: .sf/runtime/sift/last-search.log)" so the agent
sees immediately where to look.

Tests: 11 new in sift-logging.test.mjs — env resolution matrix, log-file
truncate/write contract, hint-string format on timeout/no-output/disabled.
Full suite 1857/1857, no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 10:56:32 +02:00
Mikael Hugo
091168303c fix(auto): abort swarm checkpoint loops 2026-05-15 10:55:37 +02:00
Mikael Hugo
22760e03d5 fix(sift): increase timeouts for vector retriever + scope-aware retriever for codebase_search
Vector retriever was disabled everywhere because it appeared to hang.
It was actually doing a first-time embedding index build for 57K files,
which takes ~60-90 min. Re-enable vector by increasing timeouts and
letting scope-aware retriever selection decide when vector is safe.

Changes:
- sift_search: retriever timeout 30s->300s, total 60s->600s
- codebase_search: total timeout 120s->600s
- warmup: retriever timeout 30s->300s, hard timeout 600s->3600s
- codebase_search now uses chooseSiftRetrievers() instead of hardcoded
  bm25+phrase: repo-root -> bm25+phrase (fast), scoped subdirs -> vector
- Comments updated to reflect "slow first build" not "hang"

Tests: 178 files / 1845 tests, all pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 10:46:35 +02:00
Mikael Hugo
427324fb93 fix(plan): update existing milestone specs without stale params 2026-05-15 10:45:18 +02:00
Mikael Hugo
6e40b829f2 feat(sift): scope-aware retriever selection — vector for scoped, bm25 for repo-root
Commit 1a98d8f9a hardcoded --retrievers bm25,phrase across all sift
calls to work around the full-repo vector inference hang. But vector
retrieval works fine on scoped subdirectory queries (empirically: ~30s
on src/resources/extensions/sf/uok with real semantic scoring). The
hang is the full-repo indexing scope, not the inference path.

This commit replaces the universal bm25 restriction with a
scope-aware selector chooseSiftRetrievers(scopePath, projectRoot):
- scopePath resolves to repo root → bm25+phrase, no rerank (safe)
- scopePath resolves to anything else → bm25+phrase+vector, rerank
  enabled (semantic ranking unlocked)

ensureSiftIndexWarmup behavior unchanged (scope is "." → repo-root →
bm25+phrase). buildSiftArgs in the codebase_search tool now defaults
to vector when the caller passes a scoped path; explicit retrievers
overrides still win.
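
The dispatch rule can be sketched as below. The return shape with a `retrievers` array is an assumption; the `reranking` value for scoped paths uses 'position-aware', the valid CLI value adopted by the follow-up fix 9b42404149, rather than the value this commit originally shipped.

```javascript
const path = require("node:path");

// Hypothetical sketch: repo-root scope gets the fast lexical retrievers,
// any narrower scope additionally gets the vector retriever.
function chooseSiftRetrievers(scopePath, projectRoot) {
  const root = path.resolve(projectRoot);
  const scope = path.resolve(root, scopePath ?? ".");
  if (scope === root) {
    return { retrievers: ["bm25", "phrase"], reranking: "none" };
  }
  return { retrievers: ["bm25", "phrase", "vector"], reranking: "position-aware" };
}
```

Resolving both paths first means ".", "./", and an absolute path to the repo root all land in the safe branch.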

Unlocks the high-leverage uses described earlier this session
(memory ranking, plan/research context pre-fetch) for free — those
always scope to a sub-tree.

Tests: 9 new in sift-retriever-scope.test.mjs cover the dispatch
matrix (repo-root variants get bm25, subdir variants get vector,
explicit override wins, regression guard for warmup default).
Full suite: 178 files / 1844 tests, no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 10:25:22 +02:00
Mikael Hugo
d90ac1fd69 fix(codebase_search): disable vector retriever to prevent hang
The vector retriever in sift hangs indefinitely during embedding model
inference, causing all codebase_search calls to time out. Apply the same
fix as sift_search: restrict retrievers to bm25+phrase and disable ML
reranking.

- buildCodebaseSearchArgs: add --retrievers bm25,phrase --reranking none
- Update tool description from (BM25 + Vector) to (BM25 + phrase)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 10:13:31 +02:00
Mikael Hugo
1a98d8f9af fix(sift): disable vector retriever + ML reranking to prevent hang
The sentence-transformers/all-MiniLM-L6-v2 embedding model inference hangs
indefinitely during sift search, causing:
- Warmup to never complete (TTL expired 62+ min ago)
- All page-index-hybrid searches to time out
- The search cache to become stale

Fix: Restrict warmup and search to bm25+phrase retrievers with no ML
reranking. This gives fast lexical results while avoiding the hanging
embedding inference path.

Also expose --retrievers and --reranking params in sift_search tool so
callers can override per-query if needed.

Closes #vector-hang-fix

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 09:45:49 +02:00
Mikael Hugo
ec65b4d881 feat(planning-state): DB-first VALIDATION.md migration (proposal MVP)
Implements Phase 1 of docs/dev/proposals/db-first-planning-state.md
(commit f3571475d). VALIDATION.md is now a render target; DB is
canonical.

Three read sites switched to DB:
- tools/complete-milestone.js: getMilestoneValidationAssessment(id)?.status
  replaces readFile + extractVerdict (lines 126-137 → 126-140)
- workspace-index.js: same swap in the indexWorkspace loop (was
  resolveMilestoneFile → loadFile → extractVerdict per milestone)
- state-shared.js:readMilestoneValidationVerdict was already DB-first
  (prefers DB, file fallback only when no DB) — no change needed

Write path regenerates:
- tools/validate-milestone.js:renderValidationMarkdown now prepends
  <!-- generated from .sf/sf.db — do not edit directly; use the
  validate_milestone tool --> so the file is unambiguously a projection
- verdict-parser.js:extractVerdict strips the comment header before
  frontmatter parsing so legacy readers (reflection.js, auto-prompts.js)
  still work on generated files
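
The header stripping can be illustrated with a single regular expression. This is a sketch, not the code in verdict-parser.js; stripGeneratedHeader is a hypothetical name.

```javascript
// Hypothetical sketch: drop one leading HTML comment (and the whitespace
// around it) so frontmatter parsing sees "---" as the first line.
function stripGeneratedHeader(markdown) {
  return markdown.replace(/^\s*<!--[\s\S]*?-->\s*/, "");
}
```

Files without the generated comment pass through unchanged, so hand-written legacy files still parse.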

Doctor check retired (clean delete):
- doctor-engine-checks.js: db_projection_validation_drift detector
  removed entirely. Drift is structurally impossible once the write
  path always regenerates from DB. Comment block explains the removal.

Tests:
- New: db-first-validation.test.mjs — 6 tests covering regeneration,
  three read-site overrides, hand-edit override, doctor non-emission
- Updated: doctor-db-projection-drift.test.mjs now asserts the check is
  NOT emitted (was previously asserting it WAS)

Full suite: 469 passed, 0 failed, 3 skipped. No regressions.

Closes the same class as the self-feedback DB/JSONL divergence pain —
the M001-6377a4-VALIDATION.md doctor warning that's been firing
repeatedly this session is gone by construction. Other planning
artifacts (CONTEXT.md, ROADMAP.md, SUMMARY.md) follow in later phases
per the proposal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:35:28 +02:00
Mikael Hugo
7dbf8ad430 feat(model-policy): wire lineage-diverse-from-worker into selector
Round 8's e7cf16882 declared the adversary role and the
lineage-diverse-from-worker constraint but left actual filtering as
a TODO in selectAndApplyModel. This wires the filter end-to-end.

selectAndApplyModel now accepts (role, workerModelId) trailing params:
- role: from modelRoleForUnitType(unitType) (extended to recognize
  "adversary"/"challenge"/"red-team" unit types as the adversary role)
- workerModelId: explicit caller-supplied override, else falls back to
  _lastWorkerModelId (process-local cache populated whenever a worker-
  role dispatch resolves a model)

When role is adversary or reviewer AND the role-policy includes
lineage-diverse-from-worker, applyLineageDiverseFilter strips
candidates that share root vendor with the worker model (via
isSameRootVendor from model-role-policy.js). If filtering would leave
zero candidates, a warning is logged and the unfiltered set is used
(better a same-vendor reviewer than no reviewer).
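
The filter-with-fallback behaviour can be sketched as follows. rootVendorOf is a hypothetical stand-in for isSameRootVendor that treats the segment before the first "/" as the vendor; the real vendor mapping in model-role-policy.js is not shown in this log.

```javascript
// Hypothetical vendor extraction: "vendor/model" -> "vendor".
function rootVendorOf(modelId) {
  return String(modelId).split("/")[0].toLowerCase();
}

// Sketch: drop candidates that share the worker's vendor, but fall back to
// the unfiltered list (with a warning) if nothing would remain.
function applyLineageDiverseFilter(candidates, workerModelId, warn = console.warn) {
  if (!workerModelId) return candidates;
  const workerVendor = rootVendorOf(workerModelId);
  const diverse = candidates.filter((id) => rootVendorOf(id) !== workerVendor);
  if (diverse.length === 0) {
    warn(`lineage-diverse filter left no candidates for worker ${workerModelId}; using unfiltered set`);
    return candidates;
  }
  return diverse;
}
```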

phases-unit.js threads modelRoleForUnitType(unitType) into
selectAndApplyModel — the only producer site that needed the role
parameter.

Tests: 13 new (7 pure unit on applyLineageDiverseFilter — vendor
mapping matrix + edge cases; 6 integration on selectAndApplyModel +
modelRoleForUnitType wiring). All 37 tests in the affected files pass,
no regressions.

Concern: if the per-unit model config (from disk prefs) maps exclusively
to the worker's vendor and has no fallback candidates, the selector
returns appliedModel: null. This is operator-configurable and documented
in tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:24:50 +02:00
Mikael Hugo
f3454de58a fix(triage): --run routes through runTriageApply{dryRun:true} via SF router
Closes sf-mp5khix3-9beona architecture-defect:triage-run-bypasses-sf-routing.

The legacy `runTriage` in self-feedback-drain.js hardcoded
DEFAULT_TRIAGE_MODEL="google-gemini-cli/gemini-3-pro-preview" and
dispatched via @singularity-forge/ai completeSimple (text-only, no
tools). The result: an autonomous triage path that produced a markdown
decision matrix operators had to manually apply via resolve_issue.

Now `--run` goes through runTriageApply with a new `dryRun: true`
option that:
- uses the same Phase 1/2 pipeline as --apply (triage-decider + review)
- pre-resolves the model via SF's router (rankTriageModelsViaRouter),
  no hardcoded model
- skips Phase 3 applyTriagePlan (read-only by design)
- uses permissionProfile="low" and relaxes the trusted-source +
  custom-runner guards for the inspection path
- prefixes flowId with "triage-run-" for clean trace separation

Legacy runTriage kept as @deprecated (still exercised by
self-feedback-drain.test.mjs unit tests that target completeSimple
dispatch directly).

Tests: 6 new in headless-triage-run-routing.test.ts covering dryRun
short-circuit, no ledger mutations, guard relaxation, router not
hardcoded, disagreement surfaces deciderOutput. Full triage suite:
35 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:20:43 +02:00
Mikael Hugo
a5dd5db354 fix(self-feedback): align report kinds and isolate watchdog tests 2026-05-15 09:19:27 +02:00
Mikael Hugo
ff31258629 chore: capture autonomous in-flight self-improvements
Snapshot uncommitted work autonomous made in this session:
- run-unit.js +54: enrich runUnitViaSwarm with completedItems /
  remainingItems / verificationEvidence pass-through from worker
  checkpoint args
- self-feedback.js +10
- 2 test files updated to match the new shape

All 72 affected tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:03:42 +02:00
Mikael Hugo
d57cd84d9a fix(auto): make halt watchdog observable 2026-05-15 08:09:02 +02:00
Mikael Hugo
f9c147a08b fix(swarm): ignore heartbeats for silent worker timeout 2026-05-15 08:00:35 +02:00
Mikael Hugo
e464a1bd6e fix(swarm): bound silent worker responses 2026-05-15 07:35:31 +02:00
Mikael Hugo
81425230f5 fix(headless): do not restart graceful child exits 2026-05-15 07:25:06 +02:00
Mikael Hugo
9ba9b55f7a fix(uok): import memory extractor from closeout 2026-05-15 07:12:10 +02:00
Mikael Hugo
c5850c8039 fix(verify): ignore stale broad cargo preferences 2026-05-15 07:06:17 +02:00
Mikael Hugo
d1ca3d035c fix(auto): count only unproductive runaway iterations 2026-05-15 06:55:05 +02:00