Commit graph

4565 commits

Author SHA1 Message Date
Mikael Hugo
725affd126 feat(self-feedback): purpose_anchor on entries (ADR-0000 restoration, v71)
SF is a purpose-to-software compiler — every self_feedback row must name
the milestone vision or slice goal it's filed against, so triage can
prioritize against purpose rather than treating each row as floating.

  - Schema v71 ALTERs self_feedback ADD COLUMN purpose_anchor TEXT.
    NULL allowed for legacy rows; fresh-DB CREATE includes the column.
  - sf-db-self-feedback.js: insertSelfFeedbackEntry accepts purposeAnchor
    (camelCase), stored as :purpose_anchor; listSelfFeedbackEntries({purpose})
    pushes a LIKE %fragment% filter into the DB layer so triage doesn't
    have to pull the full table.
  - rowToSelfFeedback exposes purposeAnchor, falling back to the JSON
    projection for legacy rows where the column is NULL.
  - headless-feedback CLI: `feedback add --purpose <fragment>` persists
    the anchor; `feedback list --purpose <fragment>` filters by it.
    Omission stays valid — restoration is additive, not breaking.
  - help-text + migration test updated; new vitest covers add/list
    round-trip, NULL-on-omit legacy compat, substring match, and the
    help-text documentation contract.
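
The list filter and legacy fallback described above can be sketched as follows. This is a minimal illustration, not the code in sf-db-self-feedback.js: buildListQuery and readPurposeAnchor are hypothetical names, the `json` column name is assumed, and only self_feedback, purpose_anchor and purposeAnchor come from the message.

```javascript
// Hypothetical helper: builds SQL text plus bound parameters for a list query.
function buildListQuery({ purpose } = {}) {
  const clauses = [];
  const params = {};
  if (typeof purpose === "string" && purpose.trim() !== "") {
    // Substring match: wrap the fragment in % wildcards and bind it as a
    // parameter instead of interpolating user input into the SQL text.
    clauses.push("purpose_anchor LIKE :purpose");
    params.purpose = `%${purpose.trim()}%`;
  }
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  return { sql: `SELECT * FROM self_feedback${where}`, params };
}

// Hypothetical row mapper: prefer the column, fall back to the JSON
// projection (assumed to live in a `json` column) for legacy rows.
function readPurposeAnchor(row) {
  if (row.purpose_anchor != null) return row.purpose_anchor;
  try {
    const projected = JSON.parse(row.json ?? "{}");
    return projected.purposeAnchor ?? null;
  } catch {
    return null; // unreadable projection: treat as "no anchor"
  }
}
```

Note that the sketch does not escape `%` or `_` inside the fragment, so those characters would act as LIKE wildcards.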

Restores the doctrine in docs/adr/0000-purpose-to-software-compiler.md:
"non-trivial artifacts must name their purpose and consumer."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:51:52 +02:00
Mikael Hugo
3c416162b2 feat(sf-db): record per-task purpose_trace at complete_task (ADR-0000)
Restore the purpose-to-software doctrine at the slice gate: every task
the executor closes must name the slice-goal sentence or clause it
served. complete-slice now refuses to flip a slice to complete while
any of its tasks has a NULL purpose_trace, making "did all tasks
actually serve the slice goal" a mechanical check instead of a vibe.

Schema migration v70 adds a nullable purpose_trace TEXT to tasks
(legacy rows stay valid). complete_task refuses without it and quotes
slice.goal in the error so the agent can anchor. insertTask /
updateTaskStatus accept the new field, rowToTask exposes it, and a
new updateTaskPurposeTrace helper covers later corrections.
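
A rough sketch of the slice gate, assuming tasks are plain objects with `id` and `purpose_trace` fields. The function names and the error wording are hypothetical, and treating an empty string like NULL is an extra assumption beyond what the message states.

```javascript
// Hypothetical gate: returns the tasks that block slice completion.
function findUntracedTasks(tasks) {
  return tasks.filter(
    (t) => t.purpose_trace == null || String(t.purpose_trace).trim() === ""
  );
}

// Hypothetical stand-in for complete-slice: refuses while any task is untraced.
function completeSliceSketch(slice, tasks) {
  const untraced = findUntracedTasks(tasks);
  if (untraced.length > 0) {
    const ids = untraced.map((t) => t.id).join(", ");
    // Quote the slice goal so the caller can anchor a trace against it.
    throw new Error(
      `Cannot complete slice ${slice.id}: tasks without purpose_trace: ${ids}. ` +
        `Slice goal: "${slice.goal}"`
    );
  }
  return { ...slice, status: "complete" };
}
```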

Restoration of doctrine — see docs/adr/0000-purpose-to-software-compiler.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:50:27 +02:00
Mikael Hugo
fa657f2523 feat(plan-milestone): per-slice vision trace (schema v69, ADR-0000 P2)
Restoration of doctrine: plan-milestone now emits a literal milestone.vision
clause per slice (traces_vision_fragment) so validate-milestone has structured
grounds for assessment instead of re-reading the vision through the LLM every
time. Schema v69 adds the column (NULL allowed for legacy rows); the prompt and
plan_milestone tool start requiring it for new slices, rejecting fragments that
do not appear verbatim in milestone.vision. See docs/adr/0000-purpose-to-software-compiler.md.
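
The verbatim rule can be illustrated with a small check. This is a sketch under assumptions: validateVisionFragments is a hypothetical name and slices are plain objects; only traces_vision_fragment and milestone.vision are taken from the message.

```javascript
// Hypothetical validator: returns one error string per offending slice.
function validateVisionFragments(vision, slices) {
  const errors = [];
  for (const slice of slices) {
    const fragment = slice.traces_vision_fragment;
    if (typeof fragment !== "string" || fragment.length === 0) {
      errors.push(`${slice.id}: missing traces_vision_fragment`);
    } else if (!vision.includes(fragment)) {
      // Exact, case-sensitive substring match: paraphrases are rejected.
      errors.push(`${slice.id}: fragment does not appear verbatim in milestone.vision`);
    }
  }
  return errors;
}
```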

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:48:50 +02:00
Mikael Hugo
3b83f0898b merge(P4): purpose-coherence-gate before every dispatch (ADR-0000) 2026-05-15 18:45:17 +02:00
Mikael Hugo
5e2c7a7166 merge(P1): vision quality gate on sf new-milestone (ADR-0000) 2026-05-15 18:45:08 +02:00
Mikael Hugo
59fbaf4b0f test(doctor): refactor purpose-gate test to use insertMilestone/insertSlice helpers
Cleans up doctor-purpose-gate.test.mjs:
- Uses insertMilestone/insertSlice helpers instead of raw SQL
- Removes redundant test from doctor-plan-dir-normalization.test.mjs
- Adds module-level JSDoc purpose comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:44:09 +02:00
Mikael Hugo
a303b5db29 feat(uok): add purpose-coherence-gate pre-dispatch gate (ADR-0000)
Restores the eight-PDD purpose gate at the autonomous-loop boundary
required by ADR-0000 (SF is a purpose-to-software compiler). The gate
walks milestone vision -> slice.traces_vision_fragment ->
task.purpose_trace before every dispatch and refuses to proceed when
the purpose chain is broken at the vision root (degraded-vision).

- New uok/purpose-coherence.js with a pure verdict function and a
  DB-backed adapter. Reads vision/trace columns directly via SQL so
  pre-P2/P3 schema migrations are tolerated.
- Wired into auto/phases-pre-dispatch.js alongside resource-version-
  guard, pre-dispatch-health-gate, and planning-flow-gate. Fires on
  every pre-dispatch turn and emits to the existing trace JSONL.
- Outcome ladder: fail (vision missing -> pause loop), warn (trace
  columns missing or NULL -> surface but allow dispatch so legacy DBs
  don't hard-break on day one), pass (full chain).
- Tests in tests/uok-purpose-coherence.test.mjs cover the four
  contracted states plus the column-missing downgrade path on a
  pre-migration schema.
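
The outcome ladder might look like the pure function below. It is a simplified sketch, not the verdict function in uok/purpose-coherence.js: the input shape, the `traceColumnsPresent` flag, and every reason string except "degraded-vision" are assumptions.

```javascript
// Hypothetical pure verdict: no DB access, inputs are already-loaded values.
function purposeVerdict({ vision, slices = [], completedTasks = [], traceColumnsPresent = true }) {
  // Broken at the root: no vision means the loop should pause.
  if (typeof vision !== "string" || vision.trim() === "") {
    return { outcome: "fail", reason: "degraded-vision" };
  }
  // Pre-migration schema: surface the gap but allow dispatch.
  if (!traceColumnsPresent) {
    return { outcome: "warn", reason: "trace-columns-missing" };
  }
  const sliceGap = slices.some((s) => s.traces_vision_fragment == null);
  const taskGap = completedTasks.some((t) => t.purpose_trace == null);
  if (sliceGap || taskGap) {
    return { outcome: "warn", reason: "trace-null" };
  }
  return { outcome: "pass", reason: "full-chain" };
}
```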

Refs: ADR-0000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:42:55 +02:00
Mikael Hugo
fb68b12902 feat(doctor): enforce ADR-0000 purpose gate — milestones need vision, slices need goal
Adds two new doctor checks to checkEngineHealth():

- db_milestone_missing_vision: error when a milestone has no vision
  (the WHY/purpose field per ADR-0000)
- db_slice_missing_goal: error when a slice has no goal
  (the WHAT/purpose field per ADR-0000)

Both checks are non-fixable (the operator must define purpose).
This aligns with ADR-0000 §Enforcement: "Non-trivial milestones,
slices, tasks, ADRs, specs, tests, and exported symbols must name
their purpose and consumer."

Tests: 2 cases — milestone without vision flagged, slice without
goal flagged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:42:14 +02:00
Mikael Hugo
aa0d57371e feat(headless): enforce ADR-0000 PDD-fields gate at new-milestone
Restoration of forgotten doctrine: ADR-0000 declares the eight PDD
fields (Purpose, Consumer, Contract, Failure boundary, Evidence,
Non-goals, Invariants, Assumptions) to be the purpose gate, but
`sf headless new-milestone --context <file>` was accepting any
context including empty or trivially-thin seed docs. This wires a
pre-create check that refuses the run when fields are missing or
too thin, naming exactly which ones so the operator can fix the
seed doc and retry.

- new src/resources/extensions/sf/headless-pdd-check.js: scans
  context for the eight fields (heading and inline-label forms) and
  reports missing/sparse, plus a minimum-spine check (Purpose +
  Consumer + Contract + Evidence-or-Falsifier).
- src/headless.ts calls the check after loadContext, before
  bootstrapping .sf/. Refusal exits 1 with formatPddRefusal text.
- --skip-pdd-check is the migration escape hatch (warning printed,
  PDD gate bypassed) for milestones that pre-date the gate.
- SF-internal auto-bootstrap (autonomous→new-milestone fallback)
  is exempted because the seed is SF-generated, not operator-PDD.
- vitest test covers missing-Purpose, missing-Consumer, all-8,
  sparse, inline-label form, Falsifier-as-Evidence spine, and the
  doctrine field order.
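
A minimal sketch of such a scan, assuming markdown headings (`## Purpose`) and inline labels (`Purpose: ...` or `**Purpose:** ...`). The function name, the 20-character sparseness threshold, and the return shape are hypothetical; the real headless-pdd-check.js may differ on all three.

```javascript
const PDD_FIELDS = [
  "Purpose", "Consumer", "Contract", "Failure boundary",
  "Evidence", "Non-goals", "Invariants", "Assumptions",
];
const PDD_LABELS = [...PDD_FIELDS, "Falsifier"];

function checkPddFields(context, minChars = 20) {
  const alt = PDD_LABELS.join("|");
  const heading = new RegExp(`^#{1,6}\\s*(${alt})\\s*:?$`, "i");
  const inline = new RegExp(`^\\**(${alt})\\**\\s*:\\**\\s*(.*)$`, "i");
  const canonical = (s) => PDD_LABELS.find((l) => l.toLowerCase() === s.toLowerCase());
  const found = new Map(); // label -> accumulated body text
  let current = null;
  for (const raw of String(context ?? "").split(/\r?\n/)) {
    const line = raw.trim();
    let m;
    if ((m = heading.exec(line))) {
      current = canonical(m[1]);
      found.set(current, found.get(current) ?? "");
    } else if ((m = inline.exec(line))) {
      current = canonical(m[1]);
      found.set(current, `${found.get(current) ?? ""} ${m[2]}`.trim());
    } else if (/^#{1,6}\s/.test(line)) {
      current = null; // an unrelated heading closes the current field
    } else if (current && line) {
      found.set(current, `${found.get(current)} ${line}`.trim());
    }
  }
  const missing = PDD_FIELDS.filter((f) => !found.has(f));
  const sparse = PDD_FIELDS.filter((f) => found.has(f) && found.get(f).length < minChars);
  // Minimum spine: Purpose + Consumer + Contract + (Evidence or Falsifier).
  const spineOk =
    ["Purpose", "Consumer", "Contract"].every((f) => found.has(f)) &&
    (found.has("Evidence") || found.has("Falsifier"));
  return { ok: missing.length === 0 && sparse.length === 0, missing, sparse, spineOk };
}
```

Returning the missing and sparse lists separately lets a refusal message name exactly which fields the operator needs to fix.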

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:37:06 +02:00
Mikael Hugo
af1401e4ea fix(solver): enforce PDD purpose gate 2026-05-15 18:32:59 +02:00
Mikael Hugo
bb0c87fdac feat(remediation-dispatcher): M003 S04 — autonomous recovery from validation findings
Implements RemediationDispatcher that classifies verification failures
and maps them to recovery strategies:

- transient    → retry (timeout, flaky test, network)
- structural   → replan (broken import, syntax error)
- knowledge    → research (not implemented, missing context)
- infra        → escalate via self-feedback (tooling broken)

Confidence scoring:
- Single failing check + known pattern = high confidence
- Multiple failures or high retry count = lower confidence
- Configurable autoFixThreshold (default 0.6)

15 unit tests covering all 4 failure classes + confidence scoring +
threshold behavior.
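
The classification and threshold behaviour can be sketched as below. The patterns and the numeric confidence formula are invented for illustration; the message only states the four classes, the direction of the confidence adjustments, and the 0.6 default for autoFixThreshold.

```javascript
// Hypothetical rule table: class, recovery strategy, and a matching pattern.
const REMEDIATION_RULES = [
  { cls: "transient", strategy: "retry", pattern: /timeout|timed out|flaky|network/i },
  { cls: "structural", strategy: "replan", pattern: /cannot find module|syntax ?error/i },
  { cls: "knowledge", strategy: "research", pattern: /not implemented|missing context/i },
  { cls: "infra", strategy: "escalate", pattern: /command not found|tooling/i },
];

function classifyFailure({ failures, retryCount = 0, autoFixThreshold = 0.6 }) {
  const rule = REMEDIATION_RULES.find((r) => r.pattern.test(failures[0] ?? ""));
  // Assumed scoring: start high for a known pattern, then subtract for
  // each additional failing check and for each retry already spent.
  let confidence = rule ? 0.9 : 0.3;
  confidence -= 0.15 * Math.max(0, failures.length - 1);
  confidence -= 0.1 * retryCount;
  confidence = Math.max(0, Math.min(1, confidence));
  return {
    cls: rule ? rule.cls : "unknown",
    strategy: rule ? rule.strategy : "escalate",
    confidence,
    autoFix: Boolean(rule) && confidence >= autoFixThreshold,
  };
}
```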

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:29:45 +02:00
Mikael Hugo
a863672463 fix(state): honor completed owning requirements 2026-05-15 18:24:16 +02:00
Mikael Hugo
b08cb13c20 fix(state): requirements-complete short-circuits the planning ladder
Symptom: dr-repo M003 had all 8 owning requirements (UNI-01..05,
PIL-01..03) marked Status: complete in .sf/REQUIREMENTS.md, but
the milestone row was still active because its only slice was a
post-migration skipped placeholder. After the previous fix routed
all-skipped milestones to pre-planning, SF ran roadmap-meeting +
plan-milestone and wrote 3 new slices on a milestone whose
contract-level work was already done — burned ~4 LLM turns on
plausibly-adjacent but unwanted re-decomposition.

Root cause: deriveStateFromDb's milestone-completion gate consults
only slice statuses (and indirectly the milestone row's own status
field). It never reads REQUIREMENTS.md to check whether the
contract is already satisfied. The slice-based view collapsed the
real signal.

Fix:

- New parseRequirementsByMilestone(content) helper in files.js:
  parses REQUIREMENTS.md, groups entries by their `Primary owning
  milestone` field, returns Map<id, {complete, incomplete}>.

- handleAllSlicesDone now reads REQUIREMENTS.md before its
  slice-based real-work check. If a milestone has at least one
  owning requirement and zero of them are incomplete, route to
  completing-milestone with nextAction naming the requirement count
  (so the operator can see *why* the milestone is being closed
  without manually opening REQUIREMENTS.md).

- Best-effort: REQUIREMENTS.md parse failure falls through to the
  existing slice-based rule. Missing file likewise — no regression
  for projects that don't keep a requirements file.
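
The short-circuit decision, separated from parsing, might look like this. It assumes the map values hold arrays of requirement ids, which the message does not specify; routeOnRequirements and the nextAction wording are hypothetical. A `null` return stands for "fall through to the slice-based rule".

```javascript
// Hypothetical decision helper layered on a Map<id, {complete, incomplete}>.
function routeOnRequirements(milestoneId, requirementsByMilestone) {
  // A parse failure or missing file is modelled as a non-Map input.
  if (!(requirementsByMilestone instanceof Map)) return null;
  const entry = requirementsByMilestone.get(milestoneId);
  if (!entry) return null; // zero owning requirements
  const complete = entry.complete ?? [];
  const incomplete = entry.incomplete ?? [];
  if (complete.length === 0 || incomplete.length > 0) return null;
  return {
    phase: "completing-milestone",
    nextAction: `Complete ${milestoneId}: all ${complete.length} owning requirements are complete`,
  };
}
```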

Resolves sf-mp74hftw-zud6ba filed via the headless feedback CLI.
End-to-end verified by re-running sf headless query on dr-repo
M003: now reports phase=completing-milestone with the right
requirement-count message.

Tests: 5 new cases — all complete + slice skipped → completing,
some active → pre-planning, zero owning requirements falls through,
missing file falls through, all complete + real slice work still
completes. Existing 4 all-skipped-replan cases still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:22:28 +02:00
Mikael Hugo
6a2c61d5ee feat(halt-self-feedback): M003 S03 — HaltWatchdog self-feedback integration
T01: Added integration test auto-halt-self-feedback.test.mjs that proves:
  - HaltWatchdog.check() creates a self-feedback DB entry with
    kind=runaway-loop:idle-halt, severity=high, blocking=true
  - Markdown projection (.sf/SELF-FEEDBACK.md) is regenerated
  - Deduplication works (one entry per idle period)
  - New heartbeat resets and creates a new entry for the next idle period

T02: Enhanced evidence string to include elapsedMs, iteration, and
thresholdMs explicitly (R003 actionable context requirement).

Tests: 36/36 pass across auto-halt-self-feedback,
auto-halt-watchdog-notify, and self-feedback-db suites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:19:39 +02:00
Mikael Hugo
5cd5e14160 feat(headless): surface memory auth pause 2026-05-15 18:16:08 +02:00
Mikael Hugo
90b8e7edf8 feat(headless): expose memory extraction diagnostics 2026-05-15 18:13:35 +02:00
Mikael Hugo
f00762ffdb fix(feedback): allow restamping suspect resolutions 2026-05-15 18:10:14 +02:00
Mikael Hugo
820f9aaf8e fix(memory): classify extraction failures 2026-05-15 18:06:33 +02:00
Mikael Hugo
7dfb5099c9 fix(state): all-skipped milestone routes to pre-planning, not validating
handleAllSlicesDone treated isStatusDone uniformly — "complete",
"done", AND "skipped" all counted as "milestone work is finished",
so a milestone whose only slice was skipped would advance to
phase=validating-milestone. That's wrong: a placeholder slice that
was skipped doesn't validate the milestone's success criteria, it
just clears the wedge.

Surfaced concretely in dr-repo M003 (Unified Dashboard + Pilot
Validation): I skipped the migration placeholder via the new
`sf headless skip-slice` CLI, and the next-dispatch reported
`validate-milestone M003` even though no real work had happened on
the milestone. The autonomous loop would then burn an LLM turn
running validate-milestone just to discover the obvious gap.

Fix: differentiate {complete, done} from {skipped} at the gate.
When zero slices carry real-work outcomes, route into the
pre-planning phase so the dispatcher's existing
discuss → research → plan ladder takes over. The PDD/vision is
already in the milestone row, so the planner has the purpose it
needs without operator hand-holding.
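
As a sketch, the gate reduces to one predicate over slice statuses. The function name is hypothetical and the input is assumed to contain only slices that already passed the "all done" check (complete, done, or skipped).

```javascript
// Statuses that represent real work, as opposed to a skipped placeholder.
const REAL_WORK_STATUSES = new Set(["complete", "done"]);

// Hypothetical router: picks the next phase once every slice is finished.
function routeWhenAllSlicesDone(slices) {
  const hasRealWork = slices.some((s) => REAL_WORK_STATUSES.has(s.status));
  return hasRealWork ? "validating-milestone" : "pre-planning";
}
```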

Verified end-to-end against dr-repo: `sf headless query` for M003
now reports phase=pre-planning and next dispatch
`roadmap-meeting M003` (the deep-planning entry rule fires first;
discuss/research/plan come after as artifacts land).

Tests: 4 cases — all-skipped → pre-planning, complete+skipped mix
→ validating, legacy "done" alias → validating, multiple skipped
→ pre-planning.

Resolves sf-mp73sk0m-63w88y (filed via headless feedback CLI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:00:21 +02:00
Mikael Hugo
881fd5e304 feat(memory,state): runtime counters for memory injection + milestone work validation
Memory injection telemetry:
- Move counter writes from auto-prompts.js to memory-store.js (where
  getRelevantMemoriesRanked/getActiveMemoriesRanked actually fire).
- Track memory_inject_count and memory_inject_chars_total via
  runtime_counters table for headless-query reporting.

State-db validation:
- handleAllSlicesDone now checks if any slice carries real work
  (status=complete/done) before routing to validation.
- Milestones with all-skipped slices route to "reassess-roadmap"
  instead of asking the operator to validate non-existent work.

SM client defense:
- Filter foreign-tenant memories from SM query responses even when
  the server returns them (defense-in-depth).

Tests updated: memory-extraction-lifecycle, sf-db-migration,
headless-query-memory-injection, sm-client, memory-tenant-gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 17:57:45 +02:00
Mikael Hugo
9abbfaada2 feat(memory-tenant-gate): add project-scoped isolation for SM cross-project recall
Closes sf-mp723nju-2cpeoc. When SM_ENABLED is on, memory retrieval from
Singularity Memory is now scoped to the current project's repoIdentity
tenant. Foreign-tenant memories are filtered client-side and the tenant
filter is sent server-side for SM servers that support it.

Key changes:
- schema v68: ADD COLUMN tenant TEXT on memories table (NULL = legacy)
- insertMemoryRow: persists tenant field on every new record
- backfillMemoryTenants / backfillMemoryTenantRows: idempotent migration
  called on session_start when SM_ENABLED is set
- querySmMemories: resolves effectiveTenantId (opts.tenant > opts.tenantId
  > SM_TENANT_ID); returns [] when no tenant resolved and crossTenant off
- SM_CROSS_TENANT_ENABLED=1 opt-in bypass with audit warning in console
- register-hooks session_start: calls backfillMemoryTenants when SM active
- 12 new tests in memory-tenant-gate.test.mjs; updated sm-client.test.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 17:55:26 +02:00
Mikael Hugo
ff333ae067 feat(memory): surface injection token cost in headless query
The Project Memories section is rendered into every execute-task,
plan-slice, and research-slice prompt. At 10 memories × ~200 chars
each that's ~2K chars/turn injected into the context — real cost,
no operator-visible meter.

Adds two runtime_counters (already-existing key/value store):

  memory_inject_chars_total  — cumulative section size
  memory_inject_count        — number of injections

Written by buildProjectMemoriesSection() on every render. Both
writes sit inside a try/catch so a legacy DB without
runtime_counters silently skips rather than blocking prompt build.

`sf headless query` surfaces the cumulative + derived metrics as a
new top-level `memoryInjection` block:

  {
    total_chars: 12480,
    count: 8,
    avg_chars: 1560,
    estimated_total_tokens: 3120
  }

The block is omitted entirely when count is 0 (fresh project / no
prompts rendered yet) so it doesn't clutter the snapshot.

Operators can now correlate prompt size growth against autonomous
run cost without instrumenting the LLM call sites directly. The
estimated_total_tokens is chars/4 — a rough approximation since SF
doesn't tokenise the section, intentionally documented as such.
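
The derived metrics follow directly from the two counters. A minimal sketch, assuming the counters arrive as a plain object keyed by counter name and that averages are rounded to integers (the rounding rule is an assumption; buildMemoryInjectionBlock is a hypothetical name):

```javascript
// Hypothetical builder for the memoryInjection snapshot block.
function buildMemoryInjectionBlock(counters) {
  const total = Number(counters.memory_inject_chars_total ?? 0);
  const count = Number(counters.memory_inject_count ?? 0);
  if (count === 0) return undefined; // omit the block on a fresh project
  return {
    total_chars: total,
    count,
    avg_chars: Math.round(total / count),
    // Rough approximation: about four characters per token.
    estimated_total_tokens: Math.round(total / 4),
  };
}
```

With the sample values from the message, 12480 chars over 8 injections gives avg_chars 1560 and estimated_total_tokens 3120.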

Resolves sf-mp723yl9-rcxoeh filed via the headless feedback CLI.

Tests: 5 source-level invariants — type carries the section, query
reads counters by name, snapshot omits section on zero, write side
calls both counter functions, write is wrapped in try/catch with
documented failure-mode comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:55:14 +02:00
Mikael Hugo
671b2c8628 feat(sm-client): defense-in-depth tenant filter on SM responses
Even though querySmMemories pins tenantId in the request body sent
to the Singularity Memory server, SF used to accept whatever came
back without verifying. A misconfigured or compromised SM server
could echo memories from other tenants and SF would inject them
into the next execute-task prompt — cross-customer leak.

filterSmMemoriesToTenant() now re-checks every returned memory:

  - same-tenant memories pass through
  - foreign-tenant memories (memory.tenantId OR memory.tenant !=
    expectedTenantId) are dropped, with a one-line warning so the
    misconfigured-SM symptom is visible rather than silent
  - memories with no tenant claim at all default to allow — matches
    the local DB's "NULL tenant = legacy row" rule from schema v68
  - SM_REQUIRE_TENANT_CLAIM=true flips the legacy rule to drop
    (hard fail-closed mode for operators who want it)

Defensive guards against non-array inputs, missing expectedTenantId
(returns input unchanged so caller-side fail-open semantics are
preserved), and the dual tenantId/tenant field naming.
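
The filtering rules above can be sketched as follows. This is an illustration rather than the shipped function: the name carries a Sketch suffix to mark that, strict mode is passed as an option instead of being read from SM_REQUIRE_TENANT_CLAIM, the warning text is invented, and returning an empty array for non-array input is an assumption.

```javascript
// Illustrative re-check of memories returned by the SM server.
function filterSmMemoriesToTenantSketch(memories, expectedTenantId, opts = {}) {
  if (!Array.isArray(memories)) return []; // assumed defensive default
  if (!expectedTenantId) return memories; // caller-side fail-open preserved
  const warn = opts.warn ?? console.warn;
  let foreign = 0;
  const kept = memories.filter((m) => {
    const claim = m?.tenantId ?? m?.tenant; // dual field naming
    if (claim == null) return opts.requireTenantClaim !== true; // legacy rule
    if (claim === expectedTenantId) return true;
    foreign += 1;
    return false;
  });
  if (foreign > 0) {
    warn(`SM returned ${foreign} foreign-tenant memories; dropped`);
  }
  return kept;
}
```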

Tests: 8 cases — same-tenant pass-through, foreign drop, legacy
allow, strict mode drop, tenantId/tenant alias, empty/non-array
defensiveness, missing-expected pass-through, warning emission.

Resolves the cross-project tenant-leak feedback row filed via the
new headless feedback CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:49:24 +02:00
Mikael Hugo
7c3d9bd3bf fix(memory): gate SM recall by tenant scope 2026-05-15 17:46:51 +02:00
Mikael Hugo
0c7aaafa00 feat(memory): enrich execute-task memory retrieval query
Previously buildProjectMemoriesSection(`${sTitle} ${tTitle}`) sent
two short strings to the cosine ranker — too sparse for re-ranking
to do meaningful work against the static pool.

buildMemoryRetrievalQuery() (new, exported for tests) enriches the
query with:

  - slice.title + task.title       (original signal)
  - slice.goal text, front 600 chars
                                   (the WHY of the slice — usually
                                   names the memory-relevant
                                   context the title can't fit)
  - top 20 changed files from
    git diff/status                (the WHAT — what code is in
                                   play right now; lets cosine
                                   ranking promote memories whose
                                   content references those paths)

Fail-open at each source: DB closed → no goal; not a git repo →
no files; nullish title args don't poison the string. The call
site never has to handle errors.

Bounded so embedding token cost stays predictable: 600-char goal
cap, 20-file cap. Empty inputs collapse to "" so the consumer's
`if (!query.trim())` branch still picks the static fallback.
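
The assembly and bounding logic can be sketched without the DB or git lookups. Here the goal text and changed files are passed in directly, which is an assumption about shape; the Sketch suffix marks this as a stand-in for the exported function, not its actual signature.

```javascript
// Illustrative query builder: titles, capped goal text, capped file list.
function buildMemoryRetrievalQuerySketch({ sliceTitle, taskTitle, sliceGoal, changedFiles } = {}) {
  const goal = typeof sliceGoal === "string" ? sliceGoal.slice(0, 600) : null;
  const files = Array.isArray(changedFiles) ? changedFiles.slice(0, 20) : [];
  return [sliceTitle, taskTitle, goal, ...files]
    // Drop nullish or blank parts so they never appear as "undefined".
    .filter((part) => typeof part === "string" && part.trim() !== "")
    .map((part) => part.trim())
    .join(" ");
}
```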

Tests: 5 cases — titles always present, non-git directory safe,
empty-input collapse, nullish-arg defensiveness, real git repo
surfaces changed file paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:36:15 +02:00
Mikael Hugo
362af3d6a4 fix(headless): bypass rpc for status
2026-05-15 17:32:21 +02:00
Mikael Hugo
cf32e79578 feat(memory-embeddings): read SF_LLM_GATEWAY_KEY from env as auth.json fallback
Enables CI and containerised deployments without writing secrets to disk.
Auth.json still takes precedence when present.

- readGatewayFromAuthJson now falls back to SF_LLM_GATEWAY_KEY env var
- SF_LLM_GATEWAY_URL env var also supported for endpoint override
- Added tests for env fallback, auth.json preference, and default URL

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 17:13:40 +02:00
Mikael Hugo
6214f7c86d feat(memory): add extraction diagnostics 2026-05-15 16:53:01 +02:00
Mikael Hugo
fdc4650016 feat(self-feedback-drain): filter free opencode models from triage routing
Self-feedback triage routing was including paid opencode models even
when the operator policy prefers the free tier. Add
isOpenCodeProvider() + isFreeOpenCodeModelId() and filter the
candidate list before the router scores them.

Also: cosmetic — quote style normalised by the formatter on
buildInlineFixPrompt strings and spawn options object.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:24 +02:00
Mikael Hugo
3a14fe86a7 test(list-models): isolate from developer's discovery-cache
Tests were picking up the developer's real
~/.sf/agent/discovery-cache.json and seeing unexpected models in
output. Pin tests to a guaranteed-missing path via the new
_discoveryCacheFilePath option so the env they observe is solely
what the test constructs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:11 +02:00
Mikael Hugo
d8f56e6704 feat(cli): add sf key subcommand for auth.json management
Surgical read/write access to ~/.sf/agent/auth.json without touching
the file directly. All mutations go through AuthStorage so file-lock
and chmod-600 invariants are always respected.

  sf key set    <provider> <api-key>   add/rotate stored key
  sf key get    <provider>             show masked key (last 4 chars)
  sf key remove <provider> [--yes]     remove credential
  sf key list                          list all providers + status

Rationale: SF's source of truth for credentials is auth.json at
runtime — env vars are only used during initial one-time provider
setup. Rotation needs an explicit, audit-friendly path, not implicit
env-driven re-reads. Keys are never echoed in full (last 4 chars
only); remove always prompts unless --yes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:04 +02:00
Mikael Hugo
351bfad41d fix(memory): extractTranscriptFromActivity now reads custom_message entries
Activity JSONL logs use `type: "custom_message"` with `customType: "sf-auto"`
for assistant reasoning content. The old code only checked `role === "assistant"`,
so every transcript was empty → extraction silently skipped every unit.

Fix: recognise both legacy (`role === "assistant"`) and modern
(`custom_message` with `sf-*` prefix) entry shapes. Also reads the
standalone `text` field used by custom messages.

This is why memory_processed_units had 0 rows despite 34 activity logs.
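
The two entry shapes can be illustrated with a small parser. This is a sketch, not extractTranscriptFromActivity itself: the function names are hypothetical and reading legacy text from a `content` field is an assumption; `role`, `type`, `customType`, and `text` come from the message.

```javascript
// Accept both the legacy and the modern assistant-entry shapes.
function isAssistantEntry(entry) {
  if (!entry || typeof entry !== "object") return false;
  if (entry.role === "assistant") return true; // legacy shape
  return (
    entry.type === "custom_message" &&
    typeof entry.customType === "string" &&
    entry.customType.startsWith("sf-") // modern shape, e.g. "sf-auto"
  );
}

// Hypothetical extractor over a JSONL string; malformed lines are skipped.
function extractTranscriptSketch(jsonl) {
  const chunks = [];
  for (const line of String(jsonl).split(/\r?\n/)) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue;
    }
    if (!isAssistantEntry(entry)) continue;
    const text =
      typeof entry.text === "string" ? entry.text
      : typeof entry.content === "string" ? entry.content
      : "";
    if (text) chunks.push(text);
  }
  return chunks.join("\n");
}
```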

Tests: 186 files / 1994 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:13:26 +02:00
Mikael Hugo
7ba469cff1 feat(memory): add debug logging to memory extraction pipeline
The memory extraction system has infrastructure (DB tables, LLM prompts,
unit closeout wiring, embedding backfill) but zero processed units and
only self-feedback-resolution memories. This suggests extraction is
failing silently.

Add debugLog() calls throughout extractMemoriesFromUnit() so we can
observe:
- Skip reasons (mutex busy, rate limited, already processed, file too small)
- Start/done lifecycle per unit
- LLM call and parse outcomes
- Error messages on failure and retry

This makes the extraction pipeline observable via --debug or the
journal/debug log without changing behavior.

Tests: 185 files / 1993 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:09:36 +02:00
Mikael Hugo
ba4b2d46d9 sf snapshot: uncommitted changes after 43m inactivity 2026-05-15 15:53:19 +02:00
Mikael Hugo
0b19afebf6 test(providers): expand discovery test matrix to 46 cases
Adds full coverage for the discovery-gating root cause that was
fixed in commits d70d8d3b1 (xiaomi x-api-key auth) and the
subsequent refreshSfManagedProviders + writeSdkDiscoveryCacheEntry
work in model-catalog-cache.js.

Diagnosis recap: kimi-coding, opencode, opencode-go were silent
in ~/.sf/agent/discovery-cache.json because the SDK's
model-discovery.js adapter registry marked them with
StaticDiscoveryAdapter (supportsDiscovery=false), so the SDK's
discoverModels() never attempted them. SF's own
scheduleModelCatalogRefresh DID fetch them but wrote only to the
per-repo runtime cache (basePath/.sf/model-catalog/) and only fired
on session_start — not during --discover. The fix is to mirror the
write to the SDK's discovery cache on both fetch-path AND cache-hit
path, and await it in cli.ts before listModels when --discover is set.

New test sections:
- parseDiscoveredModels: OpenAI {data}/{models} formats, Google
  {models[].name} prefix stripping, name-as-id fallback, null on
  bad input, OpenRouter pricing extraction
- refreshSfManagedProviders: xiaomi uses x-api-key (not Bearer),
  opencode uses Bearer, no-key providers skipped, SDK discovery cache
  written on BOTH network-fetch and cache-hit paths, kimi-coding +
  opencode-go iterated when keys present

46 tests pass. No regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:09:38 +02:00
Mikael Hugo
67c088410c chore(discovery): silence debug stderr from refresh path
Trailing instrumentation from the discovery investigation. The error
catch still swallows non-fatal failures during --discover, just no
longer prints to stderr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:03:56 +02:00
Mikael Hugo
fe28a48d81 fix(sift): revert to bm25,phrase for repo-root — hang was corrupted cache
The earlier commit (44fcfb643) incorrectly disabled phrase on repo-root
because I thought the phrase retriever hung on full-workspace scope. After
clearing the corrupted cache (left by killing a mid-build vector process),
testing confirms:

- bm25 alone on repo root: works, 1m 50s cold, instant warm
- phrase alone on repo root: works after cache clear
- bm25+phrase on repo root: works after cache clear
- vector on scoped paths: works after cache build

The "hang" was from a corrupted/stale cache, not a sift bug.
.siftignore is properly excluding files (146K→2,660 indexed).

Revert chooseSiftRetrievers back to bm25,phrase for repo-root.

Tests: 184 files / 1974 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 14:59:45 +02:00
Mikael Hugo
b88b66c651 feat(auto): fan out swarm research units 2026-05-15 14:54:27 +02:00
Mikael Hugo
c8854ca896 feat(discovery): cache stores pricing — unblocks zero-cost-but-not-:free models
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Until this change, the discovery cache stored only model IDs (string[]).
The downstream isZeroCost(model?.cost) check therefore evaluated against
undefined for any dynamically-discovered model, so OpenRouter's
zero-cost-but-not-:free entries (owl-alpha, lyria-3-pro-preview,
lyria-3-clip-preview, openrouter/free) were silently blocked by the
built-in provider policy.

Cache entry shape now: {id, cost?, contextWindow?} per model.
parseDiscoveredModels extracts pricing from OpenRouter's
/api/v1/models response (pricing.prompt/completion/input_cache_read/
input_cache_write → numeric cost.{input,output,cacheRead,cacheWrite}).
Other providers stay {id}-only — their /v1/models endpoints don't
ship pricing.
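
The field mapping above could look roughly like this. It is a sketch, not the
actual parseDiscoveredModels: `toCacheEntry` and `toNumber` are hypothetical
names, `context_length` as the source of contextWindow is an assumption, and
so is the choice to omit `cost` entirely when prompt or completion pricing is
missing (rather than defaulting to 0 and making a model look free).

```javascript
// Hypothetical helper: parse a price that may arrive as a string or number.
function toNumber(value) {
  if (value == null || value === "") return NaN;
  return Number(value);
}

// Hypothetical helper: one raw /api/v1/models entry -> {id, cost?, contextWindow?}.
function toCacheEntry(raw) {
  const entry = { id: raw.id };
  const pricing = raw.pricing ?? {};
  const input = toNumber(pricing.prompt);
  const output = toNumber(pricing.completion);
  // Assumption: only record cost when both core prices are present.
  if (Number.isFinite(input) && Number.isFinite(output)) {
    const cacheRead = toNumber(pricing.input_cache_read);
    const cacheWrite = toNumber(pricing.input_cache_write);
    entry.cost = {
      input,
      output,
      cacheRead: Number.isFinite(cacheRead) ? cacheRead : 0,
      cacheWrite: Number.isFinite(cacheWrite) ? cacheWrite : 0,
    };
  }
  // Assumption: the context window comes from a numeric context_length field.
  if (Number.isFinite(raw.context_length)) entry.contextWindow = raw.context_length;
  return entry;
}
```

With this shape, an {id}-only entry from another provider simply has no `cost`
key, which is the case the policy fallback has to tolerate.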

Migration: on first read of a legacy string[] cache, entries are
converted in-place to {id} objects and the file is rewritten. No cost
backfill (data wasn't there before), but the new readers handle them.
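
A minimal sketch of that conversion, assuming the legacy cache holds a plain
array of model IDs per provider. `migrateLegacyModels` is a hypothetical name;
the `changed` flag is one way for a caller to decide whether the file needs
rewriting.

```javascript
// Hypothetical helper: upgrade legacy string entries to {id} objects.
function migrateLegacyModels(models) {
  let changed = false;
  const migrated = models.map((model) => {
    if (typeof model === "string") {
      changed = true;
      return { id: model }; // no cost backfill: the data was never stored
    }
    return model; // already in the new shape
  });
  return { models: migrated, changed };
}
```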

Cost wired into policy: isModelAllowedByBuiltInProviderPolicy calls
lookupDiscoveredModelCost("openrouter", modelId) as a fallback when
the static model registry has no cost data.

Plus: cli.ts --discover now eagerly refreshes SF-managed providers
(opencode, opencode-go, kimi-coding, xiaomi) that the SDK's adapter
doesn't cover — so they populate cache on first --discover instead
of waiting for a session-start lazy refresh.

Tests: 13 new across 5 groups (pricing extraction, round-trip, legacy
migration, policy gate happy/sad paths, Google provider compat).
Full suite: 184 files / 1971 tests, zero regressions.

Real-world result: openrouter/owl-alpha, google/lyria-3-pro-preview,
google/lyria-3-clip-preview, openrouter/free, plus any future
zero-cost models now pass the policy filter on the next discovery
refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:51:00 +02:00
Mikael Hugo
d70d8d3b10 fix(providers): use x-api-key for xiaomi discovery 2026-05-15 14:43:09 +02:00
Mikael Hugo
09ea553b6d fix(auto): initialize notification store during bootstrap 2026-05-15 14:42:02 +02:00
Mikael Hugo
0a332f4cba fix(headless): normalize auto alias to autonomous 2026-05-15 14:32:00 +02:00
Mikael Hugo
44fcfb643c fix(sift): use bm25 only for repo-root — phrase retriever hangs on full scope
Root cause: the sift binary's phrase retriever hangs indefinitely when
queried against the full repo-root scope (57K+ files). Earlier tests
mistook this for general slowness, but isolated testing confirms:

- bm25 alone on repo root: works (1m 30s cold, instant warm)
- phrase alone on repo root: hangs forever
- bm25+phrase on repo root: hangs forever (phrase path blocks)
- all retrievers on scoped subdirs: work correctly

The earlier Rust panic was from a corrupted cache state left by killing
a mid-build vector process. After clearing the cache, bm25 alone works.

Fix: chooseSiftRetrievers now returns retrievers: "bm25" (not "bm25,phrase")
for repo-root scope. Scoped subdirs still get bm25+phrase+vector with
position-aware reranking.

Tests: updated 3 assertions in sift-retriever-scope.test.mjs.
Full suite: 183 files / 1958 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 14:28:23 +02:00
Mikael Hugo
1b5348e28e feat(providers): live discovery for opencode, opencode-go, minimax
Three providers were missing from PROVIDER_CATALOG_CONFIG so their
model lists couldn't be auto-discovered. Their wire ids only existed
in packages/ai/src/models.generated.ts as hand-coded entries, meaning
new model variants from these providers required manual catalog edits.

Verified live endpoints respond to /v1/models with bearer auth:
- opencode      → https://opencode.ai/zen/v1/models      (6 free models)
- opencode-go   → https://opencode.ai/zen/go/v1/models   (15 models)
- minimax       → https://api.minimax.io/v1/models       (works)

Added entries:
  opencode:     baseUrl https://opencode.ai/zen, modelsPath /v1/models
  opencode-go:  baseUrl https://opencode.ai/zen/go, modelsPath /v1/models
  minimax:      baseUrl https://api.minimax.io, modelsPath /v1/models
                (international endpoint; Chinese-network api.minimaxi.com
                still handled separately in the SDK)
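
As a quick consistency check, the added entries can be joined back into the
verified endpoints. The object below mirrors only the baseUrl and modelsPath
values listed above; the real PROVIDER_CATALOG_CONFIG may carry more fields,
and `modelsUrlFor` is a hypothetical helper.

```javascript
// Mirrors the three entries from this commit message; other fields omitted.
const catalogSketch = {
  opencode: { baseUrl: "https://opencode.ai/zen", modelsPath: "/v1/models" },
  "opencode-go": { baseUrl: "https://opencode.ai/zen/go", modelsPath: "/v1/models" },
  minimax: { baseUrl: "https://api.minimax.io", modelsPath: "/v1/models" },
};

// Hypothetical helper: build the discovery URL for a provider id.
function modelsUrlFor(providerId) {
  const entry = catalogSketch[providerId];
  if (!entry) return null; // provider is not discoverable
  return entry.baseUrl + entry.modelsPath;
}
```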

Auth keys already wired: OPENCODE_API_KEY, OPENCODE_GO_API_KEY (with
OPENCODE_API_KEY fallback), MINIMAX_API_KEY. No env-api-keys.ts changes.

Combined with 385e0b448 (dynamic canonicalIdFor resolver), new model
variants from these three providers will be auto-grouped in
.sf/model-performance.json without hand-editing CANONICAL_BY_ROUTE.

Live counts after fresh discovery will reveal experimental models
absent from static catalog (e.g. opencode's "big-pickle", opencode-go's
deepseek-v4-pro, mimo-v2.5-pro, hy3-preview). The model-router
tolerates unconventional wire IDs — no naming constraints.

To populate cache: rm -rf ~/.sf/runtime/model-catalog/ + relaunch sf.

Tests: 13 new in provider-catalog-discovery.test.mjs (catalog shape,
modelsPath presence, DISCOVERABLE_PROVIDER_IDS inclusion). Full suite
183 files / 1940 tests pass, zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:19:08 +02:00
Mikael Hugo
db3525b933 chore(model-registry): prune 15 redundant identity-strip aliases
After 385e0b448 added the dynamic discovery-cache resolver to
canonicalIdFor, the 15 identity-strip aliases added in 089bf0cbe for
discovered providers became pure redundancy — the dynamic path
returns the same bare modelId from the discovery cache.

Removed (all canonical == bare modelId, all providers in discovery cache):
- minimax/MiniMax-M2.7, minimax/MiniMax-M2.7-highspeed
- mistral/codestral-latest, mistral/devstral-2512,
  mistral/devstral-small-2507, mistral/mistral-large-latest,
  mistral/mistral-medium-latest, mistral/mistral-small-latest
- zai/glm-4.5, zai/glm-4.5-air, zai/glm-4.6, zai/glm-4.7,
  zai/glm-5, zai/glm-5-turbo, zai/glm-5.1

Kept (real aliases — canonical differs from wire id, NOT identity strips):
- kimi-coding/kimi-for-coding → kimi-k2.6 (Moonshot alias)
- mistral/devstral-medium-2507 → devstral-medium-latest (alias to latest)
- minimax/MiniMax-M2 family lowercase mappings (case-change aliases)

Also kept:
- zai/glm-4.5-flash, zai/glm-4.7-flash (not yet in discovery cache;
  flash variants may launch before cache refresh — fast-path safety)
- kimi-coding/kimi-k2.6 + kimi-k2-thinking (kimi-coding cache only
  has kimi-for-coding; these resolve via _ENTRY_BY_ROUTE fallback)

Tests: 15 new regression tests in canonical-id-dynamic.test.mjs verify
each removed entry STILL resolves correctly via dynamic discovery.
Total 21/21 in that file, plus 101 model-registry tests, plus 16
canonical-id-mapping tests — all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:17:06 +02:00
Mikael Hugo
385e0b4480 feat(model-learner): canonicalIdFor consults discovery cache as fallback
After commit 089bf0cbe added 23 hand-written aliases for production
route keys, the right structural fix is to also consult the dynamic
model-discovery cache (~/.sf/agent/discovery-cache.json). Otherwise
every new model variant from a discovered provider (ollama-cloud +39
models, openrouter +24, etc.) requires another round of hand-editing.

canonicalIdFor now resolves in this order:
  1. CANONICAL_BY_ROUTE (static fast path, retains real aliases like
     kimi-coding/kimi-for-coding → kimi-k2.6 where canonical differs)
  2. _ENTRY_BY_ROUTE (existing static path)
  3. canonicalIdFromDiscovery — reads ~/.sf/agent/discovery-cache.json,
     finds (provider, modelId) pair, returns bare modelId

In-memory cache with 60s TTL (DISCOVERY_CACHE_TTL_MS) so the readFileSync
on the hot path becomes one disk read per minute at most. canonicalIdFor
is per-dispatch, not per-token, so the overhead is negligible.
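
The resolution order and TTL can be sketched as follows. This is not the
shipped code: the factory, its parameters, and the injected clock are
hypothetical so the example runs without touching the fs, both static tables
are modelled as plain Maps from route to canonical id, and the cache shape
`{ [provider]: { models: [...] } }` is an assumption.

```javascript
const DISCOVERY_CACHE_TTL_MS = 60_000;

// Hypothetical factory: loadDiscovery stands in for the readFileSync + parse.
function makeCanonicalIdFor({ canonicalByRoute, entryByRoute, loadDiscovery, now = Date.now }) {
  let cached = null;
  let loadedAt = -Infinity;

  function discovery() {
    // Re-read at most once per TTL window.
    if (now() - loadedAt >= DISCOVERY_CACHE_TTL_MS) {
      try {
        cached = loadDiscovery();
      } catch {
        cached = null; // unreadable cache behaves like a miss
      }
      loadedAt = now();
    }
    return cached;
  }

  return function canonicalIdFor(route) {
    // 1. Static fast path, including real aliases.
    if (canonicalByRoute.has(route)) return canonicalByRoute.get(route);
    // 2. Existing static path.
    if (entryByRoute.has(route)) return entryByRoute.get(route);
    // 3. Dynamic fallback: find the (provider, modelId) pair in the cache.
    const slash = route.indexOf("/");
    if (slash < 0) return null;
    const provider = route.slice(0, slash);
    const modelId = route.slice(slash + 1);
    const models = discovery()?.[provider]?.models;
    if (!Array.isArray(models)) return null;
    const hit = models.some((m) => (typeof m === "string" ? m : m?.id) === modelId);
    return hit ? modelId : null;
  };
}
```

Injecting `now` keeps the TTL behaviour testable without waiting a minute, in
the same spirit as the test hook described below.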

Test hook __setDiscoveryCacheForTest lets vitest inject a cache without
touching the fs.

Tests: 6 new in canonical-id-dynamic.test.mjs (dynamic hit, static-alias
wins over dynamic, cache miss → null, null cache graceful, missing-models
graceful, multiple models per provider). Combined with existing
canonical-id-mapping: 22/22 pass. Full suite 1912 pass, no regressions.

Sanity verified: canonicalIdFor("ollama-cloud/glm-5.1") → "glm-5.1"
(dynamic-only, not in static table); canonicalIdFor("unknown/never")
→ null.

Follow-up (in flight, separate agent): prune the static identity-strip
aliases from CANONICAL_BY_ROUTE for providers in the discovery cache
since they're now redundant with the dynamic resolver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:14:04 +02:00
Mikael Hugo
2a58f4ebec feat(model-routing): autonomous fallback strict to enabledModels allowlist
Autonomous mode's model-fallback chain bypassed enabledModels — when zai
429'd, the chain happily fell through to mistral/codestral-latest even
though only minimax/*, kimi-coding/*, zai/*, ollama-cloud/* were allowed.
Of 52 dispatches in this repo's journal this session, 10 (~19%)
escaped the allowlist (mistral×2, opencode-go×3, google-gemini-cli×5).

enabledModels was honored by interactive cycling (settings-manager.ts)
and by self-feedback-drain.js for triage routing, but
auto-model-selection.js's fallback chain in selectAndApplyModel never
read it.

Now: isModelInEnabledList(provider, modelId, enabledModels) filters
each fallback candidate. Supports exact "provider/model" or
"provider/*" wildcard. Empty/undefined list = open behavior (no
regression for setups without an allowlist).

readEnabledModels reads ~/.sf/agent/settings.json once per chain;
swallows IO errors → undefined → no constraint (safe failure mode).

Escape hatch: SF_BYPASS_ENABLED_MODELS=1 disables the check for
emergency / misconfigured cases.

When ALL candidates are filtered out and the chain exhausts, throws
a clear error directing the operator to add to allowlist or unset.
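
A minimal sketch of the filter described above. The isModelInEnabledList
signature follows this commit message; `filterFallbackChain`, the candidate
shape `{ provider, modelId }`, the explicit `env` argument, and the error
wording are hypothetical.

```javascript
// Exact "provider/model" or "provider/*" wildcard; empty list means open.
function isModelInEnabledList(provider, modelId, enabledModels) {
  if (!Array.isArray(enabledModels) || enabledModels.length === 0) return true;
  const route = `${provider}/${modelId}`;
  return enabledModels.some((pattern) => pattern === route || pattern === `${provider}/*`);
}

// Hypothetical wrapper: apply the allowlist to a whole fallback chain.
function filterFallbackChain(candidates, enabledModels, env = {}) {
  // Escape hatch: skip the check entirely.
  if (env.SF_BYPASS_ENABLED_MODELS === "1") return candidates;
  const allowed = candidates.filter((c) =>
    isModelInEnabledList(c.provider, c.modelId, enabledModels),
  );
  if (allowed.length === 0) {
    throw new Error("No fallback candidate is permitted by enabledModels");
  }
  return allowed;
}
```

A caller would pass `process.env` as the third argument; taking it as a
parameter keeps the sketch free of global state.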

Tests: 13 in enabled-models-fallback.test.mjs covering pattern matrix,
multi-candidate chain skipping, bypass env, and exhaustion path.
Full suite 1906 pass, no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:02:58 +02:00
Mikael Hugo
089bf0cbeb fix(model-learner): resolve canonical-id lazy-load race + 23 wire-id aliases
Of 52 dispatches in this repo's journal this session, 51 landed in
.sf/model-performance.json's _unmapped bucket — meaning the live-outcome
learner couldn't tell which provider/model succeeded or failed. Only
1 dispatch (google-gemini-cli/gemini-3-flash-preview) bucketed correctly.

Root cause was NOT just missing aliases — it was a lazy-load race:
- model-learner.js declared canonicalIdFor as a fire-and-forget dynamic
  import side-effect at module bottom
- metrics.js called recordOutcome() synchronously after
  `await import("./model-learner.js")` resolved — before the registry
  injection promise settled
- Result: _canonicalIdForFn was null for the first dispatch every session.
  Every session. Since the file shipped.

Why nobody noticed: _unmapped is a bucket, not an error. No throw, no
warning, no UI surface. Selection still worked because benchmark-selector
+ static hand-tuned scores carry the routing decision. Only the
feedback loop (recordOutcome → adjust scores) was silently severed.

Fix:
- model-learner.js: export `registryReady` promise instead of swallowing it
- metrics.js: await registryReady before recordOutcome()
- model-registry.ts: 23 new CANONICAL_BY_ROUTE entries covering the actual
  production fallback chain: zai/glm-{4.5,4.5-air,4.5-flash,5,5.1,5-turbo,4.6,4.7,4.7-flash},
  mistral/codestral-latest + devstral-2512 + devstral-{small,medium}-* +
  mistral-{large,medium,small}-latest, google-gemini-cli/gemini-{2.5-pro,3-flash-preview,3.1-pro-preview},
  opencode-go/{glm-5,glm-5.1,mimo-v2-omni,mimo-v2-pro}
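
The race and the fix can be reproduced in isolation. This sketch is not the
real model-learner.js: `createLearner` and its loader argument are
hypothetical, and only the `registryReady` promise, recordOutcome, and the
`_unmapped` bucket come from the description above.

```javascript
// Hypothetical stand-in for the learner module.
function createLearner(loadRegistry) {
  let canonicalIdForFn = null;
  const buckets = {};

  // Exported instead of swallowed, so callers can wait for the injection.
  const registryReady = Promise.resolve()
    .then(loadRegistry)
    .then((fn) => {
      canonicalIdForFn = fn;
    });

  function recordOutcome(route) {
    // Until the registry is injected, every route falls into _unmapped.
    const key = (canonicalIdForFn && canonicalIdForFn(route)) || "_unmapped";
    buckets[key] = (buckets[key] || 0) + 1;
    return key;
  }

  return { registryReady, recordOutcome, buckets };
}

// Caller side, as metrics.js is described after the fix.
async function recordAfterReady(learner, route) {
  await learner.registryReady;
  return learner.recordOutcome(route);
}
```

Calling `recordOutcome` synchronously right after `createLearner` returns
lands in `_unmapped`, while `recordAfterReady` resolves the route.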

Also adds opt-in backfillModelPerformanceFromJournal(basePath) to
reclassify the existing 51 _unmapped records from past journal events.
Never auto-runs; backs up the old file before overwriting.

Tests: 16 in canonical-id-mapping.test.mjs covering pattern matching,
non-mappable cases, bare canonical-id passthrough, and the backfill
path. Full suite 1906 pass, no regressions.

Known follow-up: CANONICAL_BY_ROUTE uses mixed casing (MiniMax-M2.7 vs
minimax-m2) — should be standardized lowercase in a future pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:02:58 +02:00
Mikael Hugo
5f92320c7d fix(auto): timeout silent swarm turns despite heartbeats 2026-05-15 13:55:04 +02:00
Mikael Hugo
85f6650852 fix(auto): keep solver checkpoint pass out of swarm 2026-05-15 13:35:20 +02:00