singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	725affd126	feat(self-feedback): purpose_anchor on entries (ADR-0000 restoration, v71) SF is a purpose-to-software compiler — every self_feedback row must name the milestone vision or slice goal it's filed against, so triage can prioritize against purpose rather than treating each row as floating. - Schema v71 ALTERs self_feedback ADD COLUMN purpose_anchor TEXT. NULL allowed for legacy rows; fresh-DB CREATE includes the column. - sf-db-self-feedback.js: insertSelfFeedbackEntry accepts purposeAnchor (camelCase), stored as :purpose_anchor; listSelfFeedbackEntries({purpose}) pushes a LIKE %fragment% filter into the DB layer so triage doesn't have to pull the full table. - rowToSelfFeedback exposes purposeAnchor, falling back to the JSON projection for legacy rows where the column is NULL. - headless-feedback CLI: `feedback add --purpose <fragment>` persists the anchor; `feedback list --purpose <fragment>` filters by it. Omission stays valid — restoration is additive, not breaking. - help-text + migration test updated; new vitest covers add/list round-trip, NULL-on-omit legacy compat, substring match, and the help-text documentation contract. Restores the doctrine in docs/adr/0000-purpose-to-software-compiler.md: "non-trivial artifacts must name their purpose and consumer." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:51:52 +02:00
Mikael Hugo	3c416162b2	feat(sf-db): record per-task purpose_trace at complete_task (ADR-0000) Restore the purpose-to-software doctrine at the slice gate: every task the executor closes must name the slice-goal sentence or clause it served. complete-slice now refuses to flip a slice to complete while any of its tasks has a NULL purpose_trace, making "did all tasks actually serve the slice goal" a mechanical check instead of a vibe. Schema migration v70 adds a nullable purpose_trace TEXT to tasks (legacy rows stay valid). complete_task refuses without it and quotes slice.goal in the error so the agent can anchor. insertTask / updateTaskStatus accept the new field, rowToTask exposes it, and a new updateTaskPurposeTrace helper covers later corrections. Restoration of doctrine — see docs/adr/0000-purpose-to-software-compiler.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:50:27 +02:00
Mikael Hugo	fa657f2523	feat(plan-milestone): per-slice vision trace (schema v69, ADR-0000 P2) Restoration of doctrine: plan-milestone now emits a literal milestone.vision clause per slice (traces_vision_fragment) so validate-milestone has structured grounds for assessment instead of re-reading the vision through the LLM every time. Schema v69 adds the column (NULL allowed for legacy rows); the prompt and plan_milestone tool start requiring it for new slices, rejecting fragments that do not appear verbatim in milestone.vision. See docs/adr/0000-purpose-to-software-compiler.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:48:50 +02:00
Mikael Hugo	3b83f0898b	merge(P4): purpose-coherence-gate before every dispatch (ADR-0000)	2026-05-15 18:45:17 +02:00
Mikael Hugo	5e2c7a7166	merge(P1): vision quality gate on sf new-milestone (ADR-0000)	2026-05-15 18:45:08 +02:00
Mikael Hugo	59fbaf4b0f	test(doctor): refactor purpose-gate test to use insertMilestone/insertSlice helpers Cleans up doctor-purpose-gate.test.mjs: - Uses insertMilestone/insertSlice helpers instead of raw SQL - Removes redundant test from doctor-plan-dir-normalization.test.mjs - Adds module-level JSDoc purpose comment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 18:44:09 +02:00
Mikael Hugo	a303b5db29	feat(uok): add purpose-coherence-gate pre-dispatch gate (ADR-0000) Restores the eight-PDD purpose gate at the autonomous-loop boundary required by ADR-0000 (SF is a purpose-to-software compiler). The gate walks milestone vision -> slice.traces_vision_fragment -> task.purpose_trace before every dispatch and refuses to proceed when the purpose chain is broken at the vision root (degraded-vision). - New uok/purpose-coherence.js with a pure verdict function and a DB-backed adapter. Reads vision/trace columns directly via SQL so pre-P2/P3 schema migrations are tolerated. - Wired into auto/phases-pre-dispatch.js alongside resource-version- guard, pre-dispatch-health-gate, and planning-flow-gate. Fires on every pre-dispatch turn and emits to the existing trace JSONL. - Outcome ladder: fail (vision missing -> pause loop), warn (trace columns missing or NULL -> surface but allow dispatch so legacy DBs don't hard-break on day one), pass (full chain). - Tests in tests/uok-purpose-coherence.test.mjs cover the four contracted states plus the column-missing downgrade path on a pre-migration schema. Refs: ADR-0000. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:42:55 +02:00
Mikael Hugo	fb68b12902	feat(doctor): enforce ADR-0000 purpose gate — milestones need vision, slices need goal Adds two new doctor checks to checkEngineHealth(): - db_milestone_missing_vision: error when a milestone has no vision (the WHY/purpose field per ADR-0000) - db_slice_missing_goal: error when a slice has no goal (the WHAT/purpose field per ADR-0000) Both checks are non-fixable (the operator must define purpose). This aligns with ADR-0000 §Enforcement: "Non-trivial milestones, slices, tasks, ADRs, specs, tests, and exported symbols must name their purpose and consumer." Tests: 2 cases — milestone without vision flagged, slice without goal flagged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 18:42:14 +02:00
Mikael Hugo	aa0d57371e	feat(headless): enforce ADR-0000 PDD-fields gate at new-milestone Restoration of forgotten doctrine: ADR-0000 declares the eight PDD fields (Purpose, Consumer, Contract, Failure boundary, Evidence, Non-goals, Invariants, Assumptions) the purpose gate, but `sf headless new-milestone --context <file>` was accepting any context including empty or trivially-thin seed docs. This wires a pre-create check that refuses the run when fields are missing or too thin, naming exactly which ones so the operator can fix the seed doc and retry. - new src/resources/extensions/sf/headless-pdd-check.js: scans context for the eight fields (heading and inline-label forms) and reports missing/sparse, plus a minimum-spine check (Purpose + Consumer + Contract + Evidence-or-Falsifier). - src/headless.ts calls the check after loadContext, before bootstrapping .sf/. Refusal exits 1 with formatPddRefusal text. - --skip-pdd-check is the migration escape hatch (warning printed, PDD gate bypassed) for milestones that pre-date the gate. - SF-internal auto-bootstrap (autonomous→new-milestone fallback) is exempted because the seed is SF-generated, not operator-PDD. - vitest test covers missing-Purpose, missing-Consumer, all-8, sparse, inline-label form, Falsifier-as-Evidence spine, and the doctrine field order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:37:06 +02:00
Mikael Hugo	af1401e4ea	fix(solver): enforce PDD purpose gate	2026-05-15 18:32:59 +02:00
Mikael Hugo	bb0c87fdac	feat(remediation-dispatcher): M003 S04 — autonomous recovery from validation findings Implements RemediationDispatcher that classifies verification failures and maps them to recovery strategies: - transient → retry (timeout, flaky test, network) - structural → replan (broken import, syntax error) - knowledge → research (not implemented, missing context) - infra → escalate via self-feedback (tooling broken) Confidence scoring: - Single failing check + known pattern = high confidence - Multiple failures or high retry count = lower confidence - Configurable autoFixThreshold (default 0.6) 15 unit tests covering all 4 failure classes + confidence scoring + threshold behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 18:29:45 +02:00
Mikael Hugo	a863672463	fix(state): honor completed owning requirements	2026-05-15 18:24:16 +02:00
Mikael Hugo	b08cb13c20	fix(state): requirements-complete short-circuits the planning ladder Symptom: dr-repo M003 had all 8 owning requirements (UNI-01..05, PIL-01..03) marked Status: complete in .sf/REQUIREMENTS.md, but the milestone row was still active because its only slice was a post-migration skipped placeholder. After the previous fix routed all-skipped milestones to pre-planning, SF ran roadmap-meeting + plan-milestone and wrote 3 new slices on a milestone whose contract-level work was already done — burned ~4 LLM turns on plausibly-adjacent but unwanted re-decomposition. Root cause: deriveStateFromDb's milestone-completion gate consults only slice statuses (and indirectly the milestone row's own status field). It never reads REQUIREMENTS.md to check whether the contract is already satisfied. The slice-based view collapsed the real signal. Fix: - New parseRequirementsByMilestone(content) helper in files.js: parses REQUIREMENTS.md, groups entries by their `Primary owning milestone` field, returns Map<id, {complete, incomplete}>. - handleAllSlicesDone now reads REQUIREMENTS.md before its slice-based real-work check. If a milestone has at least one owning requirement and zero of them are incomplete, route to completing-milestone with nextAction naming the requirement count (so the operator can see why the milestone is being closed without manually opening REQUIREMENTS.md). - Best-effort: REQUIREMENTS.md parse failure falls through to the existing slice-based rule. Missing file likewise — no regression for projects that don't keep a requirements file. Resolves sf-mp74hftw-zud6ba filed via the headless feedback CLI. End-to-end verified by re-running sf headless query on dr-repo M003: now reports phase=completing-milestone with the right requirement-count message. 
Tests: 5 new cases — all complete + slice skipped → completing, some active → pre-planning, zero owning requirements falls through, missing file falls through, all complete + real slice work still completes. Existing 4 all-skipped-replan cases still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:22:28 +02:00
Mikael Hugo	6a2c61d5ee	feat(halt-self-feedback): M003 S03 — HaltWatchdog self-feedback integration T01: Added integration test auto-halt-self-feedback.test.mjs that proves: - HaltWatchdog.check() creates a self-feedback DB entry with kind=runaway-loop:idle-halt, severity=high, blocking=true - Markdown projection (.sf/SELF-FEEDBACK.md) is regenerated - Deduplication works (one entry per idle period) - New heartbeat resets and creates a new entry for the next idle period T02: Enhanced evidence string to include elapsedMs, iteration, and thresholdMs explicitly (R003 actionable context requirement). Tests: 36/36 pass across auto-halt-self-feedback, auto-halt-watchdog-notify, and self-feedback-db suites. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 18:19:39 +02:00
Mikael Hugo	5cd5e14160	feat(headless): surface memory auth pause	2026-05-15 18:16:08 +02:00
Mikael Hugo	90b8e7edf8	feat(headless): expose memory extraction diagnostics	2026-05-15 18:13:35 +02:00
Mikael Hugo	f00762ffdb	fix(feedback): allow restamping suspect resolutions	2026-05-15 18:10:14 +02:00
Mikael Hugo	820f9aaf8e	fix(memory): classify extraction failures	2026-05-15 18:06:33 +02:00
Mikael Hugo	7dfb5099c9	fix(state): all-skipped milestone routes to pre-planning, not validating handleAllSlicesDone treated isStatusDone uniformly — "complete", "done", AND "skipped" all counted as "milestone work is finished", so a milestone whose only slice was skipped would advance to phase=validating-milestone. That's wrong: a placeholder slice that was skipped doesn't validate the milestone's success criteria, it just clears the wedge. Surfaced concretely in dr-repo M003 (Unified Dashboard + Pilot Validation): I skipped the migration placeholder via the new `sf headless skip-slice` CLI, and the next-dispatch reported `validate-milestone M003` even though no real work had happened on the milestone. The autonomous loop would then burn an LLM turn running validate-milestone just to discover the obvious gap. Fix: differentiate {complete, done} from {skipped} at the gate. When zero slices carry real-work outcomes, route into the pre-planning phase so the dispatcher's existing discuss → research → plan ladder takes over. The PDD/vision is already in the milestone row, so the planner has the purpose it needs without operator hand-holding. Verified end-to-end against dr-repo: `sf headless query` for M003 now reports phase=pre-planning and next dispatch `roadmap-meeting M003` (the deep-planning entry rule fires first; discuss/research/plan come after as artifacts land). Tests: 4 cases — all-skipped → pre-planning, complete+skipped mix → validating, legacy "done" alias → validating, multiple skipped → pre-planning. Resolves sf-mp73sk0m-63w88y (filed via headless feedback CLI). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 18:00:21 +02:00
Mikael Hugo	881fd5e304	feat(memory,state): runtime counters for memory injection + milestone work validation Memory injection telemetry: - Move counter writes from auto-prompts.js to memory-store.js (where getRelevantMemoriesRanked/getActiveMemoriesRanked actually fire). - Track memory_inject_count and memory_inject_chars_total via runtime_counters table for headless-query reporting. State-db validation: - handleAllSlicesDone now checks if any slice carries real work (status=complete/done) before routing to validation. - Milestones with all-skipped slices route to "reassess-roadmap" instead of asking the operator to validate non-existent work. SM client defense: - Filter foreign-tenant memories from SM query responses even when the server returns them (defense-in-depth). Tests updated: memory-extraction-lifecycle, sf-db-migration, headless-query-memory-injection, sm-client, memory-tenant-gate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 17:57:45 +02:00
Mikael Hugo	9abbfaada2	feat(memory-tenant-gate): add project-scoped isolation for SM cross-project recall Closes sf-mp723nju-2cpeoc. When SM_ENABLED is on, memory retrieval from Singularity Memory is now scoped to the current project's repoIdentity tenant. Foreign-tenant memories are filtered client-side and the tenant filter is sent server-side for SM servers that support it. Key changes: - schema v68: ADD COLUMN tenant TEXT on memories table (NULL = legacy) - insertMemoryRow: persists tenant field on every new record - backfillMemoryTenants / backfillMemoryTenantRows: idempotent migration called on session_start when SM_ENABLED is set - querySmMemories: resolves effectiveTenantId (opts.tenant > opts.tenantId > SM_TENANT_ID); returns [] when no tenant resolved and crossTenant off - SM_CROSS_TENANT_ENABLED=1 opt-in bypass with audit warning in console - register-hooks session_start: calls backfillMemoryTenants when SM active - 12 new tests in memory-tenant-gate.test.mjs; updated sm-client.test.ts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 17:55:26 +02:00
Mikael Hugo	ff333ae067	feat(memory): surface injection token cost in headless query The Project Memories section is rendered into every execute-task, plan-slice, and research-slice prompt. At 10 memories × ~200 chars each that's ~2K chars/turn injected into the context — real cost, no operator-visible meter. Adds two runtime_counters (already-existing key/value store): memory_inject_chars_total — cumulative section size memory_inject_count — number of injections Written by buildProjectMemoriesSection() on every render. Both writes sit inside a try/catch so a legacy DB without runtime_counters silently skips rather than blocking prompt build. `sf headless query` surfaces the cumulative + derived metrics as a new top-level `memoryInjection` block: { total_chars: 12480, count: 8, avg_chars: 1560, estimated_total_tokens: 3120 } The block is omitted entirely when count is 0 (fresh project / no prompts rendered yet) so it doesn't clutter the snapshot. Operators can now correlate prompt size growth against autonomous run cost without instrumenting the LLM call sites directly. The estimated_total_tokens is chars/4 — a rough approximation since SF doesn't tokenise the section, intentionally documented as such. Resolves sf-mp723yl9-rcxoeh filed via the headless feedback CLI. Tests: 5 source-level invariants — type carries the section, query reads counters by name, snapshot omits section on zero, write side calls both counter functions, write is wrapped in try/catch with documented failure-mode comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 17:55:14 +02:00
Mikael Hugo	671b2c8628	feat(sm-client): defense-in-depth tenant filter on SM responses Even though querySmMemories pins tenantId in the request body sent to the Singularity Memory server, SF used to accept whatever came back without verifying. A misconfigured or compromised SM server could echo memories from other tenants and SF would inject them into the next execute-task prompt — cross-customer leak. filterSmMemoriesToTenant() now re-checks every returned memory: - same-tenant memories pass through - foreign-tenant memories (memory.tenantId OR memory.tenant != expectedTenantId) are dropped, with a one-line warning so the misconfigured-SM symptom is visible rather than silent - memories with no tenant claim at all default to allow — matches the local DB's "NULL tenant = legacy row" rule from schema v68 - SM_REQUIRE_TENANT_CLAIM=true flips the legacy rule to drop (hard fail-closed mode for operators who want it) Defensive guards against non-array inputs, missing expectedTenantId (returns input unchanged so caller-side fail-open semantics are preserved), and the dual tenantId/tenant field naming. Tests: 8 cases — same-tenant pass-through, foreign drop, legacy allow, strict mode drop, tenantId/tenant alias, empty/non-array defensiveness, missing-expected pass-through, warning emission. Resolves the cross-project tenant-leak feedback row filed via the new headless feedback CLI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 17:49:24 +02:00
Mikael Hugo	7c3d9bd3bf	fix(memory): gate SM recall by tenant scope	2026-05-15 17:46:51 +02:00
Mikael Hugo	0c7aaafa00	feat(memory): enrich execute-task memory retrieval query Previously buildProjectMemoriesSection(`${sTitle} ${tTitle}`) sent two short strings to the cosine ranker — too sparse for re-ranking to do meaningful work against the static pool. buildMemoryRetrievalQuery() (new, exported for tests) enriches the query with: - slice.title + task.title (original signal) - slice.goal text, front 600 chars (the WHY of the slice — usually names the memory-relevant context the title can't fit) - top 20 changed files from git diff/status (the WHAT — what code is in play right now; lets cosine ranking promote memories whose content references those paths) Fail-open at each source: DB closed → no goal; not a git repo → no files; nullish title args don't poison the string. The call site never has to handle errors. Bounded so embedding token cost stays predictable: 600-char goal cap, 20-file cap. Empty inputs collapse to "" so the consumer's `if (!query.trim())` branch still picks the static fallback. Tests: 5 cases — titles always present, non-git directory safe, empty-input collapse, nullish-arg defensiveness, real git repo surfaces changed file paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 17:36:15 +02:00
Mikael Hugo	362af3d6a4	fix(headless): bypass rpc for status	2026-05-15 17:32:21 +02:00
Mikael Hugo	cf32e79578	feat(memory-embeddings): read SF_LLM_GATEWAY_KEY from env as auth.json fallback Enables CI and containerised deployments without writing secrets to disk. Auth.json still takes precedence when present. - readGatewayFromAuthJson now falls back to SF_LLM_GATEWAY_KEY env var - SF_LLM_GATEWAY_URL env var also supported for endpoint override - Added tests for env fallback, auth.json preference, and default URL Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 17:13:40 +02:00
Mikael Hugo	6214f7c86d	feat(memory): add extraction diagnostics	2026-05-15 16:53:01 +02:00
Mikael Hugo	fdc4650016	feat(self-feedback-drain): filter free opencode models from triage routing Self-feedback triage routing was including paid opencode models even when the operator policy prefers the free tier. Add isOpenCodeProvider() + isFreeOpenCodeModelId() and filter the candidate list before the router scores them. Also: cosmetic — quote style normalised by the formatter on buildInlineFixPrompt strings and spawn options object. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:37:24 +02:00
Mikael Hugo	3a14fe86a7	test(list-models): isolate from developer's discovery-cache Tests were picking up the developer's real ~/.sf/agent/discovery-cache.json and seeing unexpected models in output. Pin tests to a guaranteed-missing path via the new _discoveryCacheFilePath option so the env they observe is solely what the test constructs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:37:11 +02:00
Mikael Hugo	d8f56e6704	feat(cli): add sf key subcommand for auth.json management Surgical read/write access to ~/.sf/agent/auth.json without touching the file directly. All mutations go through AuthStorage so file-lock and chmod-600 invariants are always respected. sf key set <provider> <api-key> add/rotate stored key sf key get <provider> show masked key (last 4 chars) sf key remove <provider> [--yes] remove credential sf key list list all providers + status Rationale: SF's source of truth for credentials is auth.json at runtime — env vars are only used during initial one-time provider setup. Rotation needs an explicit, audit-friendly path, not implicit env-driven re-reads. Keys are never echoed in full (last 4 chars only); remove always prompts unless --yes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 16:37:04 +02:00
Mikael Hugo	351bfad41d	fix(memory): extractTranscriptFromActivity now reads custom_message entries Activity JSONL logs use `type: "custom_message"` with `customType: "sf-auto"` for assistant reasoning content. The old code only checked `role === "assistant"`, so every transcript was empty → extraction silently skipped every unit. Fix: recognise both legacy (`role === "assistant"`) and modern (`custom_message` with `sf-*` prefix) entry shapes. Also reads the standalone `text` field used by custom messages. This is why memory_processed_units had 0 rows despite 34 activity logs. Tests: 186 files / 1994 tests pass. Type check: clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 16:13:26 +02:00
Mikael Hugo	7ba469cff1	feat(memory): add debug logging to memory extraction pipeline The memory extraction system has infrastructure (DB tables, LLM prompts, unit closeout wiring, embedding backfill) but zero processed units and only self-feedback-resolution memories. This suggests extraction is failing silently. Add debugLog() calls throughout extractMemoriesFromUnit() so we can observe: - Skip reasons (mutex busy, rate limited, already processed, file too small) - Start/done lifecycle per unit - LLM call and parse outcomes - Error messages on failure and retry This makes the extraction pipeline observable via --debug or the journal/debug log without changing behavior. Tests: 185 files / 1993 tests pass. Type check: clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 16:09:36 +02:00
Mikael Hugo	ba4b2d46d9	sf snapshot: uncommitted changes after 43m inactivity	2026-05-15 15:53:19 +02:00
Mikael Hugo	0b19afebf6	test(providers): expand discovery test matrix to 46 cases Adds full coverage for the discovery-gating root cause that was fixed in commits `d70d8d3b1` (xiaomi x-api-key auth) and the subsequent refreshSfManagedProviders + writeSdkDiscoveryCacheEntry work in model-catalog-cache.js. Diagnosis recap: kimi-coding, opencode, opencode-go were silent in ~/.sf/agent/discovery-cache.json because the SDK's model-discovery.js adapter registry marked them with StaticDiscoveryAdapter (supportsDiscovery=false), so the SDK's discoverModels() never attempted them. SF's own scheduleModelCatalogRefresh DID fetch them but wrote only to the per-repo runtime cache (basePath/.sf/model-catalog/) and only fired on session_start — not during --discover. The fix is to mirror the write to the SDK's discovery cache on both fetch-path AND cache-hit path, and await it in cli.ts before listModels when --discover is set.
New test sections: - parseDiscoveredModels: OpenAI {data}/{models} formats, Google {models[].name} prefix stripping, name-as-id fallback, null on bad input, OpenRouter pricing extraction - refreshSfManagedProviders: xiaomi uses x-api-key (not Bearer), opencode uses Bearer, no-key providers skipped, SDK discovery cache written on BOTH network-fetch and cache-hit paths, kimi-coding + opencode-go iterated when keys present 46 tests pass. No regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 15:09:38 +02:00
Mikael Hugo	67c088410c	chore(discovery): silence debug stderr from refresh path Trailing instrumentation from the discovery investigation. The error catch still swallows non-fatal failures during --discover, just no longer prints to stderr. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 15:03:56 +02:00
Mikael Hugo	fe28a48d81	fix(sift): revert to bm25,phrase for repo-root — hang was corrupted cache The earlier commit (`44fcfb643`) incorrectly disabled phrase on repo-root because I thought phrase retriever hung on full-workspace scope. After clearing the corrupted cache (left by killing a mid-build vector process), testing confirms: - bm25 alone on repo root: works, 1m 50s cold, instant warm - phrase alone on repo root: works after cache clear - bm25+phrase on repo root: works after cache clear - vector on scoped paths: works after cache build The "hang" was from a corrupted/stale cache, not a sift bug. .siftignore is properly excluding files (146K→2,660 indexed). Revert chooseSiftRetrievers back to bm25,phrase for repo-root. Tests: 184 files / 1974 tests pass. Type check: clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 14:59:45 +02:00
Mikael Hugo	b88b66c651	feat(auto): fan out swarm research units	2026-05-15 14:54:27 +02:00
Mikael Hugo	c8854ca896	feat(discovery): cache stores pricing — unblocks zero-cost-but-not-:free models Today's discovery cache stored only model IDs (string[]). Downstream isZeroCost(model?.cost) check evaluated against undefined for any dynamically-discovered model, so OpenRouter's zero-cost-but-not-:free entries (owl-alpha, lyria-3-pro-preview, lyria-3-clip-preview, openrouter/free) got silently blocked by the built-in provider policy. Cache entry shape now: {id, cost?, contextWindow?} per model. parseDiscoveredModels extracts pricing from OpenRouter's /api/v1/models response (pricing.prompt/completion/input_cache_read/ input_cache_write → numeric cost.{input,output,cacheRead,cacheWrite}). Other providers stay {id}-only — their /v1/models endpoints don't ship pricing. Migration: on first read of a legacy string[] cache, entries are converted in-place to {id} objects and the file is rewritten. No cost backfill (data wasn't there before), but the new readers handle them. Cost wired into policy: isModelAllowedByBuiltInProviderPolicy calls lookupDiscoveredModelCost("openrouter", modelId) as a fallback when the static model registry has no cost data.
Plus: cli.ts --discover now eagerly refreshes SF-managed providers (opencode, opencode-go, kimi-coding, xiaomi) that the SDK's adapter doesn't cover — so they populate cache on first --discover instead of waiting for a session-start lazy refresh. Tests: 13 new across 5 groups (pricing extraction, round-trip, legacy migration, policy gate happy/sad paths, Google provider compat). Full suite: 184 files / 1971 tests, zero regressions. Real-world result: openrouter/owl-alpha, google/lyria-3-pro-preview, google/lyria-3-clip-preview, openrouter/free, plus any future zero-cost models now pass the policy filter on the next discovery refresh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:51:00 +02:00
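The pricing mapping and legacy migration described in the commit above can be illustrated with a minimal sketch. Only the field mapping (pricing.prompt/completion/input_cache_read/input_cache_write → cost.input/output/cacheRead/cacheWrite) and the string[] → {id} conversion come from the commit message. The helper names `toCost` and `migrateLegacyEntries`, the body of `isZeroCost`, and the choice to treat missing pricing fields as 0 are assumptions for illustration, not the repository's actual code.

```javascript
// Hypothetical sketch: converts an OpenRouter-style pricing object
// (string-valued fields) into a numeric cost record.
function toCost(pricing) {
  if (!pricing) return undefined;
  // Assumption: a missing or unparsable field counts as 0.
  const num = (v) => {
    const n = Number.parseFloat(v);
    return Number.isFinite(n) ? n : 0;
  };
  return {
    input: num(pricing.prompt),
    output: num(pricing.completion),
    cacheRead: num(pricing.input_cache_read),
    cacheWrite: num(pricing.input_cache_write),
  };
}

// Hypothetical sketch: legacy caches stored bare model IDs (string[]);
// new entries are objects of the form {id, cost?, contextWindow?}.
function migrateLegacyEntries(models) {
  return models.map((m) => (typeof m === "string" ? { id: m } : m));
}

// Assumed semantics: undefined cost is NOT zero-cost, which is why
// discovered models without pricing were blocked before this change.
function isZeroCost(cost) {
  if (!cost) return false;
  return Object.values(cost).every((v) => v === 0);
}
```

With this shape, a migrated legacy entry carries no cost and still fails the zero-cost check until the next discovery refresh fills in pricing.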
Mikael Hugo	d70d8d3b10	fix(providers): use x-api-key for xiaomi discovery	2026-05-15 14:43:09 +02:00
Mikael Hugo	09ea553b6d	fix(auto): initialize notification store during bootstrap	2026-05-15 14:42:02 +02:00
Mikael Hugo	0a332f4cba	fix(headless): normalize auto alias to autonomous	2026-05-15 14:32:00 +02:00
Mikael Hugo	44fcfb643c	fix(sift): use bm25 only for repo-root — phrase retriever hangs on full scope Root cause: the sift binary's phrase retriever hangs indefinitely when queried against the full repo-root scope (57K+ files). Earlier tests mistook this for a general slowness, but isolated testing confirms: - bm25 alone on repo root: works (1m 30s cold, instant warm) - phrase alone on repo root: hangs forever - bm25+phrase on repo root: hangs forever (phrase path blocks) - all retrievers on scoped subdirs: work correctly The earlier Rust panic was from a corrupted cache state left by killing a mid-build vector process. After clearing the cache, bm25 alone works. Fix: chooseSiftRetrievers now returns retrievers: "bm25" (not "bm25,phrase") for repo-root scope. Scoped subdirs still get bm25+phrase+vector with position-aware reranking. Tests: updated 3 assertions in sift-retriever-scope.test.mjs. Full suite: 183 files / 1958 tests pass. Type check: clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 14:28:23 +02:00
Mikael Hugo	1b5348e28e	feat(providers): live discovery for opencode, opencode-go, minimax Three providers were missing from PROVIDER_CATALOG_CONFIG so their model lists couldn't be auto-discovered. Their wire ids only existed in packages/ai/src/models.generated.ts as hand-coded entries, meaning new model variants from these providers required manual catalog edits. Verified live endpoints respond to /v1/models with bearer auth: - opencode → https://opencode.ai/zen/v1/models (6 free models) - opencode-go → https://opencode.ai/zen/go/v1/models (15 models) - minimax → https://api.minimax.io/v1/models (works) Added entries: opencode: baseUrl https://opencode.ai/zen, modelsPath /v1/models opencode-go: baseUrl https://opencode.ai/zen/go, modelsPath /v1/models minimax: baseUrl https://api.minimax.io, modelsPath /v1/models (international endpoint; Chinese-network api.minimaxi.com still handled separately in the SDK) Auth keys already wired: OPENCODE_API_KEY, OPENCODE_GO_API_KEY (with OPENCODE_API_KEY fallback), MINIMAX_API_KEY. No env-api-keys.ts changes. Combined with `385e0b448` (dynamic canonicalIdFor resolver), new model variants from these three providers will be auto-grouped in .sf/model-performance.json without hand-editing CANONICAL_BY_ROUTE. 
Live counts after fresh discovery will reveal experimental models absent from static catalog (e.g. opencode's "big-pickle", opencode-go's deepseek-v4-pro, mimo-v2.5-pro, hy3-preview). The model-router tolerates unconventional wire IDs — no naming constraints. To populate cache: rm -rf ~/.sf/runtime/model-catalog/ + relaunch sf. Tests: 13 new in provider-catalog-discovery.test.mjs (catalog shape, modelsPath presence, DISCOVERABLE_PROVIDER_IDS inclusion). Full suite 183 files / 1940 tests pass, zero regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:19:08 +02:00
Mikael Hugo	db3525b933	chore(model-registry): prune 15 redundant identity-strip aliases After `385e0b448` added the dynamic discovery-cache resolver to canonicalIdFor, the 15 identity-strip aliases added in `089bf0cbe` for discovered providers became pure redundancy — the dynamic path returns the same bare modelId from the discovery cache. 
Removed (all canonical == bare modelId, all providers in discovery cache): - minimax/MiniMax-M2.7, minimax/MiniMax-M2.7-highspeed - mistral/codestral-latest, mistral/devstral-2512, mistral/devstral-small-2507, mistral/mistral-large-latest, mistral/mistral-medium-latest, mistral/mistral-small-latest - zai/glm-4.5, zai/glm-4.5-air, zai/glm-4.6, zai/glm-4.7, zai/glm-5, zai/glm-5-turbo, zai/glm-5.1 Kept (real aliases — canonical differs from wire id, NOT identity strips): - kimi-coding/kimi-for-coding → kimi-k2.6 (Moonshot alias) - mistral/devstral-medium-2507 → devstral-medium-latest (alias to latest) - minimax/MiniMax-M2 family lowercase mappings (case-change aliases) Also kept: - zai/glm-4.5-flash, zai/glm-4.7-flash (not yet in discovery cache; flash variants may launch before cache refresh — fast-path safety) - kimi-coding/kimi-k2.6 + kimi-k2-thinking (kimi-coding cache only has kimi-for-coding; these resolve via _ENTRY_BY_ROUTE fallback) Tests: 15 new regression tests in canonical-id-dynamic.test.mjs verify each removed entry STILL resolves correctly via dynamic discovery. Total 21/21 in that file, plus 101 model-registry tests, plus 16 canonical-id-mapping tests — all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:17:06 +02:00
Mikael Hugo	385e0b4480	feat(model-learner): canonicalIdFor consults discovery cache as fallback After commit `089bf0cbe` added 23 hand-written aliases for production route keys, the right structural fix is to also consult the dynamic model-discovery cache (~/.sf/agent/discovery-cache.json). Otherwise every new model variant from a discovered provider (ollama-cloud +39 models, openrouter +24, etc.) requires another round of hand-editing. canonicalIdFor now resolves in this order: 1. CANONICAL_BY_ROUTE (static fast path, retains real aliases like kimi-coding/kimi-for-coding → kimi-k2.6 where canonical differs) 2. _ENTRY_BY_ROUTE (existing static path) 3. canonicalIdFromDiscovery — reads ~/.sf/agent/discovery-cache.json, finds (provider, modelId) pair, returns bare modelId In-memory cache with 60s TTL (DISCOVERY_CACHE_TTL_MS) so the readFileSync on the hot path becomes one disk read per minute at most. canonicalIdFor is per-dispatch, not per-token, so the overhead is negligible. Test hook __setDiscoveryCacheForTest lets vitest inject a cache without touching the fs. Tests: 6 new in canonical-id-dynamic.test.mjs (dynamic hit, static-alias wins over dynamic, cache miss → null, null cache graceful, missing-models graceful, multiple models per provider). 
Combined with existing canonical-id-mapping: 22/22 pass. Full suite 1912 pass, no regressions. Sanity verified: canonicalIdFor("ollama-cloud/glm-5.1") → "glm-5.1" (dynamic-only, not in static table); canonicalIdFor("unknown/never") → null. Follow-up (in flight, separate agent): prune the static identity-strip aliases from CANONICAL_BY_ROUTE for providers in the discovery cache since they're now redundant with the dynamic resolver. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:14:04 +02:00
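The three-step resolution order and 60s TTL described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation. The factory `makeCanonicalIdFor`, the injected `loadDiscoveryCache` and `now` parameters, the `canonicalId` field on static entries, and the cache shape `{ [provider]: { models: string[] } }` are all hypothetical. Only the lookup order, the constant name DISCOVERY_CACHE_TTL_MS, and the "return bare modelId or null" behavior come from the commit message.

```javascript
const DISCOVERY_CACHE_TTL_MS = 60_000;

// Hypothetical factory: the loader and clock are injected so the sketch
// needs no filesystem access and can be exercised deterministically.
function makeCanonicalIdFor({ canonicalByRoute, entryByRoute, loadDiscoveryCache, now = Date.now }) {
  let cached = null;
  let loadedAt = -Infinity;

  // Reload the discovery cache at most once per TTL window.
  function discovery() {
    if (now() - loadedAt >= DISCOVERY_CACHE_TTL_MS) {
      cached = loadDiscoveryCache();
      loadedAt = now();
    }
    return cached;
  }

  return function canonicalIdFor(route) {
    // 1. Static alias table wins (canonical may differ from the wire id).
    if (Object.hasOwn(canonicalByRoute, route)) return canonicalByRoute[route];
    // 2. Static registry entries (assumed to carry a canonicalId field).
    const entry = entryByRoute[route];
    if (entry && entry.canonicalId) return entry.canonicalId;
    // 3. Dynamic fallback: find the (provider, modelId) pair in the cache.
    const slash = route.indexOf("/");
    if (slash < 0) return null;
    const provider = route.slice(0, slash);
    const modelId = route.slice(slash + 1);
    const models = discovery()?.[provider]?.models;
    return Array.isArray(models) && models.includes(modelId) ? modelId : null;
  };
}
```

Because static hits return before `discovery()` is called, routes covered by the alias table never trigger a cache load in this sketch.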
Mikael Hugo	2a58f4ebec	feat(model-routing): autonomous fallback strict to enabledModels allowlist Autonomous mode's model-fallback chain bypassed enabledModels — when zai 429'd, the chain happily fell through to mistral/codestral-latest even though only minimax/, kimi-coding/, zai/, ollama-cloud/ were allowed. Of 52 dispatches in this repo's journal this session, 10 (~19%) escaped the allowlist (mistral×2, opencode-go×3, google-gemini-cli×5). enabledModels was honored by interactive cycling (settings-manager.ts) and by self-feedback-drain.js for triage routing, but auto-model-selection.js's fallback chain in selectAndApplyModel never read it. Now: isModelInEnabledList(provider, modelId, enabledModels) filters each fallback candidate. Supports exact "provider/model" or "provider/*" wildcard. Empty/undefined list = open behavior (no regression for setups without an allowlist). readEnabledModels reads ~/.sf/agent/settings.json once per chain; swallows IO errors → undefined → no constraint (safe failure mode). Escape hatch: SF_BYPASS_ENABLED_MODELS=1 disables the check for emergency / misconfigured cases. When ALL candidates are filtered out and the chain exhausts, throws a clear error directing the operator to add to allowlist or unset. 
Tests: 13 in enabled-models-fallback.test.mjs covering pattern matrix, multi-candidate chain skipping, bypass env, and exhaustion path. Full suite 1906 pass, no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:02:58 +02:00
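The allowlist rules in the commit above (exact "provider/model" match, "provider/*" wildcard, empty or undefined list meaning no constraint, bypass via SF_BYPASS_ENABLED_MODELS=1, and an error on exhaustion) can be sketched as below. The signature of `isModelInEnabledList` is taken from the commit message; its body, the helper `filterFallbackChain`, the candidate shape `{provider, modelId}`, and the error wording are assumptions for illustration only.

```javascript
// Sketch of the matching rule; the real function body may differ.
function isModelInEnabledList(provider, modelId, enabledModels) {
  // Empty or undefined allowlist means open behavior (no constraint).
  if (!Array.isArray(enabledModels) || enabledModels.length === 0) return true;
  const route = `${provider}/${modelId}`;
  return enabledModels.some(
    (pattern) => pattern === route || pattern === `${provider}/*`,
  );
}

// Hypothetical helper: filters fallback candidates against the allowlist.
// `env` is passed in explicitly so the sketch does not read process.env.
function filterFallbackChain(candidates, enabledModels, env = {}) {
  // Escape hatch: skip the check entirely when the bypass flag is set.
  if (env.SF_BYPASS_ENABLED_MODELS === "1") return candidates;
  const allowed = candidates.filter((c) =>
    isModelInEnabledList(c.provider, c.modelId, enabledModels),
  );
  if (allowed.length === 0) {
    throw new Error(
      "All fallback candidates were filtered out by enabledModels; " +
        "add a model to the allowlist or set SF_BYPASS_ENABLED_MODELS=1.",
    );
  }
  return allowed;
}
```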
Mikael Hugo	089bf0cbeb	fix(model-learner): resolve canonical-id lazy-load race + 23 wire-id aliases Of 52 dispatches in this repo's journal this session, 51 landed in .sf/model-performance.json's _unmapped bucket — meaning the live-outcome learner couldn't tell which provider/model succeeded or failed. Only 1 dispatch (google-gemini-cli/gemini-3-flash-preview) bucketed correctly. Root cause was NOT just missing aliases — it was a lazy-load race: - model-learner.js declared canonicalIdFor as a fire-and-forget dynamic import side-effect at module bottom - metrics.js called recordOutcome() synchronously after `await import("./model-learner.js")` resolved — before the registry injection promise settled - Result: _canonicalIdForFn was null for the first dispatch every session. Every session. Since the file shipped. Why nobody noticed: _unmapped is a bucket, not an error. No throw, no warning, no UI surface. Selection still worked because benchmark-selector + static hand-tuned scores carry the routing decision. Only the feedback loop (recordOutcome → adjust scores) was silently severed. Fix: - model-learner.js: export `registryReady` promise instead of swallowing it - metrics.js: await registryReady before recordOutcome() - model-registry.ts: 23 new CANONICAL_BY_ROUTE entries covering the actual production fallback chain — zai/glm-4.5{-air,-flash,5,5.1,5-turbo,4.6,4.7,4.7-flash}, mistral/codestral-latest + devstral-2512 + devstral-{small,medium}-* + mistral-{large,medium,small}-latest, google-gemini-cli/gemini-{2.5-pro,3-flash-preview,3.1-pro-preview}, opencode-go/{glm-5,glm-5.1,mimo-v2-omni,mimo-v2-pro} Also adds opt-in backfillModelPerformanceFromJournal(basePath) to reclassify the existing 51 _unmapped records from past journal events. Never auto-runs; backs up the old file before overwriting. Tests: 16 in canonical-id-mapping.test.mjs covering pattern matching, non-mappable cases, bare canonical-id passthrough, and the backfill path. 
Full suite 1906 pass, no regressions. Known follow-up: CANONICAL_BY_ROUTE uses mixed casing (MiniMax-M2.7 vs minimax-m2) — should be standardized lowercase in a future pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 14:02:58 +02:00
Mikael Hugo	5f92320c7d	fix(auto): timeout silent swarm turns despite heartbeats	2026-05-15 13:55:04 +02:00
Mikael Hugo	85f6650852	fix(auto): keep solver checkpoint pass out of swarm	2026-05-15 13:35:20 +02:00


4565 commits