Commit graph

4563 commits

Author SHA1 Message Date
Mikael Hugo
fa657f2523 feat(plan-milestone): per-slice vision trace (schema v69, ADR-0000 P2)
Restoration of doctrine: plan-milestone now emits a literal milestone.vision
clause per slice (traces_vision_fragment) so validate-milestone has structured
grounds for assessment instead of re-reading the vision through the LLM every
time. Schema v69 adds the column (NULL allowed for legacy rows); the prompt and
plan_milestone tool start requiring it for new slices, rejecting fragments that
do not appear verbatim in milestone.vision. See docs/adr/0000-purpose-to-software-compiler.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:48:50 +02:00
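The verbatim-fragment rule this commit describes (fragments rejected unless they appear literally in milestone.vision) can be sketched as below. Names are illustrative, not the actual plan_milestone tool code:

```javascript
// Minimal sketch of the verbatim-fragment rule; assumed shapes, not SF's real code.
function isValidVisionFragment(fragment, vision) {
  // A traces_vision_fragment must be a non-empty literal substring of
  // milestone.vision — paraphrase does not count.
  return typeof fragment === "string" &&
    fragment.trim().length > 0 &&
    typeof vision === "string" &&
    vision.includes(fragment);
}
```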
Mikael Hugo
3b83f0898b merge(P4): purpose-coherence-gate before every dispatch (ADR-0000) 2026-05-15 18:45:17 +02:00
Mikael Hugo
5e2c7a7166 merge(P1): vision quality gate on sf new-milestone (ADR-0000) 2026-05-15 18:45:08 +02:00
Mikael Hugo
59fbaf4b0f test(doctor): refactor purpose-gate test to use insertMilestone/insertSlice helpers
Cleans up doctor-purpose-gate.test.mjs:
- Uses insertMilestone/insertSlice helpers instead of raw SQL
- Removes redundant test from doctor-plan-dir-normalization.test.mjs
- Adds module-level JSDoc purpose comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:44:09 +02:00
Mikael Hugo
a303b5db29 feat(uok): add purpose-coherence-gate pre-dispatch gate (ADR-0000)
Restores the eight-PDD purpose gate at the autonomous-loop boundary
required by ADR-0000 (SF is a purpose-to-software compiler). The gate
walks milestone vision -> slice.traces_vision_fragment ->
task.purpose_trace before every dispatch and refuses to proceed when
the purpose chain is broken at the vision root (degraded-vision).

- New uok/purpose-coherence.js with a pure verdict function and a
  DB-backed adapter. Reads vision/trace columns directly via SQL so
  pre-P2/P3 schema migrations are tolerated.
- Wired into auto/phases-pre-dispatch.js alongside resource-version-guard,
  pre-dispatch-health-gate, and planning-flow-gate. Fires on every
  pre-dispatch turn and emits to the existing trace JSONL.
- Outcome ladder: fail (vision missing -> pause loop), warn (trace
  columns missing or NULL -> surface but allow dispatch so legacy DBs
  don't hard-break on day one), pass (full chain).
- Tests in tests/uok-purpose-coherence.test.mjs cover the four
  contracted states plus the column-missing downgrade path on a
  pre-migration schema.

Refs: ADR-0000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:42:55 +02:00
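The fail/warn/pass outcome ladder in the commit above can be sketched as a pure verdict function. This is a hedged sketch: the field names (vision, tracesVisionFragment, purposeTrace) are illustrative stand-ins for the DB columns, not the actual uok/purpose-coherence.js API:

```javascript
// Sketch of the outcome ladder: fail (vision missing), warn (trace columns
// missing or NULL on legacy DBs), pass (full chain). Field names assumed.
function purposeCoherenceVerdict(chain) {
  // Vision missing at the root: hard fail, pause the loop (degraded-vision).
  if (!chain || !chain.vision || !chain.vision.trim()) {
    return { outcome: "fail", reason: "degraded-vision" };
  }
  // Trace columns absent entirely (pre-P2/P3 schema): warn, allow dispatch.
  if (chain.tracesVisionFragment === undefined || chain.purposeTrace === undefined) {
    return { outcome: "warn", reason: "trace-columns-missing" };
  }
  // Columns present but NULL (legacy rows): surface but don't hard-break.
  if (chain.tracesVisionFragment === null || chain.purposeTrace === null) {
    return { outcome: "warn", reason: "trace-null" };
  }
  // Full chain vision -> slice fragment -> task trace: pass.
  return { outcome: "pass", reason: "full-chain" };
}
```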
Mikael Hugo
fb68b12902 feat(doctor): enforce ADR-0000 purpose gate — milestones need vision, slices need goal
Adds two new doctor checks to checkEngineHealth():

- db_milestone_missing_vision: error when a milestone has no vision
  (the WHY/purpose field per ADR-0000)
- db_slice_missing_goal: error when a slice has no goal
  (the WHAT/purpose field per ADR-0000)

Both checks are non-fixable (the operator must define purpose).
This aligns with ADR-0000 §Enforcement: "Non-trivial milestones,
slices, tasks, ADRs, specs, tests, and exported symbols must name
their purpose and consumer."

Tests: 2 cases — milestone without vision flagged, slice without
goal flagged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:42:14 +02:00
Mikael Hugo
aa0d57371e feat(headless): enforce ADR-0000 PDD-fields gate at new-milestone
Restoration of forgotten doctrine: ADR-0000 declares the eight PDD
fields (Purpose, Consumer, Contract, Failure boundary, Evidence,
Non-goals, Invariants, Assumptions) to be the purpose gate, but
`sf headless new-milestone --context <file>` was accepting any
context including empty or trivially-thin seed docs. This wires a
pre-create check that refuses the run when fields are missing or
too thin, naming exactly which ones so the operator can fix the
seed doc and retry.

- new src/resources/extensions/sf/headless-pdd-check.js: scans
  context for the eight fields (heading and inline-label forms) and
  reports missing/sparse, plus a minimum-spine check (Purpose +
  Consumer + Contract + Evidence-or-Falsifier).
- src/headless.ts calls the check after loadContext, before
  bootstrapping .sf/. Refusal exits 1 with formatPddRefusal text.
- --skip-pdd-check is the migration escape hatch (warning printed,
  PDD gate bypassed) for milestones that pre-date the gate.
- SF-internal auto-bootstrap (autonomous→new-milestone fallback)
  is exempted because the seed is SF-generated, not operator-PDD.
- vitest test covers missing-Purpose, missing-Consumer, all-8,
  sparse, inline-label form, Falsifier-as-Evidence spine, and the
  doctrine field order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:37:06 +02:00
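The eight-field scan plus minimum-spine check described above can be sketched roughly as follows. The regex forms (markdown heading vs inline label) and the spine rule come from the commit message; the exact matching rules of headless-pdd-check.js are assumptions:

```javascript
// Hedged sketch of the PDD-fields gate; real matching logic may differ.
const PDD_FIELDS = [
  "Purpose", "Consumer", "Contract", "Failure boundary",
  "Evidence", "Non-goals", "Invariants", "Assumptions",
];

function checkPddFields(context) {
  // A field counts as present in heading form ("## Purpose") or
  // inline-label form ("Purpose: ...").
  const missing = PDD_FIELDS.filter((field) => {
    const heading = new RegExp(`^#+\\s*${field}\\b`, "im");
    const inline = new RegExp(`^\\s*${field}\\s*:`, "im");
    return !heading.test(context) && !inline.test(context);
  });
  const has = (f) => !missing.includes(f);
  // Minimum spine: Purpose + Consumer + Contract + Evidence-or-Falsifier.
  const spineOk = has("Purpose") && has("Consumer") && has("Contract") &&
    (has("Evidence") || /^\s*Falsifier\s*:/im.test(context));
  return { missing, spineOk };
}
```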
Mikael Hugo
af1401e4ea fix(solver): enforce PDD purpose gate 2026-05-15 18:32:59 +02:00
Mikael Hugo
bb0c87fdac feat(remediation-dispatcher): M003 S04 — autonomous recovery from validation findings
Implements RemediationDispatcher that classifies verification failures
and maps them to recovery strategies:

- transient    → retry (timeout, flaky test, network)
- structural   → replan (broken import, syntax error)
- knowledge    → research (not implemented, missing context)
- infra        → escalate via self-feedback (tooling broken)

Confidence scoring:
- Single failing check + known pattern = high confidence
- Multiple failures or high retry count = lower confidence
- Configurable autoFixThreshold (default 0.6)

15 unit tests covering all 4 failure classes + confidence scoring +
threshold behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:29:45 +02:00
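The failure-class to strategy mapping and the confidence heuristic above can be sketched as below. The pattern lists and scoring weights are illustrative assumptions, not RemediationDispatcher's real rules; only the four classes, strategies, and the 0.6 default threshold come from the commit:

```javascript
// Sketch of the classification ladder; patterns and weights are assumed.
const FAILURE_CLASSES = [
  { cls: "transient",  strategy: "retry",    patterns: [/timeout/i, /flaky/i, /network/i] },
  { cls: "structural", strategy: "replan",   patterns: [/broken import/i, /syntax error/i] },
  { cls: "knowledge",  strategy: "research", patterns: [/not implemented/i, /missing context/i] },
];

function classifyFailure(message) {
  for (const entry of FAILURE_CLASSES) {
    if (entry.patterns.some((p) => p.test(message))) {
      return { cls: entry.cls, strategy: entry.strategy };
    }
  }
  // Unrecognised tooling breakage: infra, escalate via self-feedback.
  return { cls: "infra", strategy: "escalate" };
}

// Single failing check + known pattern scores high; multiple failures or
// retries drag the score below the autoFixThreshold (default 0.6).
function confidence({ failingChecks, retryCount, knownPattern }) {
  let score = knownPattern ? 0.9 : 0.5;
  score -= 0.1 * Math.max(0, failingChecks - 1);
  score -= 0.1 * retryCount;
  return Math.max(0, score);
}
```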
Mikael Hugo
a863672463 fix(state): honor completed owning requirements 2026-05-15 18:24:16 +02:00
Mikael Hugo
b08cb13c20 fix(state): requirements-complete short-circuits the planning ladder
Symptom: dr-repo M003 had all 8 owning requirements (UNI-01..05,
PIL-01..03) marked Status: complete in .sf/REQUIREMENTS.md, but
the milestone row was still active because its only slice was a
post-migration skipped placeholder. After the previous fix routed
all-skipped milestones to pre-planning, SF ran roadmap-meeting +
plan-milestone and wrote 3 new slices on a milestone whose
contract-level work was already done — burned ~4 LLM turns on
plausibly-adjacent but unwanted re-decomposition.

Root cause: deriveStateFromDb's milestone-completion gate consults
only slice statuses (and indirectly the milestone row's own status
field). It never reads REQUIREMENTS.md to check whether the
contract is already satisfied. The slice-based view collapsed the
real signal.

Fix:

- New parseRequirementsByMilestone(content) helper in files.js:
  parses REQUIREMENTS.md, groups entries by their `Primary owning
  milestone` field, returns Map<id, {complete, incomplete}>.

- handleAllSlicesDone now reads REQUIREMENTS.md before its
  slice-based real-work check. If a milestone has at least one
  owning requirement and zero of them are incomplete, route to
  completing-milestone with nextAction naming the requirement count
  (so the operator can see *why* the milestone is being closed
  without manually opening REQUIREMENTS.md).

- Best-effort: REQUIREMENTS.md parse failure falls through to the
  existing slice-based rule. Missing file likewise — no regression
  for projects that don't keep a requirements file.

Resolves sf-mp74hftw-zud6ba filed via the headless feedback CLI.
End-to-end verified by re-running sf headless query on dr-repo
M003: now reports phase=completing-milestone with the right
requirement-count message.

Tests: 5 new cases — all complete + slice skipped → completing,
some active → pre-planning, zero owning requirements falls through,
missing file falls through, all complete + real slice work still
completes. Existing 4 all-skipped-replan cases still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:22:28 +02:00
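The parseRequirementsByMilestone(content) helper described above can be sketched as follows. This assumes a REQUIREMENTS.md grammar with `##` entry headings and `Primary owning milestone:` / `Status:` lines; the real parser may be more permissive:

```javascript
// Hedged sketch: group requirement entries by owning milestone and count
// complete vs incomplete. Grammar details are assumptions.
function parseRequirementsByMilestone(content) {
  const byMilestone = new Map();
  // Split on requirement headings (e.g. "## UNI-01 ...").
  const entries = content.split(/^##\s+/m).slice(1);
  for (const entry of entries) {
    const owner = /Primary owning milestone:\s*(\S+)/i.exec(entry);
    if (!owner) continue; // entry without an owner is ignored
    const status = /Status:\s*(\S+)/i.exec(entry);
    const bucket = byMilestone.get(owner[1]) ?? { complete: 0, incomplete: 0 };
    if (status && /^complete$/i.test(status[1])) bucket.complete += 1;
    else bucket.incomplete += 1;
    byMilestone.set(owner[1], bucket);
  }
  return byMilestone; // Map<milestoneId, {complete, incomplete}>
}
```

With this shape, the gate fix reduces to: if a milestone has at least one owning requirement and `incomplete === 0`, route to completing-milestone instead of re-planning.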
Mikael Hugo
6a2c61d5ee feat(halt-self-feedback): M003 S03 — HaltWatchdog self-feedback integration
T01: Added integration test auto-halt-self-feedback.test.mjs that proves:
  - HaltWatchdog.check() creates a self-feedback DB entry with
    kind=runaway-loop:idle-halt, severity=high, blocking=true
  - Markdown projection (.sf/SELF-FEEDBACK.md) is regenerated
  - Deduplication works (one entry per idle period)
  - New heartbeat resets and creates a new entry for the next idle period

T02: Enhanced evidence string to include elapsedMs, iteration, and
thresholdMs explicitly (R003 actionable context requirement).

Tests: 36/36 pass across auto-halt-self-feedback,
auto-halt-watchdog-notify, and self-feedback-db suites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:19:39 +02:00
Mikael Hugo
5cd5e14160 feat(headless): surface memory auth pause 2026-05-15 18:16:08 +02:00
Mikael Hugo
90b8e7edf8 feat(headless): expose memory extraction diagnostics 2026-05-15 18:13:35 +02:00
Mikael Hugo
f00762ffdb fix(feedback): allow restamping suspect resolutions 2026-05-15 18:10:14 +02:00
Mikael Hugo
820f9aaf8e fix(memory): classify extraction failures 2026-05-15 18:06:33 +02:00
Mikael Hugo
7dfb5099c9 fix(state): all-skipped milestone routes to pre-planning, not validating
handleAllSlicesDone treated isStatusDone uniformly — "complete",
"done", AND "skipped" all counted as "milestone work is finished",
so a milestone whose only slice was skipped would advance to
phase=validating-milestone. That's wrong: a placeholder slice that
was skipped doesn't validate the milestone's success criteria, it
just clears the wedge.

Surfaced concretely in dr-repo M003 (Unified Dashboard + Pilot
Validation): I skipped the migration placeholder via the new
`sf headless skip-slice` CLI, and the next-dispatch reported
`validate-milestone M003` even though no real work had happened on
the milestone. The autonomous loop would then burn an LLM turn
running validate-milestone just to discover the obvious gap.

Fix: differentiate {complete, done} from {skipped} at the gate.
When zero slices carry real-work outcomes, route into the
pre-planning phase so the dispatcher's existing
discuss → research → plan ladder takes over. The PDD/vision is
already in the milestone row, so the planner has the purpose it
needs without operator hand-holding.

Verified end-to-end against dr-repo: `sf headless query` for M003
now reports phase=pre-planning and next dispatch
`roadmap-meeting M003` (the deep-planning entry rule fires first;
discuss/research/plan come after as artifacts land).

Tests: 4 cases — all-skipped → pre-planning, complete+skipped mix
→ validating, legacy "done" alias → validating, multiple skipped
→ pre-planning.

Resolves sf-mp73sk0m-63w88y (filed via headless feedback CLI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:00:21 +02:00
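The routing fix above — differentiate {complete, done} from {skipped} at the gate — can be sketched in a few lines (illustrative names, not SF's real deriveStateFromDb code):

```javascript
// Only complete/done count as real work; a milestone whose slices were all
// skipped routes back to pre-planning instead of validate-milestone.
const REAL_WORK = new Set(["complete", "done"]);

function routeAllSlicesDone(sliceStatuses) {
  const hasRealWork = sliceStatuses.some((s) => REAL_WORK.has(s));
  return hasRealWork ? "validating-milestone" : "pre-planning";
}
```

This mirrors the four test cases the commit names: all-skipped and multiple-skipped route to pre-planning; complete+skipped mixes and the legacy "done" alias still route to validating.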
Mikael Hugo
881fd5e304 feat(memory,state): runtime counters for memory injection + milestone work validation
Memory injection telemetry:
- Move counter writes from auto-prompts.js to memory-store.js (where
  getRelevantMemoriesRanked/getActiveMemoriesRanked actually fire).
- Track memory_inject_count and memory_inject_chars_total via
  runtime_counters table for headless-query reporting.

State-db validation:
- handleAllSlicesDone now checks if any slice carries real work
  (status=complete/done) before routing to validation.
- Milestones with all-skipped slices route to "reassess-roadmap"
  instead of asking the operator to validate non-existent work.

SM client defense:
- Filter foreign-tenant memories from SM query responses even when
  the server returns them (defense-in-depth).

Tests updated: memory-extraction-lifecycle, sf-db-migration,
headless-query-memory-injection, sm-client, memory-tenant-gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 17:57:45 +02:00
Mikael Hugo
9abbfaada2 feat(memory-tenant-gate): add project-scoped isolation for SM cross-project recall
Closes sf-mp723nju-2cpeoc. When SM_ENABLED is on, memory retrieval from
Singularity Memory is now scoped to the current project's repoIdentity
tenant. Foreign-tenant memories are filtered client-side and the tenant
filter is sent server-side for SM servers that support it.

Key changes:
- schema v68: ADD COLUMN tenant TEXT on memories table (NULL = legacy)
- insertMemoryRow: persists tenant field on every new record
- backfillMemoryTenants / backfillMemoryTenantRows: idempotent migration
  called on session_start when SM_ENABLED is set
- querySmMemories: resolves effectiveTenantId (opts.tenant > opts.tenantId
  > SM_TENANT_ID); returns [] when no tenant resolved and crossTenant off
- SM_CROSS_TENANT_ENABLED=1 opt-in bypass with audit warning in console
- register-hooks session_start: calls backfillMemoryTenants when SM active
- 12 new tests in memory-tenant-gate.test.mjs; updated sm-client.test.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 17:55:26 +02:00
Mikael Hugo
ff333ae067 feat(memory): surface injection token cost in headless query
The Project Memories section is rendered into every execute-task,
plan-slice, and research-slice prompt. At 10 memories × ~200 chars
each that's ~2K chars/turn injected into the context — real cost,
no operator-visible meter.

Adds two runtime_counters (already-existing key/value store):

  memory_inject_chars_total  — cumulative section size
  memory_inject_count        — number of injections

Written by buildProjectMemoriesSection() on every render. Both
writes sit inside a try/catch so a legacy DB without
runtime_counters silently skips rather than blocking prompt build.

`sf headless query` surfaces the cumulative + derived metrics as a
new top-level `memoryInjection` block:

  {
    total_chars: 12480,
    count: 8,
    avg_chars: 1560,
    estimated_total_tokens: 3120
  }

The block is omitted entirely when count is 0 (fresh project / no
prompts rendered yet) so it doesn't clutter the snapshot.

Operators can now correlate prompt size growth against autonomous
run cost without instrumenting the LLM call sites directly. The
estimated_total_tokens is chars/4 — a rough approximation since SF
doesn't tokenise the section, intentionally documented as such.

Resolves sf-mp723yl9-rcxoeh filed via the headless feedback CLI.

Tests: 5 source-level invariants — type carries the section, query
reads counters by name, snapshot omits section on zero, write side
calls both counter functions, write is wrapped in try/catch with
documented failure-mode comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:55:14 +02:00
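The derived memoryInjection block above can be sketched from the two counters. The counter names and the chars/4 estimate come from the commit; the function name and counter-object shape are assumptions:

```javascript
// Sketch of the headless-query memoryInjection block. chars/4 is the
// commit's stated rough token approximation (SF doesn't tokenise the section).
function buildMemoryInjectionBlock(counters) {
  const count = counters.memory_inject_count ?? 0;
  // Omit the block entirely on fresh projects with no prompts rendered yet.
  if (count === 0) return undefined;
  const totalChars = counters.memory_inject_chars_total ?? 0;
  return {
    total_chars: totalChars,
    count,
    avg_chars: Math.round(totalChars / count),
    estimated_total_tokens: Math.round(totalChars / 4),
  };
}
```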
Mikael Hugo
671b2c8628 feat(sm-client): defense-in-depth tenant filter on SM responses
Even though querySmMemories pins tenantId in the request body sent
to the Singularity Memory server, SF used to accept whatever came
back without verifying. A misconfigured or compromised SM server
could echo memories from other tenants and SF would inject them
into the next execute-task prompt — cross-customer leak.

filterSmMemoriesToTenant() now re-checks every returned memory:

  - same-tenant memories pass through
  - foreign-tenant memories (memory.tenantId OR memory.tenant !=
    expectedTenantId) are dropped, with a one-line warning so the
    misconfigured-SM symptom is visible rather than silent
  - memories with no tenant claim at all default to allow — matches
    the local DB's "NULL tenant = legacy row" rule from schema v68
  - SM_REQUIRE_TENANT_CLAIM=true flips the legacy rule to drop
    (hard fail-closed mode for operators who want it)

Defensive guards against non-array inputs, missing expectedTenantId
(returns input unchanged so caller-side fail-open semantics are
preserved), and the dual tenantId/tenant field naming.

Tests: 8 cases — same-tenant pass-through, foreign drop, legacy
allow, strict mode drop, tenantId/tenant alias, empty/non-array
defensiveness, missing-expected pass-through, warning emission.

Resolves the cross-project tenant-leak feedback row filed via the
new headless feedback CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:49:24 +02:00
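The defense-in-depth filter described above can be sketched as below. Memory and option shapes are assumed; the legacy-allow default, the strict fail-closed flag, the dual tenantId/tenant naming, and the fail-open guards follow the commit message:

```javascript
// Hedged sketch of filterSmMemoriesToTenant; real sm-client code may differ.
function filterSmMemoriesToTenant(memories, expectedTenantId, { requireClaim = false } = {}) {
  if (!Array.isArray(memories)) return []; // defensive: non-array input
  // Missing expectedTenantId: return input unchanged so caller-side
  // fail-open semantics are preserved.
  if (!expectedTenantId) return memories;
  return memories.filter((m) => {
    const claim = m.tenantId ?? m.tenant; // dual field naming
    // No tenant claim: allow (schema v68 "NULL tenant = legacy row"),
    // unless strict mode flips the rule to drop (fail-closed).
    if (claim == null) return !requireClaim;
    if (claim !== expectedTenantId) {
      // One-line warning so a misconfigured SM server is visible, not silent.
      console.warn(`sm-client: dropped foreign-tenant memory (tenant=${claim})`);
      return false;
    }
    return true;
  });
}
```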
Mikael Hugo
7c3d9bd3bf fix(memory): gate SM recall by tenant scope 2026-05-15 17:46:51 +02:00
Mikael Hugo
0c7aaafa00 feat(memory): enrich execute-task memory retrieval query
Previously buildProjectMemoriesSection(`${sTitle} ${tTitle}`) sent
two short strings to the cosine ranker — too sparse for re-ranking
to do meaningful work against the static pool.

buildMemoryRetrievalQuery() (new, exported for tests) enriches the
query with:

  - slice.title + task.title       (original signal)
  - slice.goal text, front 600 chars
                                   (the WHY of the slice — usually
                                   names the memory-relevant
                                   context the title can't fit)
  - top 20 changed files from
    git diff/status                (the WHAT — what code is in
                                   play right now; lets cosine
                                   ranking promote memories whose
                                   content references those paths)

Fail-open at each source: DB closed → no goal; not a git repo →
no files; nullish title args don't poison the string. The call
site never has to handle errors.

Bounded so embedding token cost stays predictable: 600-char goal
cap, 20-file cap. Empty inputs collapse to "" so the consumer's
`if (!query.trim())` branch still picks the static fallback.

Tests: 5 cases — titles always present, non-git directory safe,
empty-input collapse, nullish-arg defensiveness, real git repo
surfaces changed file paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:36:15 +02:00
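The enrichment above can be sketched as a pure function over already-fetched inputs. In the real buildMemoryRetrievalQuery the goal comes from the DB and the file list from git (each fail-open); here they are plain parameters, and the 600-char / 20-file caps come from the commit:

```javascript
// Illustrative sketch: titles + bounded goal text + bounded changed-file
// list, collapsing to "" on empty input so the caller's static fallback fires.
function buildMemoryRetrievalQuery({ sliceTitle, taskTitle, goal, changedFiles }) {
  const parts = [sliceTitle, taskTitle].filter(Boolean); // nullish-safe titles
  if (goal) parts.push(goal.slice(0, 600));              // 600-char goal cap
  if (Array.isArray(changedFiles)) parts.push(...changedFiles.slice(0, 20)); // 20-file cap
  return parts.join(" ").trim();
}
```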
Mikael Hugo
362af3d6a4 fix(headless): bypass rpc for status
2026-05-15 17:32:21 +02:00
Mikael Hugo
cf32e79578 feat(memory-embeddings): read SF_LLM_GATEWAY_KEY from env as auth.json fallback
Enables CI and containerised deployments without writing secrets to disk.
Auth.json still takes precedence when present.

- readGatewayFromAuthJson now falls back to SF_LLM_GATEWAY_KEY env var
- SF_LLM_GATEWAY_URL env var also supported for endpoint override
- Added tests for env fallback, auth.json preference, and default URL

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 17:13:40 +02:00
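The precedence rule above (auth.json first, env var fallback) can be sketched in a few lines. SF_LLM_GATEWAY_KEY and SF_LLM_GATEWAY_URL are the commit's names; the property names and default URL here are assumptions:

```javascript
// Sketch of the env fallback; authJson shape and default URL are placeholders.
const DEFAULT_GATEWAY_URL = "https://gateway.example.invalid";

function resolveGateway(authJson, env) {
  return {
    key: authJson?.gatewayKey ?? env.SF_LLM_GATEWAY_KEY,                    // auth.json wins
    url: authJson?.gatewayUrl ?? env.SF_LLM_GATEWAY_URL ?? DEFAULT_GATEWAY_URL,
  };
}
```

This keeps secrets out of the container image: CI sets the env var and auth.json is simply absent.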
Mikael Hugo
6214f7c86d feat(memory): add extraction diagnostics 2026-05-15 16:53:01 +02:00
Mikael Hugo
fdc4650016 feat(self-feedback-drain): filter free opencode models from triage routing
Self-feedback triage routing was including paid opencode models even
when the operator policy prefers the free tier. Add
isOpenCodeProvider() + isFreeOpenCodeModelId() and filter the
candidate list before the router scores them.

Also: cosmetic — quote style normalised by the formatter on
buildInlineFixPrompt strings and spawn options object.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:24 +02:00
Mikael Hugo
3a14fe86a7 test(list-models): isolate from developer's discovery-cache
Tests were picking up the developer's real
~/.sf/agent/discovery-cache.json and seeing unexpected models in
output. Pin tests to a guaranteed-missing path via the new
_discoveryCacheFilePath option so the env they observe is solely
what the test constructs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:11 +02:00
Mikael Hugo
d8f56e6704 feat(cli): add sf key subcommand for auth.json management
Surgical read/write access to ~/.sf/agent/auth.json without touching
the file directly. All mutations go through AuthStorage so file-lock
and chmod-600 invariants are always respected.

  sf key set    <provider> <api-key>   add/rotate stored key
  sf key get    <provider>             show masked key (last 4 chars)
  sf key remove <provider> [--yes]     remove credential
  sf key list                          list all providers + status

Rationale: SF's source of truth for credentials is auth.json at
runtime — env vars are only used during initial one-time provider
setup. Rotation needs an explicit, audit-friendly path, not implicit
env-driven re-reads. Keys are never echoed in full (last 4 chars
only); remove always prompts unless --yes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:04 +02:00
Mikael Hugo
351bfad41d fix(memory): extractTranscriptFromActivity now reads custom_message entries
Activity JSONL logs use `type: "custom_message"` with `customType: "sf-auto"`
for assistant reasoning content. The old code only checked `role === "assistant"`,
so every transcript was empty → extraction silently skipped every unit.

Fix: recognise both legacy (`role === "assistant"`) and modern
(`custom_message` with `sf-*` prefix) entry shapes. Also reads the
standalone `text` field used by custom messages.

This is why memory_processed_units had 0 rows despite 34 activity logs.

Tests: 186 files / 1994 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:13:26 +02:00
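The dual entry-shape recognition above can be sketched as below. The field names (type, customType, text) come from the commit message; the surrounding activity-log schema is assumed:

```javascript
// Sketch: recognise both legacy and modern assistant entry shapes.
function isAssistantEntry(entry) {
  if (entry.role === "assistant") return true; // legacy shape
  return entry.type === "custom_message" &&
    typeof entry.customType === "string" &&
    entry.customType.startsWith("sf-"); // modern shape, e.g. "sf-auto"
}

function entryText(entry) {
  // Custom messages carry a standalone `text` field; fall back to `content`.
  return entry.text ?? entry.content ?? "";
}
```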
Mikael Hugo
7ba469cff1 feat(memory): add debug logging to memory extraction pipeline
The memory extraction system has infrastructure (DB tables, LLM prompts,
unit closeout wiring, embedding backfill) but zero processed units and
only self-feedback-resolution memories. This suggests extraction is
failing silently.

Add debugLog() calls throughout extractMemoriesFromUnit() so we can
observe:
- Skip reasons (mutex busy, rate limited, already processed, file too small)
- Start/done lifecycle per unit
- LLM call and parse outcomes
- Error messages on failure and retry

This makes the extraction pipeline observable via --debug or the
journal/debug log without changing behavior.

Tests: 185 files / 1993 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:09:36 +02:00
Mikael Hugo
ba4b2d46d9 sf snapshot: uncommitted changes after 43m inactivity 2026-05-15 15:53:19 +02:00
Mikael Hugo
0b19afebf6 test(providers): expand discovery test matrix to 46 cases
Adds full coverage for the discovery-gating root cause that was
fixed in commits d70d8d3b1 (xiaomi x-api-key auth) and the
subsequent refreshSfManagedProviders + writeSdkDiscoveryCacheEntry
work in model-catalog-cache.js.

Diagnosis recap: kimi-coding, opencode, opencode-go were silent
in ~/.sf/agent/discovery-cache.json because the SDK's
model-discovery.js adapter registry marked them with
StaticDiscoveryAdapter (supportsDiscovery=false), so the SDK's
discoverModels() never attempted them. SF's own
scheduleModelCatalogRefresh DID fetch them but wrote only to the
per-repo runtime cache (basePath/.sf/model-catalog/) and only fired
on session_start — not during --discover. The fix is to mirror the
write to the SDK's discovery cache on both fetch-path AND cache-hit
path, and await it in cli.ts before listModels when --discover is set.

New test sections:
- parseDiscoveredModels: OpenAI {data}/{models} formats, Google
  {models[].name} prefix stripping, name-as-id fallback, null on
  bad input, OpenRouter pricing extraction
- refreshSfManagedProviders: xiaomi uses x-api-key (not Bearer),
  opencode uses Bearer, no-key providers skipped, SDK discovery cache
  written on BOTH network-fetch and cache-hit paths, kimi-coding +
  opencode-go iterated when keys present

46 tests pass. No regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:09:38 +02:00
Mikael Hugo
67c088410c chore(discovery): silence debug stderr from refresh path
Trailing instrumentation from the discovery investigation. The error
catch still swallows non-fatal failures during --discover, just no
longer prints to stderr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:03:56 +02:00
Mikael Hugo
fe28a48d81 fix(sift): revert to bm25,phrase for repo-root — hang was corrupted cache
The earlier commit (44fcfb643) incorrectly disabled phrase on repo-root
because I thought the phrase retriever hung on full-workspace scope. After
clearing the corrupted cache (left by killing a mid-build vector process),
testing confirms:

- bm25 alone on repo root: works, 1m 50s cold, instant warm
- phrase alone on repo root: works after cache clear
- bm25+phrase on repo root: works after cache clear
- vector on scoped paths: works after cache build

The "hang" was from a corrupted/stale cache, not a sift bug.
.siftignore is properly excluding files (146K→2,660 indexed).

Revert chooseSiftRetrievers back to bm25,phrase for repo-root.

Tests: 184 files / 1974 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 14:59:45 +02:00
Mikael Hugo
b88b66c651 feat(auto): fan out swarm research units 2026-05-15 14:54:27 +02:00
Mikael Hugo
c8854ca896 feat(discovery): cache stores pricing — unblocks zero-cost-but-not-:free models
Until now, the discovery cache stored only model IDs (string[]). The
downstream isZeroCost(model?.cost) check evaluated against undefined for any
dynamically-discovered model, so OpenRouter's zero-cost-but-not-:free
entries (owl-alpha, lyria-3-pro-preview, lyria-3-clip-preview,
openrouter/free) got silently blocked by the built-in provider policy.

Cache entry shape now: {id, cost?, contextWindow?} per model.
parseDiscoveredModels extracts pricing from OpenRouter's
/api/v1/models response (pricing.prompt/completion/input_cache_read/
input_cache_write → numeric cost.{input,output,cacheRead,cacheWrite}).
Other providers stay {id}-only — their /v1/models endpoints don't
ship pricing.

Migration: on first read of a legacy string[] cache, entries are
converted in-place to {id} objects and the file is rewritten. No cost
backfill (data wasn't there before), but the new readers handle them.

Cost wired into policy: isModelAllowedByBuiltInProviderPolicy calls
lookupDiscoveredModelCost("openrouter", modelId) as a fallback when
the static model registry has no cost data.

Plus: cli.ts --discover now eagerly refreshes SF-managed providers
(opencode, opencode-go, kimi-coding, xiaomi) that the SDK's adapter
doesn't cover — so they populate cache on first --discover instead
of waiting for a session-start lazy refresh.

Tests: 13 new across 5 groups (pricing extraction, round-trip, legacy
migration, policy gate happy/sad paths, Google provider compat).
Full suite: 184 files / 1971 tests, zero regressions.

Real-world result: openrouter/owl-alpha, google/lyria-3-pro-preview,
google/lyria-3-clip-preview, openrouter/free, plus any future
zero-cost models now pass the policy filter on the next discovery
refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:51:00 +02:00
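The in-place legacy migration and the pricing extraction described above can be sketched as below. The OpenRouter pricing field names follow its public /api/v1/models response as the commit states; the function names and cost-object shape are assumptions:

```javascript
// Sketch: legacy string[] cache entries become {id} objects in place;
// OpenRouter pricing maps to numeric cost.{input,output,cacheRead,cacheWrite}.
function migrateCacheEntries(entries) {
  return entries.map((e) => (typeof e === "string" ? { id: e } : e));
}

function extractOpenRouterCost(model) {
  const p = model.pricing;
  if (!p) return undefined; // other providers don't ship pricing
  return {
    input: Number(p.prompt),
    output: Number(p.completion),
    cacheRead: Number(p.input_cache_read ?? 0),
    cacheWrite: Number(p.input_cache_write ?? 0),
  };
}
```

A zero-cost-but-not-:free entry then carries cost {input: 0, output: 0, ...} instead of undefined, so the built-in provider policy's zero-cost check can pass it.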
Mikael Hugo
d70d8d3b10 fix(providers): use x-api-key for xiaomi discovery 2026-05-15 14:43:09 +02:00
Mikael Hugo
09ea553b6d fix(auto): initialize notification store during bootstrap 2026-05-15 14:42:02 +02:00
Mikael Hugo
0a332f4cba fix(headless): normalize auto alias to autonomous 2026-05-15 14:32:00 +02:00
Mikael Hugo
44fcfb643c fix(sift): use bm25 only for repo-root — phrase retriever hangs on full scope
Root cause: the sift binary's phrase retriever hangs indefinitely when
queried against the full repo-root scope (57K+ files). Earlier tests
mistook this for general slowness, but isolated testing confirms:

- bm25 alone on repo root: works (1m 30s cold, instant warm)
- phrase alone on repo root: hangs forever
- bm25+phrase on repo root: hangs forever (phrase path blocks)
- all retrievers on scoped subdirs: work correctly

The earlier Rust panic was from a corrupted cache state left by killing
a mid-build vector process. After clearing the cache, bm25 alone works.

Fix: chooseSiftRetrievers now returns retrievers: "bm25" (not "bm25,phrase")
for repo-root scope. Scoped subdirs still get bm25+phrase+vector with
position-aware reranking.
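
The scope-dependent choice can be sketched like this. The function name matches the commit; the signature and the root-scope check are assumptions.

```javascript
// Pick sift retrievers by scope: repo root gets bm25 only because the
// phrase retriever hangs on very large scopes (57K+ files).
function chooseSiftRetrievers(scope) {
  if (scope === "." || scope === "" || scope === "/") return "bm25";
  // Scoped subdirectories still get the full reranking pipeline.
  return "bm25,phrase,vector";
}
```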

Tests: updated 3 assertions in sift-retriever-scope.test.mjs.
Full suite: 183 files / 1958 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 14:28:23 +02:00
Mikael Hugo
1b5348e28e feat(providers): live discovery for opencode, opencode-go, minimax
Three providers were missing from PROVIDER_CATALOG_CONFIG so their
model lists couldn't be auto-discovered. Their wire ids only existed
in packages/ai/src/models.generated.ts as hand-coded entries, meaning
new model variants from these providers required manual catalog edits.

Verified live endpoints respond to /v1/models with bearer auth:
- opencode      → https://opencode.ai/zen/v1/models      (6 free models)
- opencode-go   → https://opencode.ai/zen/go/v1/models   (15 models)
- minimax       → https://api.minimax.io/v1/models       (works)

Added entries:
  opencode:     baseUrl https://opencode.ai/zen, modelsPath /v1/models
  opencode-go:  baseUrl https://opencode.ai/zen/go, modelsPath /v1/models
  minimax:      baseUrl https://api.minimax.io, modelsPath /v1/models
                (international endpoint; Chinese-network api.minimaxi.com
                still handled separately in the SDK)

Auth keys already wired: OPENCODE_API_KEY, OPENCODE_GO_API_KEY (with
OPENCODE_API_KEY fallback), MINIMAX_API_KEY. No env-api-keys.ts changes.

Combined with 385e0b448 (dynamic canonicalIdFor resolver), new model
variants from these three providers will be auto-grouped in
.sf/model-performance.json without hand-editing CANONICAL_BY_ROUTE.

A fresh discovery run will surface experimental models absent from the
static catalog (e.g. opencode's "big-pickle", opencode-go's
deepseek-v4-pro, mimo-v2.5-pro, hy3-preview). The model-router
tolerates unconventional wire IDs — no naming constraints.

To populate cache: rm -rf ~/.sf/runtime/model-catalog/ + relaunch sf.

Tests: 13 new in provider-catalog-discovery.test.mjs (catalog shape,
modelsPath presence, DISCOVERABLE_PROVIDER_IDS inclusion). Full suite
183 files / 1940 tests pass, zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:19:08 +02:00
Mikael Hugo
db3525b933 chore(model-registry): prune 15 redundant identity-strip aliases
After 385e0b448 added the dynamic discovery-cache resolver to
canonicalIdFor, the 15 identity-strip aliases added in 089bf0cbe for
discovered providers became pure redundancy — the dynamic path
returns the same bare modelId from the discovery cache.

Removed (all canonical == bare modelId, all providers in discovery cache):
- minimax/MiniMax-M2.7, minimax/MiniMax-M2.7-highspeed
- mistral/codestral-latest, mistral/devstral-2512,
  mistral/devstral-small-2507, mistral/mistral-large-latest,
  mistral/mistral-medium-latest, mistral/mistral-small-latest
- zai/glm-4.5, zai/glm-4.5-air, zai/glm-4.6, zai/glm-4.7,
  zai/glm-5, zai/glm-5-turbo, zai/glm-5.1

Kept (real aliases — canonical differs from wire id, NOT identity strips):
- kimi-coding/kimi-for-coding → kimi-k2.6 (Moonshot alias)
- mistral/devstral-medium-2507 → devstral-medium-latest (alias to latest)
- minimax/MiniMax-M2 family lowercase mappings (case-change aliases)

Also kept:
- zai/glm-4.5-flash, zai/glm-4.7-flash (not yet in discovery cache;
  flash variants may launch before cache refresh — fast-path safety)
- kimi-coding/kimi-k2.6 + kimi-k2-thinking (kimi-coding cache only
  has kimi-for-coding; these resolve via _ENTRY_BY_ROUTE fallback)

Tests: 15 new regression tests in canonical-id-dynamic.test.mjs verify
each removed entry STILL resolves correctly via dynamic discovery.
Total 21/21 in that file, plus 101 model-registry tests, plus 16
canonical-id-mapping tests — all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:17:06 +02:00
Mikael Hugo
385e0b4480 feat(model-learner): canonicalIdFor consults discovery cache as fallback
After commit 089bf0cbe added 23 hand-written aliases for production
route keys, the right structural fix is to also consult the dynamic
model-discovery cache (~/.sf/agent/discovery-cache.json). Otherwise
every new model variant from a discovered provider (ollama-cloud +39
models, openrouter +24, etc.) requires another round of hand-editing.

canonicalIdFor now resolves in this order:
  1. CANONICAL_BY_ROUTE (static fast path, retains real aliases like
     kimi-coding/kimi-for-coding → kimi-k2.6 where canonical differs)
  2. _ENTRY_BY_ROUTE (existing static path)
  3. canonicalIdFromDiscovery — reads ~/.sf/agent/discovery-cache.json,
     finds (provider, modelId) pair, returns bare modelId

In-memory cache with 60s TTL (DISCOVERY_CACHE_TTL_MS) so the readFileSync
on the hot path becomes one disk read per minute at most. canonicalIdFor
is per-dispatch, not per-token, so the overhead is negligible.
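
The resolution order plus TTL cache can be sketched as below. `CANONICAL_BY_ROUTE` and `DISCOVERY_CACHE_TTL_MS` are names from the commit; the cache contents and the stubbed file read are assumptions (the real code reads ~/.sf/agent/discovery-cache.json).

```javascript
const CANONICAL_BY_ROUTE = { "kimi-coding/kimi-for-coding": "kimi-k2.6" };
const DISCOVERY_CACHE_TTL_MS = 60_000;
let cached = null;
let cachedAt = 0;

function loadDiscoveryCache(now = Date.now()) {
  if (cached && now - cachedAt < DISCOVERY_CACHE_TTL_MS) return cached;
  // Real code: readFileSync of the discovery-cache JSON; stubbed here.
  cached = { "ollama-cloud": ["glm-5.1"] };
  cachedAt = now;
  return cached;
}

function canonicalIdFor(route) {
  // 1. Static alias table (fast path; real aliases win over dynamic).
  if (CANONICAL_BY_ROUTE[route]) return CANONICAL_BY_ROUTE[route];
  // 2. (_ENTRY_BY_ROUTE static path omitted from this sketch.)
  // 3. Dynamic discovery: a (provider, modelId) hit returns the bare modelId.
  const [provider, modelId] = route.split("/");
  const models = loadDiscoveryCache()[provider];
  return models && models.includes(modelId) ? modelId : null;
}
```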

Test hook __setDiscoveryCacheForTest lets vitest inject a cache without
touching the fs.

Tests: 6 new in canonical-id-dynamic.test.mjs (dynamic hit, static-alias
wins over dynamic, cache miss → null, null cache graceful, missing-models
graceful, multiple models per provider). Combined with existing
canonical-id-mapping: 22/22 pass. Full suite 1912 pass, no regressions.

Sanity verified: canonicalIdFor("ollama-cloud/glm-5.1") → "glm-5.1"
(dynamic-only, not in static table); canonicalIdFor("unknown/never")
→ null.

Follow-up (in flight, separate agent): prune the static identity-strip
aliases from CANONICAL_BY_ROUTE for providers in the discovery cache
since they're now redundant with the dynamic resolver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:14:04 +02:00
Mikael Hugo
2a58f4ebec feat(model-routing): autonomous fallback strict to enabledModels allowlist
Autonomous mode's model-fallback chain bypassed enabledModels — when zai
429'd, the chain happily fell through to mistral/codestral-latest even
though only minimax/*, kimi-coding/*, zai/*, ollama-cloud/* were allowed.
Of 52 dispatches in this repo's journal this session, 10 (~19%)
escaped the allowlist (mistral×2, opencode-go×3, google-gemini-cli×5).

enabledModels was honored by interactive cycling (settings-manager.ts)
and by self-feedback-drain.js for triage routing, but
auto-model-selection.js's fallback chain in selectAndApplyModel never
read it.

Now: isModelInEnabledList(provider, modelId, enabledModels) filters
each fallback candidate. Supports exact "provider/model" or
"provider/*" wildcard. Empty/undefined list = open behavior (no
regression for setups without an allowlist).

readEnabledModels reads ~/.sf/agent/settings.json once per chain;
swallows IO errors → undefined → no constraint (safe failure mode).
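
The matching rules above can be sketched as a small predicate. The function name comes from the commit; the body is a reimplementation of the described behavior, not the actual code.

```javascript
// Allowlist check: exact "provider/model", or "provider/*" wildcard.
// An empty or undefined list means no constraint (open behavior).
function isModelInEnabledList(provider, modelId, enabledModels) {
  if (!enabledModels || enabledModels.length === 0) return true;
  return enabledModels.some(
    (entry) => entry === `${provider}/${modelId}` || entry === `${provider}/*`
  );
}
```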

Escape hatch: SF_BYPASS_ENABLED_MODELS=1 disables the check for
emergency / misconfigured cases.

When ALL candidates are filtered out and the chain exhausts, throws
a clear error directing the operator to add to allowlist or unset.

Tests: 13 in enabled-models-fallback.test.mjs covering pattern matrix,
multi-candidate chain skipping, bypass env, and exhaustion path.
Full suite 1906 pass, no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:02:58 +02:00
Mikael Hugo
089bf0cbeb fix(model-learner): resolve canonical-id lazy-load race + 23 wire-id aliases
Of 52 dispatches in this repo's journal this session, 51 landed in
.sf/model-performance.json's _unmapped bucket — meaning the live-outcome
learner couldn't tell which provider/model succeeded or failed. Only
1 dispatch (google-gemini-cli/gemini-3-flash-preview) bucketed correctly.

Root cause was NOT just missing aliases — it was a lazy-load race:
- model-learner.js declared canonicalIdFor as a fire-and-forget dynamic
  import side-effect at module bottom
- metrics.js called recordOutcome() synchronously after
  `await import("./model-learner.js")` resolved — before the registry
  injection promise settled
- Result: _canonicalIdForFn was null for the first dispatch every session.
  Every session. Since the file shipped.

Why nobody noticed: _unmapped is a bucket, not an error. No throw, no
warning, no UI surface. Selection still worked because benchmark-selector
+ static hand-tuned scores carry the routing decision. Only the
feedback loop (recordOutcome → adjust scores) was silently severed.

Fix:
- model-learner.js: export `registryReady` promise instead of swallowing it
- metrics.js: await registryReady before recordOutcome()
- model-registry.ts: 23 new CANONICAL_BY_ROUTE entries covering the actual
  production fallback chain — zai/glm-4.5{-air,-flash,5,5.1,5-turbo,4.6,4.7,4.7-flash},
  mistral/codestral-latest + devstral-2512 + devstral-{small,medium}-* +
  mistral-{large,medium,small}-latest, google-gemini-cli/gemini-{2.5-pro,3-flash-preview,3.1-pro-preview},
  opencode-go/{glm-5,glm-5.1,mimo-v2-omni,mimo-v2-pro}

Also adds opt-in backfillModelPerformanceFromJournal(basePath) to
reclassify the existing 51 _unmapped records from past journal events.
Never auto-runs; backs up the old file before overwriting.
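
The exported-promise fix can be sketched as follows. `registryReady` and `recordOutcome` are names from the commit; the registry injection itself is a stand-in.

```javascript
let _canonicalIdForFn = null;

// model-learner.js side: export the injection promise instead of
// swallowing it as a fire-and-forget side effect at module bottom.
const registryReady = Promise.resolve().then(() => {
  _canonicalIdForFn = (route) => route.split("/").pop(); // stand-in registry
});

// metrics.js side: await readiness before recording, so the first
// dispatch of a session no longer falls into _unmapped.
async function recordOutcome(route) {
  await registryReady;
  return _canonicalIdForFn(route);
}
```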

Tests: 16 in canonical-id-mapping.test.mjs covering pattern matching,
non-mappable cases, bare canonical-id passthrough, and the backfill
path. Full suite 1906 pass, no regressions.

Known follow-up: CANONICAL_BY_ROUTE uses mixed casing (MiniMax-M2.7 vs
minimax-m2) — should be standardized lowercase in a future pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:02:58 +02:00
Mikael Hugo
5f92320c7d fix(auto): timeout silent swarm turns despite heartbeats 2026-05-15 13:55:04 +02:00
Mikael Hugo
85f6650852 fix(auto): keep solver checkpoint pass out of swarm 2026-05-15 13:35:20 +02:00
Mikael Hugo
bd3fbda9cb feat(journal): swarm-dispatch event per dispatch — cross-repo telemetry
The swarm dispatch path is the default in headless (ea8a3d935) but the
journal didn't tag events with which dispatch path was used. Result:
grep "swarm" .sf/journal/*.jsonl returned zero hits across this repo,
~/code/dr-repo, ~/code/centralcloud/dr — even where swarm IS running.
Cross-repo telemetry was blind to swarm adoption.

Now both swarm dispatch sites emit a journal event per call:

runUnitViaSwarm (auto/run-unit.js):
- success: outcome from worker checkpoint or "continue", via "autonomous-unit"
- no-reply: outcome "no-reply" with error field
- throw:   outcome "error" with error field

runSingleAgentViaSwarm (subagent/index.js):
- success: outcome "agent-reply", via "subagent-extension", agentName
- no-reply / catch: same outcome scheme as run-unit

Event shape:
{
  ts, eventType: "swarm-dispatch",
  data: { unitType, unitId, targetAgent, workMode, toolCallCount,
          outcome, via, agentName?, error? }
}

All six emitJournalEvent calls wrapped in try/catch — journal write
failure must not break dispatch (mirrors crash-recovery.js pattern).
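
That pattern can be sketched as a small wrapper (the helper name here is hypothetical; the real code inlines the try/catch at each of the six call sites):

```javascript
// Swallow-on-failure wrapper around journal writes, per the
// crash-recovery.js pattern: emit if possible, never propagate.
function safeEmitJournalEvent(emit, event) {
  try {
    emit(event);
    return true;
  } catch {
    // A journal write failure must never break the dispatch itself.
    return false;
  }
}
```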

Tests: 68 new assertions across the two files (5 + 4 test groups
covering happy path, no-reply, throw). Full suite 1872 pass, no
regressions.

Once landed everywhere this enables:
- grep swarm-dispatch .sf/journal/*.jsonl shows adoption
- ~/.sf/agent/upstream-feedback.jsonl rolls up swarm vs legacy ratio
- "is this repo using swarms?" becomes a one-line query

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 13:22:28 +02:00
Mikael Hugo
c42c13b882 feat(auto): trigger sift index warmup at start of every autonomous loop
Previously, sift warmup only ran during sf init/auto-start, which meant
repos launched via sf headless or entered mid-session never got their
index built. The first sift_search/codebase_search call would then block
for minutes while the cold cache was built.

Now autoLoop() calls ensureSiftIndexWarmup() at loop entry. The warmup
runs detached (background process) and is skipped if already running or
if a recent marker exists. This ensures every repo SF operates on gets
indexed regardless of entry path.

- Best-effort: wrapped in try/catch so warmup failures never block the loop
- Lazy import to avoid circular dependencies
- Debug-logged for observability
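
The loop-entry wiring above can be sketched like this. The module path and function bodies are assumptions, not the real SF wiring; only the names `autoLoop` and `ensureSiftIndexWarmup` come from the commit.

```javascript
async function autoLoop(runIteration) {
  try {
    // Lazy import avoids circular dependencies; failure is non-fatal.
    const { ensureSiftIndexWarmup } = await import("./sift-warmup.js");
    ensureSiftIndexWarmup(); // detached; no-op if already running or recent
  } catch {
    // Warmup problems must never block the autonomous loop.
  }
  return runIteration();
}
```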

Tests: 179 files / 1863 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 13:17:44 +02:00