Token count now only triggers a warning when accompanied by a primary
signal (high tool calls, long elapsed time, or many changed files).
This prevents false positives on units doing real work with large
context models, where 25+ tool calls can legitimately burn 1M+ tokens.
Also renames 'session tokens' to 'unit tokens' in guard messages to
clarify that the metric is delta-from-unit-start, not cumulative.
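A minimal sketch of the gate, assuming hypothetical names and thresholds (the real signal thresholds live in the guard config, not here):

```typescript
// Hypothetical model of the token-count guard: tokens alone never warn,
// they must be corroborated by a primary signal. All names/thresholds
// here are illustrative, not the shipped API.
interface UnitStats {
  unitTokens: number;   // delta from unit start, not cumulative
  toolCalls: number;
  elapsedMs: number;
  changedFiles: number;
}

const TOKEN_THRESHOLD = 1_000_000;

function shouldWarnOnTokens(s: UnitStats): boolean {
  if (s.unitTokens < TOKEN_THRESHOLD) return false;
  const primarySignal =
    s.toolCalls >= 25 ||           // high tool-call count
    s.elapsedMs >= 30 * 60_000 ||  // long elapsed time
    s.changedFiles >= 20;          // many changed files
  return primarySignal;            // tokens alone never fire
}
```

A unit with 25+ tool calls and 1M+ tokens warns; a large-context unit burning 1M+ tokens over 3 tool calls does not.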
Fixes sf-moqewawp-ijwjjt
Pure-function tests for applyRelationBoost (55b14c3f7) cover the
math, but the wired-through path (createMemoryRelation → boost picked
up by getRelevantMemoriesRanked → reordered output) had no
end-to-end test.
New test:
1. Creates memories a, b, c with orthogonal embeddings
2. Mocks gateway to return a query vector aligned only with a
3. Wires a→b with related_to (confidence 1.0)
4. Asserts ranking: a (cosine top) > b (boost from a) > c (unrelated)
Locks the contract that the boost actually fires through the full
pipeline, not just the pure helper. 16 → 17 tests in the file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The header listed "artifact I/O, detection, flag flips, resolution" but
not the carry-forward injection (claimOverrideForInjection /
formatOverrideBlock) or the memory persistence calls now embedded in
both writeEscalationArtifact (continueWithDefault path, b9bff3762
sibling) and resolveEscalation (00c13bc5a). These are load-bearing
behaviors a contributor should know up front.
Also folded the "SF's local ADR-011 is 'Swarm Chat'" disambiguation
note into the header (matching the convention set by the rest of the
disambiguation sweep).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
memory-sleeper.ts had no file header and the "memory" prefix is
misleading — it's a runtime tool-output watchdog (detects repeated
bash failures, too-large tool results) that emits steers, completely
unrelated to memory-store / memory-relations / memory-embeddings.
A contributor reading directory listing top-down would reasonably
assume this file participates in the same pipeline as the other
memory-*.ts modules. Header now states the historical naming and
points readers in the right direction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous header had two stale references:
- "buildMemoryLLMCall pattern, prefers a dedicated embedding-capable
model" — describes a hook that actually returns null on every call
(the Pi SDK has no provider-neutral embedding API yet).
- "queryMemoriesRanked falls back to keyword-only scoring" —
function doesn't exist; the real consumer is
getRelevantMemoriesRanked, and the fallback is static (confidence
× hit_count), not keyword.
Updated to describe the actual three-stage read pipeline (cosine →
relation-boost → optional rerank) and the soft-degrade fallback to
static ranking when the gateway is offline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The file header described an aspirational design ("LINK actions
emitted by the memory extractor, or future /sf memory link CLI") that
never matched code reality. As of this session:
Writers shipped:
(a) applyMemoryActions auto-links co-extracted memories with
related_to (b9bff3762)
(b) /sf memory import loads explicit edges from JSON
Read consumers shipped:
(1) getRelevantMemoriesRanked graph-boost (55b14c3f7)
(2) sf_graph MCP tool (pre-existing)
Updated the header so a contributor reading top-down sees the
current data flow, not the original plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit (55b14c3f7) wired memory_relations into ranking, but
the table was empty — no writer added edges.
applyMemoryActions now links memories created in the same batch
pairwise with `related_to` edges (confidence 0.5 reflects "from same
extraction context" being weaker evidence than an explicit
human-authored relation). Pairwise O(n²) is fine for typical
extractor batches of 1–5 memories.
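The pairwise linkage can be sketched as a pure helper (shape is illustrative; the shipped code wires this through createMemoryRelation inside applyMemoryActions):

```typescript
// Sketch: pairwise related_to edges for a co-extracted batch.
// Field names are assumptions, not the real schema types.
interface RelationInput {
  fromId: string;
  toId: string;
  rel: string;
  confidence: number;
}

function pairwiseRelations(createdIds: string[]): RelationInput[] {
  const edges: RelationInput[] = [];
  // O(n^2) over the batch: fine for typical extractor batches of 1-5.
  for (let i = 0; i < createdIds.length; i++) {
    for (let j = i + 1; j < createdIds.length; j++) {
      edges.push({
        fromId: createdIds[i],
        toId: createdIds[j],
        rel: "related_to",
        // 0.5: "from same extraction context" is weaker evidence
        // than an explicit human-authored relation.
        confidence: 0.5,
      });
    }
  }
  return edges;
}
```

Three CREATEs yield C(3,2)=3 edges; a single CREATE yields none.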
Combined with 55b14c3f7's relation-boost ranker, the effect is:
extracting memories A, B, C from one slice transcript ⇒ when later a
query hits A, B and C get a small score bump (and vice versa). The
cohort surfaces together rather than fragmenting across categories.
UPDATE / REINFORCE / SUPERSEDE actions don't trigger linkage —
linkage is for new co-extracted context, not modifications of
existing memories.
Best-effort: relation creation failures don't roll back the memory
batch. 14 → 16 tests in memory-store.test.ts; new tests verify the
3-memory batch yields C(3,2)=3 edges and a single-CREATE batch yields 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
memory_relations was storage-only since 56ee89a94 / 23c5de38b. Now
getRelevantMemoriesRanked walks edges of cosine top-N memories and
applies a one-pass score-boost to neighbors:
combined += parent_score × edge_confidence × damping
where damping=0.4 by default. Both endpoints of an edge get the boost
symmetrically (memory A pulling B is equally evidence that B is
relevant to A's context).
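The one-pass symmetric boost can be modeled as follows (a sketch, not the real applyRelationBoost signature; field names are assumptions):

```typescript
// Model of the symmetric relation boost: each endpoint of an edge
// boosts the other by parent_score * edge_confidence * damping.
// Boosts read pre-boost scores (one pass, no cascading).
interface Ranked { id: string; score: number }
interface Edge { fromId: string; toId: string; confidence: number }

function applyRelationBoost(
  ranked: Ranked[],
  edges: Edge[],
  damping = 0.4,
): Ranked[] {
  const base = new Map(ranked.map((r) => [r.id, r.score]));
  const boosted = new Map(base);
  for (const e of edges) {
    const a = base.get(e.fromId);
    const b = base.get(e.toId);
    // Cross-pool edge (an endpoint outside the cosine pool): skip.
    if (a === undefined || b === undefined) continue;
    boosted.set(e.toId, boosted.get(e.toId)! + a * e.confidence * damping);
    boosted.set(e.fromId, boosted.get(e.fromId)! + b * e.confidence * damping);
  }
  return ranked
    .map((r) => ({ id: r.id, score: boosted.get(r.id)! }))
    .sort((x, y) => y.score - x.score);
}
```

With empty edges the input ordering is unchanged, which is the no-writer case described above.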
Pure helper `applyRelationBoost(ranked, edges, options)` lives in
memory-embeddings.ts so memory-store.ts doesn't take a direct
dependency on memory-relations.ts; the call site composes the two
modules. When memory_relations is empty (the case until a writer
adds edges — a future agent or hook), applyRelationBoost returns the
input unchanged → no behavior change today.
Intra-pool only: cross-pool edges (where one endpoint is outside the
50–200 cosine pool) are skipped to avoid pulling in low-static
memories on a hot edge alone. Pool expansion via relations would be
a separate, more invasive feature.
4 new tests cover empty edges, empty ranked, cross-pool edge skip,
and the canonical "low-but-related promoted above lone" case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit of all FROM/INTO/UPDATE clauses in the codebase against
CREATE TABLE statements found one missing index. memory_relations
PK is (from_id, to_id, rel) — covers from_id as leading column. But
memory-relations.ts:233 queries `WHERE to_id = :id` which would
full-scan once the relation count grows.
Added idx_memory_relations_to. Cheap insertion cost; avoids the
worst-case query as soon as a ranker consumer starts traversing
edges (the natural next-step from 23c5de38b).
Schema-gap audit (option 3 in the redirect): no other ghost-table
references found. unit_claims has its own .sf/unit-claims.db and
self-contained schema in unit-ownership.ts. active_decisions /
active_requirements / active_memories are CREATE VIEW IF NOT EXISTS,
properly created. "INTO worktree" was a JSDoc false positive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real semantic bug: getRelevantMemoriesRanked returns memories in
score-descending order (cosine + optional rerank), but
formatMemoriesForPrompt then re-grouped them by CATEGORY_PRIORITY
(gotcha=0 first, convention=1, ...). A high-relevance "convention"
memory got buried under low-relevance "gotcha" entries purely because
gotcha has higher category priority. The agent never saw the most
relevant items at the top.
formatMemoriesForPrompt gains a `preserveRankOrder` parameter (default
false for backward compat). When true:
- Renders bullets in input order
- Tags each line with [category] so the agent can still tell
gotchas from conventions
Wired auto-prompts.ts execute-task injection: when memoryQuery is
non-empty (i.e. query-aware ranker was used), pass true. Static-ranked
input keeps the historical category-grouped layout.
Tests verify both modes side-by-side using identical input — the
ordering flip is the load-bearing assertion.
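A simplified model of the two render modes (the real formatter lives in memory-store territory and knows the full CATEGORY_PRIORITY table; this sketch assumes a two-entry table):

```typescript
// Sketch of formatMemoriesForPrompt's two modes. Names follow the
// commit; the CATEGORY_PRIORITY subset here is illustrative.
interface Memory { content: string; category: string }

const CATEGORY_PRIORITY: Record<string, number> = {
  gotcha: 0,
  convention: 1,
};

function formatMemoriesForPrompt(
  memories: Memory[],
  preserveRankOrder = false, // default false for backward compat
): string[] {
  if (preserveRankOrder) {
    // Keep the ranker's order; tag category inline so the agent can
    // still tell gotchas from conventions.
    return memories.map((m) => `- [${m.category}] ${m.content}`);
  }
  // Historical layout: group by category priority.
  return [...memories]
    .sort(
      (a, b) =>
        (CATEGORY_PRIORITY[a.category] ?? 99) -
        (CATEGORY_PRIORITY[b.category] ?? 99),
    )
    .map((m) => `- ${m.content}`);
}
```

Identical input, flipped first line: that is the ordering assertion the tests lock.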
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same disambiguation as 45b669ac3 but for the source-file header
comment (a contributor reading commands-escalate.ts top-down sees the
same surface as `/sf escalate help`).
Comment-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 0f0aee5bf added the --all flag to /sf escalate list (showing
resolved entries in addition to active ones), but the usage() text
never advertised it. Operators discovered the flag only by reading
source. Adding it to the help line.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pool was Math.min(50, limit * 5). For default limit=10 this gives 50
(intended 5× oversample for rerank). But for limit=100 it gives 50 —
caller asking for 100 results would silently get at most 50.
Now: max(limit, limit * 5), capped at 200 to bound rerank latency on
huge requests. Default behavior unchanged for limit ≤ 10; large
requests now work correctly.
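The sizing rule in one function (a sketch of the expression described above, not the shipped call site):

```typescript
// Cosine-pool sizing: 5x oversample for rerank, never smaller than
// the request itself, capped at 200 to bound rerank latency.
// Old buggy form was Math.min(50, limit * 5), which silently capped
// a limit=100 request at 50.
function poolSize(limit: number): number {
  return Math.min(200, Math.max(limit, limit * 5));
}
```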
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new tests covering the symmetric write shipped in 7a5b12540:
1. writeEscalationArtifact with continueWithDefault=true → memory
created with "[escalation:T##]" prefix, "auto-applied default:"
rationale marker, and Fail option label (the recommendation).
2. writeEscalationArtifact with continueWithDefault=false → NO memory
at write time (pending entries defer persistence to resolveEscalation
per existing behavior).
Together with the resolve-time tests in 3b5e6588e, all three
escalation flows (resolved, auto-accepted, default-applied) have
locked memory-persistence contracts. 23 → 25 tests in the file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an agent escalates with continueWithDefault=true, it has already
proceeded with the recommendation — the artifact JSON captures the
audit trail but no other surface carries the rationale forward.
Downstream tasks running after this one would query memories and find
nothing about the choice.
resolveEscalation already writes a memory on the continueWithDefault=
false path (after operator resolves). This is the symmetric write for
the continueWithDefault=true path: same category="architecture",
same "[escalation:T##]" prefix, with the rationale prefixed
"auto-applied default: ..." so a journal scan can tell apart
continueWithDefault entries from operator-resolved ones.
Now a slice's full decision history (operator-resolved + auto-accepted
+ default-applied escalations) lives uniformly in the memory store and
flows into the cosine ranking for downstream prompts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The execute-task escalation guidance claimed the user "can review or
override later via /sf escalate". Commit c1ce9aac1 already made the
already-resolved message explicit that auto-accepted decisions can't
be retroactively undone — the carry-forward into downstream tasks
happens before any operator could intervene.
Updated the agent-facing guidance to match: auto-mode accepts +
persists as memory + carries forward; the operator gets the audit
trail via /sf escalate list --all but the executed work stands. This
shifts the agent's incentive toward thorough rationale capture (since
that's what survives) rather than the false comfort of "the user can
fix it later".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After aa60821ec wired the rerank pass, the search header still said
"(embedding-ranked)" even when SF_LLM_GATEWAY_RERANK_MODEL was set
and the worker was online. The user couldn't tell whether they were
seeing cosine-only or rerank-enhanced results.
Now the header has three states:
- "(embedding+rerank-ranked)" — both env vars set
- "(embedding-ranked)" — only SF_LLM_GATEWAY_KEY set
- "(static rank — set SF_LLM_GATEWAY_KEY for embeddings)" — neither
Header-only diff. The rerank can still soft-degrade silently if the
worker is offline (caller throttles the warning to once/min) — header
reports the configured state, not the realized state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new tests covering the embedding-cleanup paths shipped in
7bec2dc2d / 1b71ddd17 / 05a326a29:
1. updateMemoryContent → drops the existing memory_embeddings row
(next backfill re-embeds the new content).
2. supersedeMemory → drops the superseded memory's embedding while
preserving the live one's.
3. enforceMemoryCap → sweeps embeddings of newly-superseded memories
so memory_embeddings stays aligned with active memories after a
batch cap.
Without these, a regression in the cleanup paths would silently leave
orphaned vectors that loadAllEmbeddings's superseded_by filter masks
at query time but that bloat the table forever.
11 → 14 tests in memory-store.test.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 00c13bc5a added "createMemory on resolveEscalation" but the
behavior was untested — a regression that broke it would silently
disable the cross-session learning surface (the [escalation:T##]
memories are what carry agent rationales forward via getRelevantMemories
ranking).
Two new tests:
1. resolveEscalation with explicit user rationale → memory contains
the question, choice, and user rationale, category=architecture.
2. resolveEscalation with empty rationale → falls back to the
artifact's recommendationRationale (the formatEscalationMemoryContent
contract).
23 tests in the file now (was 21).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "already-resolved" branch returned a bare timestamp with no
guidance. Auto-accepted escalations especially leave the user wondering
what to do — the carry-forward was already injected into the next
task, so this command can't retroactively undo the choice.
Now the message distinguishes auto-accepted vs user-resolved and, for
the auto-accepted case, points to `/sf memory note "..."` as the
forward-looking corrective surface (it lands in memory_embeddings on
next backfill and influences future ranking).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When SF_LLM_GATEWAY_RERANK_MODEL is set but no rerank worker is online,
every memory query (per execute-task prompt assembly) would log
"[sf:memory-embeddings] WARN: llm-gateway /rerank unavailable (503)" —
several lines per turn, all redundant. The soft-degrade is expected in
this state.
Now the message logs at most once per 60s. Symmetric with the
runEmbeddingBackfill unavailable-throttle pattern. Both sad-path
loggers stay informative (the operator sees one line and knows the
worker is down) without drowning the journal.
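The throttle pattern itself, as a minimal factory (the real logger and message live in memory-embeddings.ts; the injectable clock here is for testability and is an assumption):

```typescript
// Once-per-window log throttle: the first message passes, repeats
// inside the window are dropped, the next one after the window passes.
function makeThrottledWarn(
  log: (msg: string) => void,
  windowMs = 60_000,
  now: () => number = Date.now,
): (msg: string) => void {
  let lastLoggedAt = -Infinity;
  return (msg) => {
    const t = now();
    if (t - lastLoggedAt < windowMs) return; // drop redundant repeats
    lastLoggedAt = t;
    log(msg);
  };
}
```

The operator still sees one line per minute and knows the worker is down; the journal stops drowning.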
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
runEmbeddingBackfill fires on every agent_end (per-turn). When the
gateway is online and a project produces memories, every turn would
write a "[sf:memory-embeddings] WARN: backfill: embedded N memories"
line — successes labeled as warnings, repeating on every cycle. That
both inflates the stderr stream and misleads grep-for-WARN diagnostics.
Successes are routine; the function's return value carries the count
when a caller cares. Failures still log (throttled to 60s) via the
existing path. Net effect: the embedding pipeline runs silently in the
happy path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same orphan-cleanup as 1b71ddd17 but for the batch path. enforceMemoryCap
calls supersedeLowestRankedMemories, which marks N lowest memories
superseded in one UPDATE — bypassing the per-memory supersede embedding
cleanup. The result was that capping a project at 50 memories left dead
embedding rows for everything that got demoted.
Now: a single DELETE-IN-SUBQUERY removes embedding rows for every
memory whose superseded_by is set — covering both the cap path and
any historical orphans from before the per-row cleanup landed.
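A pure model of which embedding rows survive the sweep (the real code does this in one `DELETE FROM memory_embeddings WHERE ... IN (SELECT ...)`; column/field names here are assumptions):

```typescript
// Model of the batch sweep: embedding rows for superseded memories
// are removed, rows for live memories survive.
interface MemRow { id: string; supersededBy: string | null }

function surviveSweep(embeddingIds: string[], memories: MemRow[]): string[] {
  const dead = new Set(
    memories.filter((m) => m.supersededBy !== null).map((m) => m.id),
  );
  return embeddingIds.filter((id) => !dead.has(id));
}
```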
Best-effort; cap enforcement is load-bearing, embedding cleanup is not.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
supersedeMemory soft-deleted via superseded_by but left the
memory_embeddings row in place. loadAllEmbeddings already filters
by superseded_by IS NULL, so the orphaned row is harmless functionally
— but it wastes storage, complicates manual SQL audits, and is
inconsistent with updateMemoryContent (which already invalidates the
embedding via 7bec2dc2d).
Best-effort delete; supersede still succeeds even if the embedding
delete raises. Symmetric with the update path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The gateway rerank surface was shipped dormant in 56ee89a94 — the
function existed but no consumer called it, so setting
SF_LLM_GATEWAY_RERANK_MODEL did nothing functional.
Now: after the cosine-rank top-K is computed, optionally call
rerankCandidates(query, top-K) when a rerank model is configured. Re-
sort by relevance_score; gracefully fall back to cosine order in every
sad path (no model, no worker, network error, malformed response).
Strictly additive precision boost — the cosine-only ranking path is
unchanged when rerank isn't enabled OR returns null.
Two new tests: rerank actively reorders the top-K when scores are
returned, and the no-worker-online soft-degrade path preserves cosine
order. 12 tests in the file passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same UX refinement as e104f17ad applied to /sf escalate show <slice>/<task>.
Auto-mode resolutions now display "Auto-accepted <ts> → choice=..." instead
of the generic "Resolved <ts>". The userRationale prefix "auto-mode:"
already disambiguates the source; surfacing the verb makes the show view
match the list view's status semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto-mode resolutions stamp the artifact with userRationale prefix
"auto-mode: ..." (set by auto-dispatch.ts when it auto-resolves an
escalation). The list view now shows "auto-accepted (accept)" for
those entries vs "resolved (option-id)" for user-resolved ones, so an
operator scanning `/sf escalate list --all` can tell at a glance which
decisions were autonomous and which had explicit human input.
The artifact JSON is unchanged — this is purely a list-formatter
refinement that surfaces information already recorded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last bare "ADR-011 P2" reference was in the user-facing /sf escalate
help description in commands/catalog.ts. The parallel session's
c481ede33 touched this file (added /sf reload) but left this line
untouched — fixing it now closes the disambiguation sweep across the
entire codebase outside test files.
Comment / string-literal only diff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final pass over the comment-only ambiguity. Every internal "ADR-011"
reference outside test files now reads "gsd-2 ADR-011" so the
source-of-truth lookup is unambiguous (SF's local ADR-011 is "Swarm
Chat and Debate Mode", which has nothing to do with progressive
planning or escalation).
Files: workflow-tool-executors.ts, bootstrap/db-tools.ts,
unit-context-manifest.ts, commands-escalate.ts, sf-db.ts (full sweep,
including remaining function docstrings), tools/plan-milestone.ts,
tools/plan-slice.ts.
Comment-only diff. The one bare "(ADR-011 P2)" left in
commands/catalog.ts:62 (the /sf escalate help text) belongs to the
parallel session's WIP edit on that file — leaving it for them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as df095b406 / f1fc8cc86, applied to the schema-comment
references in sf-db.ts (column comments + migration comments). Future
maintainers reading SQL definitions like:
is_sketch INTEGER NOT NULL DEFAULT 0, -- ADR-011: 1 = slice is a sketch
would otherwise look up SF's local ADR-011 ("Swarm Chat") and find
nothing about sketches. Now reads "gsd-2 ADR-011" so the source-of-
truth is unambiguous.
Comment-only diff. The 5 remaining "(gsd-2)" parenthetical references
already disambiguate clearly enough; left intact to avoid churn.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as df095b406 but for the user-facing PREFERENCES.md template
that ships in /sf init projects. Reading "ADR-011 P2: mid-execution
escalation" without the gsd-2 prefix sends operators to SF's local
ADR-011 ("Swarm Chat and Debate Mode") which has nothing to do with
escalation.
Markdown-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A future maintainer reading "ADR-011 Phase 2" in escalation.ts would
look up SF's local docs/dev/ADR-011 and find "Swarm Chat and Debate
Mode" — totally unrelated. The escalation + progressive-planning work
ports gsd-2's ADR-011 (Progressive Planning + Escalation), which
happens to share the number with our local ADR-011.
Prefixed every internal comment that referenced the gsd-2 ADR with
"gsd-2 ADR-011" so the source-of-truth lookup is unambiguous. Comment-
only diff — no compilation, runtime, or test surface affected.
Files: types.ts, auto-prompts.ts, auto-dispatch.ts, escalation.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The autonomous-mode footer in refine-slice.md was the short version
("Document assumptions in the plan") while plan-slice / execute-task /
complete-slice all carry the full explanation: agents are in auto-mode,
no human is available, document assumptions in the artifact, note
human-input-required decisions in the relevant artifact and proceed
with the best available option.
Refine-slice gets sketches refined into full plans — same autonomy
contract as plan-slice. Aligning the language so an agent reading any
of these prompts gets the same self-help instructions about
ask_user_questions / secure_env_collect.
Markdown-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These gateway env vars are runtime-only settings (not YAML keys); the
previous template mentioned only the YAML phase toggles. Operators
discovering the
embedding/rerank surface had to read source. Adding a clear table at the
bottom of PREFERENCES.md so the env-var contract is documented next to
the rest of the skill prefs.
Documents: SF_LLM_GATEWAY_KEY, SF_LLM_GATEWAY_URL,
SF_LLM_GATEWAY_EMBED_MODEL, SF_LLM_GATEWAY_RERANK_MODEL — including the
silent-fallback semantics and the agent_end backfill cadence.
Markdown-only; no recompile needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
auto-session-encapsulation invariant: the parallel session refactored
auto.ts to use the getAutoSession() factory; the test still expected
`new AutoSession()` literally. Updated the regex + the allowedPatterns
list to accept both shapes — the invariant is "exactly one module-level
binding for the AutoSession instance", not which constructor expression
yields it.
silent-catch-diagnostics #3348: auto-supervisor.ts:53 swallowed signal-
handler exceptions silently. Added logWarning("session", ...) — the
intent stays the same (signal handler must not throw), but cleanup-path
errors are now visible in the journal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
updateMemoryContent rewrote the row but left the existing memory_embeddings
vector in place — that vector was computed against the old content, so the
next cosine query would score the memory by what it used to say, not what
it says now.
Now drop the embedding row on update; the next runEmbeddingBackfill
(agent_end hook) re-embeds. Best-effort: a missing embedding is the
silent-fallback case the ranker already handles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Schema-version assertions hadn't been bumped past 21 in three places
(complete-task/complete-slice/md-importer); manifest coverage tests caught
the project-scoped unit types added for the deep planning gate (ADR-011)
that weren't yet registered in either KNOWN_UNIT_TYPES table; workflow-
templates registry test rejected docs-sync.yaml because the assertion was
.md-only.
- preferences-types.ts: KNOWN_UNIT_TYPES gains refine-slice, discuss-project,
discuss-requirements, research-project, workflow-preferences.
- unit-context-manifest.ts: same five types added to its local
KNOWN_UNIT_TYPES + UNIT_MANIFESTS (TOOLS_PLANNING, scoped/full knowledge,
COMMON_BUDGET_MEDIUM/LARGE).
- complete-task / complete-slice / md-importer test: schema_version
expectation 21 → 25.
- workflow-templates test: file extension can be .md OR .yaml (docs-sync is
intentionally yaml-step iteration).
6 test files / 81 tests now green that were red.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New subcommand: /sf memory search "<query>". Routes through
getRelevantMemoriesRanked, so when SF_LLM_GATEWAY_KEY is set the gateway
embeds the query and ranks memories by cosine + static blend; without
the key, gracefully degrades to static ranking. Header text indicates
which path was taken so users know whether embeddings are live.
This makes the embedding pipeline operator-discoverable — previously the
only consumer was the silent execute-task injection path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit populated memory_embeddings rows but no consumer read
them — the read path (getActiveMemoriesRanked) used pure static score
(confidence × hit_count). Embeddings were silent.
This wires the read side:
- rankMemoriesByEmbedding (pure, in memory-embeddings.ts) blends static
score with cosine similarity: combined = static * (1 + α * cosine).
Defaults α=0.6 — a perfect-static + zero-similarity hit ties roughly
with a low-static + perfect-similarity hit, so semantically relevant
cold memories can surface above stale-but-popular ones.
- embedQueryViaGateway + loadEmbeddingMap — supporting helpers.
- getRelevantMemoriesRanked (memory-store.ts) — async query-aware ranker.
Oversamples the static pool 5×, embeds the query, blends, returns top-K.
Falls back cleanly to static ranking when:
- query empty
- no SF_LLM_GATEWAY_KEY (gateway not configured)
- gateway request fails (500/network)
- no embeddings exist yet (fresh DB / worker offline)
- auto-prompts.ts: execute-task injection now uses sliceTitle + taskTitle
as the query so memories relevant to the current work surface first.
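The blend can be sketched as a pure ranker (a model of the math above, not the real rankMemoriesByEmbedding signature; field names are assumptions):

```typescript
// combined = static * (1 + alpha * cosine), alpha = 0.6 by default.
// An empty cosine map degenerates to pure static order, which is the
// fallback chain's behavior.
interface Mem { id: string; staticScore: number }

function rankByEmbedding(
  pool: Mem[],
  cosineById: Map<string, number>,
  alpha = 0.6,
): Mem[] {
  const score = (m: Mem) =>
    m.staticScore * (1 + alpha * (cosineById.get(m.id) ?? 0));
  return [...pool].sort((a, b) => score(b) - score(a));
}
```

With alpha=0.6, a cold memory at static 0.7 and cosine 1.0 scores 1.12 and surfaces above a stale-but-popular memory at static 1.0 and cosine 0.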
10 new tests lock the contract — pure ranker math, fallback chain, and
the gateway-mocked promotion case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an opt-in embedding path against `https://llm-gateway.centralcloud.com/v1`
using qwen/qwen3-embedding-4b. Activated by exporting SF_LLM_GATEWAY_KEY;
URL/model overridable via SF_LLM_GATEWAY_URL and SF_LLM_GATEWAY_EMBED_MODEL.
Rerank surface present (SF_LLM_GATEWAY_RERANK_MODEL) but degrades to null
when no rerank worker is online — current gateway has none, so it stays
dormant until one comes up.
- memory-embeddings-llm-gateway.ts: createGatewayEmbedFn + rerankCandidates
speaking the OpenAI-shaped /v1/embeddings and /v1/rerank protocols.
- memory-embeddings.ts: listUnembeddedMemoryIds + runEmbeddingBackfill —
best-effort sweep, in-flight-guarded, bounded, throttled "unavailable"
log. Wired into agent_end so every turn opportunistically embeds new
memories when the gateway is reachable.
- sf-db.ts: pre-existing bug fix — memory_embeddings, memory_relations,
and memory_sources were referenced everywhere but never CREATE-d in the
schema. Adding them as IF NOT EXISTS with proper FK + PK so fresh DBs
actually work.
- 16 new tests covering env config, embed fn shape, rerank degradation,
backfill happy/sad/bounded paths.
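The OpenAI-shaped request the embed fn sends can be sketched as a pure request builder (auth header and the actual fetch are omitted; the default model string is from this commit, the function name is hypothetical):

```typescript
// Builds the /v1/embeddings request body. Response is assumed to be
// the OpenAI shape: { data: [{ embedding: number[] }, ...] }.
interface EmbedRequest { model: string; input: string[] }

function buildEmbedRequest(
  texts: string[],
  env: Record<string, string | undefined>,
): EmbedRequest {
  return {
    model: env.SF_LLM_GATEWAY_EMBED_MODEL ?? "qwen/qwen3-embedding-4b",
    input: texts,
  };
}
```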
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolveEscalation gains an optional `source: "user" | "auto-mode"`
parameter (default "user"). Auto-dispatch passes "auto-mode" when it
auto-accepts. The UOK audit event type now flips between
"escalation-user-responded" and "escalation-auto-accepted", and the
payload includes a typed `resolvedBy` field.
Why: a journal grep for user actions shouldn't return auto-mode events.
Audit/observability tools can now filter cleanly without string-matching
the rationale prefix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an escalation is resolved (auto-mode accept or user override), write
the choice + rationale into the memories table with category="architecture".
The "[escalation:<task>] <question>. Chose: <option>. Rationale: ..."
prefix mirrors the decisions->memories backfill format so search and
de-duplication work the same way.
Why: getActiveMemoriesRanked auto-injects top memories into every
execute-task prompt, so a resolved escalation now travels forward as
implicit context across the whole project — not just the immediate
carry-forward into the next task. The artifact JSON stays as the audit
trail; the memory is the discoverable, semantically-ranked surface.
Best-effort write — never blocks resolution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When sf_task_complete's escalation payload was rejected (validation error)
or silently dropped (feature flag off), the agent saw a clean "Completed
task" response and assumed the issue was raised — but no carry-forward
override was created, so the next executor saw nothing.
Now the response text explicitly says:
- "WARNING: escalation payload was REJECTED (<error>); the next executor
will NOT see your decision" — when buildEscalationArtifact throws
- "note: escalation payload was DROPPED because phases.mid_execution_escalation
is disabled" — when feature flag is off
Task completion is still never blocked by escalation issues — additive,
auditable, agent-actionable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The global skill hardcoded `.sf/milestones/M008/bugs/bug-registry.json`
and `M008-specific:` rules — when M008 closes the skill goes stale and
misleads agents on every other milestone.
Reframed as "Milestone Bug Registry Guidance": the rules apply to any
milestone that ships a `bug-registry.json` + `triage-protocol.md` pair,
with M008 cited as the canonical example for the registry test. When no
registry exists, the section is skipped — agents follow the normal
evidence/repro/fix flow.
triage-protocol-registry test (31 tests) still passes — keeps the
literal `bug-registry.json` reference and HIGH/MEDIUM/LOW + cluster +
update-after-fix assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The escalation feature was invisible to agents — the prompt didn't say it
existed, so agents made silent assumptions instead of surfacing genuine
tradeoffs. Now, when phases.mid_execution_escalation is on, execute-task
includes a guidance block showing the escalation payload shape and noting
auto-mode auto-accepts the recommendation by default. When the feature is
off the field is silently dropped, so the guidance is omitted entirely to
avoid misleading the agent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto is autonomous, so the escalating-task dispatch rule shouldn't halt
the loop. Default: accept the agent's recommendation, record the choice
with `auto-mode: ...` rationale, and let the next dispatch cycle pick up
the carry-forward override. Users can review or override via
`/sf escalate list --all` later.
Set `phases.escalation_auto_accept: false` to keep gsd-2's pause-and-ask
behavior (loop halts until the user runs `/sf escalate resolve`).
- types.ts: add escalation_auto_accept (default true)
- preferences-validation.ts: allowlist + warn on unknown phase keys
- auto-dispatch.ts: rename rule to "auto-accept-or-pause"; on auto-accept
resolve via resolveEscalation("accept", ...) and return action:"skip"
so the next cycle re-reads state cleanly
- PREFERENCES.md: surface the toggle with the autonomy rationale
- tests/escalation-auto-accept.test.ts: 4 cases — default accept, explicit
true, explicit false (preserves pause), non-escalating phase no-op
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small DB helpers from gsd-2 that SF was missing, plus a UX
improvement to /sf escalate list that uses one of them.
PDD spec:
setSliceSketchFlag(milestoneId, sliceId, isSketch) — generalized
sketch-flag setter. Replaces my narrower clearSliceSketch (which
remains as a thin wrapper for callers that only zero). Use this
when a re-plan flow wants to revert a slice back to sketch state.
autoHealSketchFlags(milestoneId, hasPlanFile) — safety net for
progressive planning. Predicate-based: the caller passes a function
that resolves whether a PLAN file exists for a slice, and the helper
flips is_sketch=0 for any slice that has both is_sketch=1 AND a
plan file. Catches DB-FS drift after crashes/manual edits.
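The heal logic, modeled purely (the real helper runs UPDATEs against sf-db; the in-memory slice shape here is an assumption):

```typescript
// Predicate-based sketch heal: DB says sketch, FS says a full PLAN
// exists => drift; flip the flag and report which slices were healed.
interface Slice { sliceId: string; isSketch: boolean }

function autoHealSketchFlags(
  slices: Slice[],
  hasPlanFile: (sliceId: string) => boolean,
): string[] {
  const healed: string[] = [];
  for (const s of slices) {
    if (s.isSketch && hasPlanFile(s.sliceId)) {
      s.isSketch = false;
      healed.push(s.sliceId);
    }
  }
  return healed;
}
```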
listEscalationArtifacts(milestoneId, includeResolved=false) —
cross-slice DB-side filter for /sf escalate list. Replaces my
hand-rolled inner-loop over getMilestoneSlices() + getSliceTasks()
+ filter — single SQL query, sorted by sequence, faster.
UX improvement to commands-escalate.ts:
- /sf escalate list: now uses listEscalationArtifacts; shows
PENDING / awaiting-review / resolved status badges per entry.
- /sf escalate list --all: includes resolved entries (audit trail).
- Better hint message when none active: 'Use --all to include
resolved'.
Verified:
- typecheck clean (one parallel-session-introduced error in
self-feedback-drain.ts is unrelated — they import a missing
utils/error.ts; will land when their commit does).
- escalation-feature.test.ts (21 tests) + sf-db.test.ts (16
tests) still pass — no regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>