Commit graph

2833 commits

Author SHA1 Message Date
Mikael Hugo
d4b3e0f2b0 feat(schedule): add lightweight due-items banner to loader.ts 2026-05-05 01:37:51 +02:00
Mikael Hugo
7e1883844a feat(schedule): auto-dispatch rule in DISPATCH_RULES 2026-05-05 01:34:50 +02:00
Mikael Hugo
94ba38bdd6 feat(schedule): launch banner, headless query field, auto_dispatch type 2026-05-05 01:30:04 +02:00
Mikael Hugo
c3e9296986 fix(types): restore hand-written d.ts ambient declarations
Previous fix commit (e0d1352c4) only updated .gitignore to allow
src/resources/extensions/**/*.d.ts but did not actually re-commit
the file contents that were deleted in snapshot 405381985. Restoring
from bcf79a713 (the latest version with all exported symbols).

Files restored:
- remote-questions/config.d.ts
- search-the-web/url-utils.d.ts
- sf/agentic-docs-scaffold.d.ts
- sf/code-intelligence.d.ts
- sf/doc-checker.d.ts
- sf/doctor.d.ts
- sf/gitignore.d.ts
- sf/native-git-bridge.d.ts
- sf/paths.d.ts
- sf/preferences-models.d.ts
- sf/preferences.d.ts
- sf/repo-identity.d.ts
- sf/trace-collector.d.ts
- sf/types.d.ts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 01:19:05 +02:00
Mikael Hugo
77e429a088 feat(schedule): CLI commands add/list/done/cancel/snooze/run + wiring 2026-05-05 01:18:02 +02:00
Mikael Hugo
b92d7bc96b sf snapshot: pre-dispatch, uncommitted changes after 33m inactivity 2026-05-05 01:11:49 +02:00
Mikael Hugo
d3954ff529 sf snapshot: pre-dispatch, uncommitted changes after 30m inactivity 2026-05-05 00:38:05 +02:00
Mikael Hugo
342871e85e docs: clarify guided planning artifacts 2026-05-05 00:07:48 +02:00
Mikael Hugo
959e15ef42 fix: wire bundled extension inventory 2026-05-05 00:04:53 +02:00
Mikael Hugo
47c806d733 fix: version sf extension runtime sources 2026-05-04 23:27:20 +02:00
Mikael Hugo
56aaf5bb45 sf snapshot: pre-dispatch, uncommitted changes after 42m inactivity 2026-05-04 22:41:07 +02:00
Mikael Hugo
4053819854 sf snapshot: pre-dispatch, uncommitted changes after 41m inactivity 2026-05-04 21:59:01 +02:00
Mikael Hugo
b8a5a01de4 refactor(skills): remove acquiring-skills bundled skill
The acquiring-skills skill was a personal developer workflow with
hardcoded paths that did not apply to general sf users.

Rationale for removal rather than generalization:
- SF bundled skills are already generic and installed for all users.
- External skills are consumed via the Anthropic marketplace.
- Per-project custom skills are covered by the creating-skills skill.

Resolves self-feedback sf-mookqlyr-snco79.
2026-05-04 21:17:59 +02:00
Mikael Hugo
66c7d6a47e refactor(skills): generalize acquiring-skills and remove personal references
Replace the developer-specific acquiring-skills skill with a generic
version that any SF user can follow.

Changes:
- Removed all personal references (/home/mhugo/code/, mikki-bunker,
  ace-coder, letta-workspace, dr-repo, singularity-package-intelligence)
- Replaced Method 2 (rsync from local repos) and Method 3 (rsync from
  bunker) with a generic local-project porting workflow
- Replaced Trusted Sources table with only public, universally
  accessible repositories (anthropics/skills, singularity-forge)
- Kept all safety rules (inspect scripts, no curl|bash, untrusted
  sources require approval)
- Kept the Adaptation Checklist for porting foreign skills to sf
- References the Anthropic skills marketplace as the primary source

Resolves self-feedback sf-mookqlyr-snco79.
2026-05-04 21:13:35 +02:00
Mikael Hugo
6037407c99 fix(auto): reconcile stale complete-slice runtime records at bootstrap
Prevents pi runtime flow-audit from emitting false-positive stale-dispatch
warnings for slices that completed successfully on retry.

Problem: when a complete-slice unit is cancelled (e.g. provider quota error)
and then retried successfully, the prior cancelled journal/runtime state can
still trigger a flow-audit warning on the next session start. The detector
reads the cancelled unit-end event but does not check for later successful
retries or existing artifact files (#sf-moqv5o7h-vaabu6).

Fix: at auto-mode bootstrap, after cleanStaleRuntimeUnits, run a new
reconcileStaleCompleteSliceRecords() pass that:
- Lists all unit runtime records for complete-slice units
- Filters for terminal non-completed states (cancelled, failed, stale,
  runaway-recovered)
- Checks DB slice status === 'complete'
- Checks SUMMARY.md exists with valid completed_at frontmatter
- Clears stale runtime records that pass both checks

Files changed:
- src/resources/extensions/sf/unit-runtime.js: add reconcileStaleCompleteSliceRecords
- src/resources/extensions/sf/auto-start.js: call it after cleanStaleRuntimeUnits
- src/tests/unit-runtime-reconcile.test.ts: unit tests for the new function
2026-05-04 20:45:33 +02:00
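
The selection logic that 6037407c99 describes can be sketched as a pure filter. This is an illustration under assumptions, not the code in unit-runtime.js: the record and evidence shapes, the `selectRecordsToClear` name, and the "valid completed_at" test (any parseable date string) are all hypothetical.

```typescript
// Hypothetical shapes; the real runtime records in unit-runtime.js may differ.
interface RuntimeRecordSketch {
  unitId: string;
  unitType: string;
  state: string;
}

interface SliceEvidenceSketch {
  dbStatus: string;                  // slice status as stored in the DB
  summaryCompletedAt: string | null; // completed_at from SUMMARY.md frontmatter, if any
}

// Terminal non-completed states named in the commit message.
const TERMINAL_NON_COMPLETED = new Set([
  "cancelled",
  "failed",
  "stale",
  "runaway-recovered",
]);

// Assumption: "valid" means a non-empty string that Date.parse accepts.
function hasValidCompletedAt(value: string | null): boolean {
  return value !== null && value.length > 0 && !Number.isNaN(Date.parse(value));
}

// Returns the unit ids whose stale runtime records are safe to clear:
// the slice is complete in the DB AND its summary carries a valid completed_at.
function selectRecordsToClear(
  records: RuntimeRecordSketch[],
  evidenceFor: (unitId: string) => SliceEvidenceSketch | undefined,
): string[] {
  return records
    .filter((r) => r.unitType === "complete-slice" && TERMINAL_NON_COMPLETED.has(r.state))
    .filter((r) => {
      const evidence = evidenceFor(r.unitId);
      return (
        evidence !== undefined &&
        evidence.dbStatus === "complete" &&
        hasValidCompletedAt(evidence.summaryCompletedAt)
      );
    })
    .map((r) => r.unitId);
}
```

Requiring both checks keeps the pass conservative: a cancelled record whose slice never completed is left in place for the flow audit to report.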
Mikael Hugo
ed4a4bc93a chore: commit current worktree state 2026-05-04 19:28:39 +02:00
Mikael Hugo
bcf79a7136 fix(types): update .d.ts declarations with all exported symbols
Update all TypeScript declaration files to include every exported function,
const, and interface from their corresponding .js modules. Fixes TS2305
errors for missing exports.

- preferences: add all 16 exported functions
- preferences-models: add all 20 exported functions
- gitignore: add all 6 exported functions
- agentic-docs-scaffold: add SCAFFOLD_FILES const
- doc-checker: add formatDocCheckReport
- code-intelligence: add all 21 exports including backend constants
- native-git-bridge: add all 50+ exported git operations
- paths: add all 30+ path resolution functions
- repo-identity: add all 9 exported functions
- trace-collector: add Span/Trace interfaces and all 12 functions
- types: add SFState interface
- doctor: add both exported functions
- url-utils: add all 8 exported functions
- config: add RemoteConfig interface and all functions
2026-05-04 19:02:04 +02:00
Mikael Hugo
33383ed53a fix(types): add TypeScript declarations for JS modules
Add .d.ts files for all JS modules imported by TypeScript source to resolve
TS7016 errors. Files are force-added because src/**/*.d.ts is gitignored.

- preferences, preferences-models, gitignore, agentic-docs-scaffold
- doc-checker, code-intelligence, native-git-bridge, paths, repo-identity
- types, trace-collector, doctor, url-utils, config
2026-05-04 18:57:34 +02:00
Mikael Hugo
ccdd3027ab perf(read): stream lines when offset/limit provided to avoid loading entire file
When offset or limit are specified, use Node.js readline streaming instead of
loading the entire file into memory. This fixes the truncation issue for large
files (>50KB) where the read tool would return truncated content even when
requesting a small slice.

- Add readLinesStreamed() for memory-efficient line reading
- Add countLines() for total line count without full read
- Use streaming path when offset !== undefined || limit !== undefined
- Keep existing full-file read path when no offset/limit specified
- Add tests for streaming behavior with large files

Fixes the long-standing issue where reading large files like src/headless.ts
(~50KB) with offset/limit would still hit truncation limits.
2026-05-04 15:20:16 +02:00
Mikael Hugo
abe34084a4 sf snapshot: uncommitted changes after 67m inactivity 2026-05-04 12:46:41 +02:00
Mikael Hugo
7c348704ec sf snapshot: uncommitted changes after 111m inactivity 2026-05-04 11:38:58 +02:00
Mikael Hugo
0037f44677 sf snapshot: pre-dispatch, uncommitted changes after 83m inactivity 2026-05-04 09:47:30 +02:00
Mikael Hugo
8c66c11131 fix(sf): prevent phantom work from stale file paths in task plans
Adds three layers of defense against the M008/S03 failure mode where
bug-hunt findings referenced .ts files that had been deleted in a prior
corrupted snapshot commit (f712c339b), but .js versions with fixes survived.

1. Prompt-level safeguards:
   - research-slice.md: researchers must verify file existence before listing
     paths in findings
   - plan-slice.md: planners must confirm files exist before including them
     in task plans
   - execute-task.md: executors must verify files exist before editing;
     escalate as blocker if missing

2. Runtime pre-flight validation:
   - system-context.js: validateTaskPlanFiles() extracts backtick-wrapped
     paths from task plans and checks existence before dispatch
   - Missing files trigger a warning injected into the execute-task prompt
   - Logs warning for observability

This prevents the research→plan→execute pipeline from propagating stale
file paths that cause phantom work, runaway guard intervention, and
flow-audit failures.

Fixes: sf-moqgvdi7-mxc1sr (flow-audit:repeated-milestone-failure)
Related: M008/S03 bug-hunt cluster
2026-05-04 08:24:04 +02:00
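
The pre-flight check in 8c66c11131 (extract backtick-wrapped paths, then test existence) might look roughly like the sketch below. The function name, the path heuristic (no whitespace, ends in a file extension), and the injected `exists` predicate are assumptions made for illustration; the real validateTaskPlanFiles() in system-context.js may differ.

```typescript
// Returns the backtick-wrapped, path-like tokens in a task plan that do not exist.
// `exists` is injected so the sketch stays independent of the filesystem;
// a caller could pass fs.existsSync.
function findMissingPlanFiles(
  planText: string,
  exists: (path: string) => boolean,
): string[] {
  const missing: string[] = [];
  const seen = new Set<string>();
  const backticked = /`([^`\n]+)`/g;

  let match: RegExpExecArray | null;
  while ((match = backticked.exec(planText)) !== null) {
    const candidate = (match[1] ?? "").trim();
    // Heuristic (assumption): a path has no whitespace and ends in an extension.
    const looksLikePath = !/\s/.test(candidate) && /\.[A-Za-z0-9]+$/.test(candidate);
    if (!looksLikePath || seen.has(candidate)) continue;
    seen.add(candidate);
    if (!exists(candidate)) missing.push(candidate);
  }
  return missing;
}
```

A caller would turn a non-empty result into the warning that gets injected into the execute-task prompt, rather than blocking dispatch outright.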
Mikael Hugo
bffd6c22fc sf snapshot: pre-dispatch, uncommitted changes after 42m inactivity 2026-05-04 02:34:07 +02:00
Mikael Hugo
061985b226 fix(sf): runaway guard treats token count as secondary signal
Token count now only triggers a warning when accompanied by a primary
signal (high tool calls, long elapsed time, or many changed files).
This prevents false positives on units doing real work with large
context models, where 25+ tool calls can legitimately burn 1M+ tokens.

Also renames 'session tokens' to 'unit tokens' in guard messages to
clarify that the metric is delta-from-unit-start, not cumulative.

Fixes sf-moqewawp-ijwjjt
2026-05-04 01:51:33 +02:00
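
A minimal sketch of the secondary-signal rule from 061985b226: the token count contributes a warning only when at least one primary signal has also crossed its threshold. The field names, the threshold values in the usage note, and the `tokenWarningApplies` name are hypothetical; the commit does not state the real thresholds, nor whether primary signals warn on their own.

```typescript
// Hypothetical per-unit counters, measured as deltas from unit start.
interface UnitStatsSketch {
  unitTokens: number;
  toolCalls: number;
  elapsedMs: number;
  changedFiles: number;
}

// Hypothetical thresholds; one per signal.
type GuardThresholdsSketch = UnitStatsSketch;

// True only when the token count is high AND a primary signal is also high.
function tokenWarningApplies(
  stats: UnitStatsSketch,
  thresholds: GuardThresholdsSketch,
): boolean {
  const primarySignal =
    stats.toolCalls >= thresholds.toolCalls ||
    stats.elapsedMs >= thresholds.elapsedMs ||
    stats.changedFiles >= thresholds.changedFiles;
  const tokensHigh = stats.unitTokens >= thresholds.unitTokens;
  return tokensHigh && primarySignal;
}
```

With illustrative thresholds of 1,000,000 tokens, 60 tool calls, 30 minutes and 40 changed files, a unit that has used 1.2M tokens across 25 tool calls stays quiet, which is the false positive the commit sets out to remove.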
Mikael Hugo
f712c339b3 sf snapshot: pre-dispatch, uncommitted changes after 1497m inactivity 2026-05-04 01:22:39 +02:00
Mikael Hugo
6384c5b44c test(sf): integration test — graph-boost lifts neighbor through full pipeline
Pure-function tests for applyRelationBoost (55b14c3f7) cover the
math, but the wired-through path (createMemoryRelation → boost picked
up by getRelevantMemoriesRanked → reordered output) had no
end-to-end test.

New test:
1. Creates memories a, b, c with orthogonal embeddings
2. Mocks gateway to return a query vector aligned only with a
3. Wires a→b with related_to (confidence 1.0)
4. Asserts ranking: a (cosine top) > b (boost from a) > c (unrelated)

Locks the contract that the boost actually fires through the full
pipeline, not just the pure helper. 16 → 17 tests in the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:25:07 +02:00
Mikael Hugo
22109cee6a docs(sf): escalation.ts header lists carry-forward + memory persistence
The header listed "artifact I/O, detection, flag flips, resolution" but
not the carry-forward injection (claimOverrideForInjection /
formatOverrideBlock) or the memory persistence calls now embedded in
both writeEscalationArtifact (continueWithDefault path, b9bff3762
sibling) and resolveEscalation (00c13bc5a). These are load-bearing
behaviors a contributor should know up front.

Also folded the "SF's local ADR-011 is 'Swarm Chat'" disambiguation
note into the header (matches the convention the rest of the
disambiguation sweep set).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:22:54 +02:00
Mikael Hugo
ec4dab450b docs(sf): clarify memory-sleeper.ts is NOT part of the memory pipeline
memory-sleeper.ts had no file header and the "memory" prefix is
misleading — it's a runtime tool-output watchdog (detects repeated
bash failures, too-large tool results) that emits steers, completely
unrelated to memory-store / memory-relations / memory-embeddings.

A contributor reading the directory listing top-down would reasonably
assume this file participates in the same pipeline as the other
memory-*.ts modules. Header now states the historical naming and
points readers in the right direction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:21:06 +02:00
Mikael Hugo
e10511ce38 docs(sf): memory-embeddings.ts header reflects actual pipeline
The previous header had two stale references:
- "buildMemoryLLMCall pattern, prefers a dedicated embedding-capable
  model" — describes a hook that actually returns null on every call
  (the Pi SDK has no provider-neutral embedding API yet).
- "queryMemoriesRanked falls back to keyword-only scoring" —
  function doesn't exist; the real consumer is
  getRelevantMemoriesRanked, and the fallback is static (confidence
  × hit_count), not keyword.

Updated to describe the actual three-stage read pipeline (cosine →
relation-boost → optional rerank) and the soft-degrade fallback to
static ranking when the gateway is offline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:18:46 +02:00
Mikael Hugo
308958453d docs(sf): memory-relations.ts header reflects actual writers + readers
The file header described an aspirational design ("LINK actions
emitted by the memory extractor, or future /sf memory link CLI") that
never matched code reality. As of this session:

Writers shipped:
 (a) applyMemoryActions auto-links co-extracted memories with
     related_to (b9bff3762)
 (b) /sf memory import loads explicit edges from JSON

Read consumers shipped:
 (1) getRelevantMemoriesRanked graph-boost (55b14c3f7)
 (2) sf_graph MCP tool (pre-existing)

Updated the header so a contributor reading top-down sees the
current data flow, not the original plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:17:03 +02:00
Mikael Hugo
b9bff37623 feat(sf): co-extracted memories get auto-linked with related_to
Previous commit (55b14c3f7) wired memory_relations into ranking, but
the table was empty — no writer added edges.

applyMemoryActions now links memories created in the same batch
pairwise with `related_to` edges (confidence 0.5 reflects "from same
extraction context" being weaker evidence than an explicit
human-authored relation). Pairwise O(n²) is fine for typical
extractor batches of 1–5 memories.

Combined with 55b14c3f7's relation-boost ranker, the effect is:
extracting memories A, B, C from one slice transcript ⇒ when later a
query hits A, B and C get a small score bump (and vice versa). The
cohort surfaces together rather than fragmenting across categories.

UPDATE / REINFORCE / SUPERSEDE actions don't trigger linkage —
linkage is for new co-extracted context, not modifications of
existing memories.

Best-effort: relation creation failures don't roll back the memory
batch. 14 → 16 tests in memory-store.test.ts; new tests verify the
3-memory batch yields C(3,2)=3 edges and a single-CREATE batch yields 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:13:21 +02:00
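
The pairwise linking in b9bff37623 amounts to emitting one edge per unordered pair of newly created memories. The sketch below shows that step in isolation; the edge shape and the function name are assumptions, and only the `related_to` relation and the 0.5 confidence come from the commit message.

```typescript
// Hypothetical edge shape; the real memory_relations row may carry more fields.
interface LinkEdgeSketch {
  fromId: string;
  toId: string;
  rel: "related_to";
  confidence: number;
}

// Builds one related_to edge for every unordered pair of ids created in a batch.
// n ids yield C(n, 2) = n * (n - 1) / 2 edges; a single id yields none.
function pairwiseRelatedEdges(createdIds: string[], confidence = 0.5): LinkEdgeSketch[] {
  const edges: LinkEdgeSketch[] = [];
  createdIds.forEach((fromId, i) => {
    for (const toId of createdIds.slice(i + 1)) {
      edges.push({ fromId, toId, rel: "related_to", confidence });
    }
  });
  return edges;
}
```

Only ids from CREATE actions would be passed in, since UPDATE, REINFORCE and SUPERSEDE do not trigger linkage.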
Mikael Hugo
55b14c3f78 feat(sf): wire memory_relations into ranking — graph-boost pass
memory_relations was storage-only since 56ee89a94 / 23c5de38b. Now
getRelevantMemoriesRanked walks edges of cosine top-N memories and
applies a one-pass score-boost to neighbors:

  combined += parent_score × edge_confidence × damping

where damping=0.4 by default. Both endpoints of an edge get the boost
symmetrically (memory A pulling B is equally evidence that B is
relevant to A's context).

Pure helper `applyRelationBoost(ranked, edges, options)` lives in
memory-embeddings.ts so memory-store.ts doesn't take a direct
dependency on memory-relations.ts; the call site composes the two
modules. When memory_relations is empty (the case until a writer
adds edges — a future agent or hook), applyRelationBoost returns the
input unchanged → no behavior change today.

Intra-pool only: cross-pool edges (where one endpoint is outside the
50–200 cosine pool) are skipped to avoid pulling in low-static
memories on a hot edge alone. Pool expansion via relations would be
a separate, more invasive feature.

4 new tests cover empty edges, empty ranked, cross-pool edge skip,
and the canonical "low-but-related promoted above lone" case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:09:33 +02:00
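
The boost formula in 55b14c3f78 can be read as the one-pass computation below. This is a sketch under assumptions rather than the helper in memory-embeddings.ts: the types and the `applyRelationBoostSketch` name are hypothetical, and boosts are computed from pre-boost scores so that they never compound within the pass.

```typescript
// Hypothetical shapes for illustration only.
interface BoostMemorySketch {
  id: string;
  score: number;
}

interface BoostEdgeSketch {
  fromId: string;
  toId: string;
  confidence: number;
}

// One pass: each endpoint gains other_endpoint_score * edge_confidence * damping.
function applyRelationBoostSketch(
  ranked: BoostMemorySketch[],
  edges: BoostEdgeSketch[],
  damping = 0.4,
): BoostMemorySketch[] {
  // Empty input or empty edge set: return the input unchanged.
  if (ranked.length === 0 || edges.length === 0) return ranked;

  // Snapshot pre-boost scores; all boosts read from this map.
  const base = new Map<string, number>();
  for (const m of ranked) base.set(m.id, m.score);
  const boosted = new Map<string, number>(base);

  for (const edge of edges) {
    const fromScore = base.get(edge.fromId);
    const toScore = base.get(edge.toId);
    // Intra-pool only: skip edges with an endpoint outside the ranked pool.
    if (fromScore === undefined || toScore === undefined) continue;
    // Symmetric: both endpoints receive a boost from the other.
    boosted.set(edge.toId, (boosted.get(edge.toId) ?? 0) + fromScore * edge.confidence * damping);
    boosted.set(edge.fromId, (boosted.get(edge.fromId) ?? 0) + toScore * edge.confidence * damping);
  }

  return ranked
    .map((m) => ({ ...m, score: boosted.get(m.id) ?? m.score }))
    .sort((a, b) => b.score - a.score);
}
```

For example, with scores a=0.9, lone=0.5, b=0.3 and a single a→b edge of confidence 1.0, b gains 0.9 × 1.0 × 0.4 = 0.36 and moves above lone, which matches the "low-but-related promoted above lone" case the tests describe.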
Mikael Hugo
1da4d5fdf6 perf(sf): index memory_relations.to_id for reverse-edge lookups
Audit of all FROM/INTO/UPDATE clauses in the codebase against
CREATE TABLE statements found one missing index. memory_relations
PK is (from_id, to_id, rel) — covers from_id as leading column. But
memory-relations.ts:233 queries `WHERE to_id = :id` which would
full-scan once the relation count grows.

Added idx_memory_relations_to. Cheap insertion cost; avoids the
worst-case query as soon as a ranker consumer starts traversing
edges (the natural next-step from 23c5de38b).

Schema-gap audit (option 3 in the redirect): no other ghost-table
references found. unit_claims has its own .sf/unit-claims.db and
self-contained schema in unit-ownership.ts. active_decisions /
active_requirements / active_memories are CREATE VIEW IF NOT EXISTS,
properly created. "INTO worktree" was a JSDoc false positive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:05:05 +02:00
Mikael Hugo
72104aed1d fix(sf): formatMemoriesForPrompt rank-preserving mode + use it in execute-task
Real semantic bug: getRelevantMemoriesRanked returns memories in
score-descending order (cosine + optional rerank), but
formatMemoriesForPrompt then re-grouped them by CATEGORY_PRIORITY
(gotcha=0 first, convention=1, ...). A high-relevance "convention"
memory got buried under low-relevance "gotcha" entries purely because
gotcha has higher category priority. The agent never saw the most
relevant items at the top.

formatMemoriesForPrompt gains a `preserveRankOrder` parameter (default
false for backward compat). When true:
 - Renders bullets in input order
 - Tags each line with [category] so the agent can still tell
   gotchas from conventions

Wired auto-prompts.ts execute-task injection: when memoryQuery is
non-empty (i.e. query-aware ranker was used), pass true. Static-ranked
input keeps the historical category-grouped layout.

Tests verify both modes side-by-side using identical input — the
ordering flip is the load-bearing assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:02:59 +02:00
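
The two layouts described in 72104aed1d can be contrasted with a small sketch. Everything here is illustrative: the real formatMemoriesForPrompt output format is not shown in the commit, the priority table below lists only the two categories the message names, and the function and type names are hypothetical.

```typescript
interface PromptMemorySketch {
  category: string;
  content: string;
}

// Only gotcha=0 and convention=1 are confirmed by the commit message.
const CATEGORY_PRIORITY_SKETCH: Record<string, number> = { gotcha: 0, convention: 1 };

function formatMemoriesSketch(
  memories: PromptMemorySketch[],
  preserveRankOrder = false,
): string {
  if (preserveRankOrder) {
    // Keep the ranker's order; tag each bullet so the category is still visible.
    return memories.map((m) => `- [${m.category}] ${m.content}`).join("\n");
  }

  // Historical layout: group by category, then order the groups by priority.
  const byCategory = new Map<string, string[]>();
  for (const m of memories) {
    const bucket = byCategory.get(m.category) ?? [];
    bucket.push(m.content);
    byCategory.set(m.category, bucket);
  }
  const priority = (c: string): number => CATEGORY_PRIORITY_SKETCH[c] ?? Number.MAX_SAFE_INTEGER;
  return Array.from(byCategory.entries())
    .sort(([a], [b]) => priority(a) - priority(b))
    .map(([category, items]) => [`${category}:`, ...items.map((i) => `- ${i}`)].join("\n"))
    .join("\n");
}
```

Feeding both modes the same input, with a high-relevance convention ranked ahead of a gotcha, shows the flip: the rank-preserving mode keeps the convention first, while the grouped mode moves the gotcha block to the top.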
Mikael Hugo
a3698b4e6c docs(sf): file-header comment for /sf escalate also mentions --all
Same disambiguation as 45b669ac3 but for the source-file header
comment (a contributor reading commands-escalate.ts top-down sees the
same surface as `/sf escalate help`).

Comment-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:56:36 +02:00
Mikael Hugo
45b669ac32 docs(sf): /sf escalate help mentions --all flag
Commit 0f0aee5bf added the --all flag to /sf escalate list (showing
resolved entries in addition to active ones), but the usage() text
never advertised it. Operators discovered the flag only by reading
source. Adding it to the help line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:55:18 +02:00
Mikael Hugo
0426e61cea fix(sf): getRelevantMemoriesRanked pool size never less than limit
Pool was Math.min(50, limit * 5). For default limit=10 this gives 50
(intended 5× oversample for rerank). But for limit=100 it gives 50 —
caller asking for 100 results would silently get at most 50.

Now: max(limit, limit * 5), capped at 200 to bound rerank latency on
huge requests. Default behavior unchanged for limit ≤ 10; large
requests now work correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:49:18 +02:00
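
The arithmetic in 0426e61cea is easy to check side by side. The two functions below are a reading of the commit message, not the actual source; in particular, composing the cap as `Math.min(200, ...)` is an assumption, and the message does not say what happens for limits above 200.

```typescript
const MAX_POOL_SKETCH = 200;

// Before: the 50 acted as a ceiling, so large limits were silently truncated.
function oldPoolSize(limit: number): number {
  return Math.min(50, limit * 5);
}

// After (as described): 5x oversample, never below limit, capped at 200.
function newPoolSize(limit: number): number {
  return Math.min(MAX_POOL_SKETCH, Math.max(limit, limit * 5));
}
```

For limit=10 both versions give 50; for limit=100 the old version gives 50 and the new one gives 200.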
Mikael Hugo
e2e708fc11 test(sf): lock continueWithDefault memory persistence contract
Two new tests covering the symmetric write shipped in 7a5b12540:

1. writeEscalationArtifact with continueWithDefault=true → memory
   created with "[escalation:T##]" prefix, "auto-applied default:"
   rationale marker, and Fail option label (the recommendation).
2. writeEscalationArtifact with continueWithDefault=false → NO memory
   at write time (pending entries defer persistence to resolveEscalation
   per existing behavior).

Together with the resolve-time tests in 3b5e6588e, all three
escalation flows (resolved, auto-accepted, default-applied) have
locked memory-persistence contracts. 23 → 25 tests in the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:47:08 +02:00
Mikael Hugo
7a5b125405 feat(sf): persist continueWithDefault escalations as memories too
When an agent escalates with continueWithDefault=true, it has already
proceeded with the recommendation — the artifact JSON captures the
audit trail but no other surface carries the rationale forward.
Downstream tasks running after this one would query memories and find
nothing about the choice.

resolveEscalation already writes a memory on the continueWithDefault=
false path (after operator resolves). This is the symmetric write for
the continueWithDefault=true path: same category="architecture",
same "[escalation:T##]" prefix, with the rationale prefixed
"auto-applied default: ..." so a journal scan can tell apart
continueWithDefault entries from operator-resolved ones.

Now a slice's full decision history (operator-resolved + auto-accepted
+ default-applied escalations) lives uniformly in the memory store and
flows into the cosine ranking for downstream prompts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:46:07 +02:00
Mikael Hugo
fec6c293bf docs(sf): align agent escalation guidance with already-resolved reality
The execute-task escalation guidance claimed the user "can review or
override later via /sf escalate". Commit c1ce9aac1 already made the
already-resolved message explicit that auto-accepted decisions can't
be retroactively undone — the carry-forward into downstream tasks
happens before any operator could intervene.

Updated the agent-facing guidance to match: auto-mode accepts +
persists as memory + carries forward; the operator gets the audit
trail via /sf escalate list --all but the executed work stands. This
shifts the agent's incentive toward thorough rationale capture (since
that's what survives) rather than the false comfort of "the user can
fix it later".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:43:18 +02:00
Mikael Hugo
5cc2522646 feat(sf): /sf memory search header reports rerank state too
After aa60821ec wired the rerank pass, the search header still said
"(embedding-ranked)" even when SF_LLM_GATEWAY_RERANK_MODEL was set
and the worker was online. The user couldn't tell whether they were
seeing cosine-only or rerank-enhanced results.

Now the header has three states:
- "(embedding+rerank-ranked)" — both env vars set
- "(embedding-ranked)" — only SF_LLM_GATEWAY_KEY set
- "(static rank — set SF_LLM_GATEWAY_KEY for embeddings)" — neither

Header-only diff. The rerank can still soft-degrade silently if the
worker is offline (caller throttles the warning to once/min) — header
reports the configured state, not the realized state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:39:28 +02:00
Mikael Hugo
54f27bd02c test(sf): lock embedding lifecycle hygiene contract
Three new tests covering the embedding-cleanup paths shipped in
7bec2dc2d / 1b71ddd17 / 05a326a29:

1. updateMemoryContent → drops the existing memory_embeddings row
   (next backfill re-embeds the new content).
2. supersedeMemory → drops the superseded memory's embedding while
   preserving the live one's.
3. enforceMemoryCap → sweeps embeddings of newly-superseded memories
   so memory_embeddings stays aligned with active memories after a
   batch cap.

Without these, a regression in the cleanup paths would silently leave
orphaned vectors that loadAllEmbeddings's superseded_by filter masks
at query time but that bloat the table forever.

11 → 14 tests in memory-store.test.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:35:15 +02:00
Mikael Hugo
3b5e6588e9 test(sf): lock escalation→memory persistence contract
Commit 00c13bc5a added "createMemory on resolveEscalation" but the
behavior was untested — a regression that broke it would silently
disable the cross-session learning surface (the [escalation:T##]
memories are what carry agent rationales forward via getRelevantMemories
ranking).

Two new tests:
1. resolveEscalation with explicit user rationale → memory contains
   the question, choice, and user rationale, category=architecture.
2. resolveEscalation with empty rationale → falls back to the
   artifact's recommendationRationale (the formatEscalationMemoryContent
   contract).

23 tests in the file now (was 21).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:33:18 +02:00
Mikael Hugo
c1ce9aac15 docs(sf): better message when /sf escalate resolve hits an already-resolved entry
The "already-resolved" branch returned a bare timestamp with no
guidance. Auto-accepted escalations especially leave the user wondering
what to do — the carry-forward was already injected into the next
task, so this command can't retroactively undo the choice.

Now the message distinguishes auto-accepted vs user-resolved and, for
the auto-accepted case, points to `/sf memory note "..."` as the
forward-looking corrective surface (it lands in memory_embeddings on
next backfill and influences future ranking).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:32:01 +02:00
Mikael Hugo
5fda99bfae chore(sf): throttle rerank-unavailable warnings to once per minute
When SF_LLM_GATEWAY_RERANK_MODEL is set but no rerank worker is online,
every memory query (per execute-task prompt assembly) would log
"[sf:memory-embeddings] WARN: llm-gateway /rerank unavailable (503)" —
several lines per turn, all redundant. The soft-degrade is expected in
this state.

Now the message logs at most once per 60s. Symmetric with the
runEmbeddingBackfill unavailable-throttle pattern. Both sad-path
loggers stay informative (the operator sees one line and knows the
worker is down) without drowning the journal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:27:57 +02:00
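
The once-per-minute throttle in 5fda99bfae follows a common pattern, sketched below. The helper name and the injectable clock are assumptions added to make the behavior testable; the real logger in memory-embeddings may be structured differently.

```typescript
// Wraps a logger so that it emits at most one message per interval.
// Returns a function that reports whether the message was actually logged.
function createThrottledWarn(
  log: (msg: string) => void,
  intervalMs = 60000,
  now: () => number = Date.now,
): (msg: string) => boolean {
  let lastLoggedAt = Number.NEGATIVE_INFINITY;
  return (msg: string): boolean => {
    const t = now();
    // Suppress while still inside the quiet window.
    if (t - lastLoggedAt < intervalMs) return false;
    lastLoggedAt = t;
    log(msg);
    return true;
  };
}
```

The first call always logs, repeats inside the 60s window are dropped, and the next call after the window logs again, so the operator sees one line while the worker stays down.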
Mikael Hugo
0ee94f21be chore(sf): drop chatty backfill success log
runEmbeddingBackfill fires on every agent_end (per-turn). When the
gateway is online and a project produces memories, every turn would
write a "[sf:memory-embeddings] WARN: backfill: embedded N memories"
line — successes labeled as warnings, repeating on every cycle. That
both inflates the stderr stream and misleads grep-for-WARN diagnostics.

Successes are routine; the function's return value carries the count
when a caller cares. Failures still log (throttled to 60s) via the
existing path. Net effect: the embedding pipeline runs silently in the
happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:25:35 +02:00
Mikael Hugo
05a326a294 fix(sf): enforceMemoryCap sweeps orphaned embeddings too
Same orphan-cleanup as 1b71ddd17 but for the batch path. enforceMemoryCap
calls supersedeLowestRankedMemories, which marks N lowest memories
superseded in one UPDATE — bypassing the per-memory supersede embedding
cleanup. The result was that capping a project at 50 memories left dead
embedding rows for everything that got demoted.

Now: a single DELETE-IN-SUBQUERY removes embedding rows for any memory
whose superseded_by is no longer NULL. This covers both the cap path
and any historical orphans from before the per-row cleanup landed.
Best-effort; cap enforcement is load-bearing, embedding cleanup is not.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:23:37 +02:00
Mikael Hugo
1b71ddd178 fix(sf): drop embedding row when memory is superseded
supersedeMemory soft-deleted via superseded_by but left the
memory_embeddings row in place. loadAllEmbeddings already filters
by superseded_by IS NULL, so the orphaned row is harmless functionally
— but it wastes storage, complicates manual SQL audits, and is
inconsistent with updateMemoryContent (which already invalidates the
embedding via 7bec2dc2d).

Best-effort delete; supersede still succeeds even if the embedding
delete raises. Symmetric with the update path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:21:57 +02:00
Mikael Hugo
aa60821ec3 feat(sf): wire rerank pass into getRelevantMemoriesRanked
The gateway rerank surface was shipped dormant in 56ee89a94 — the
function existed but no consumer called it, so setting
SF_LLM_GATEWAY_RERANK_MODEL did nothing functional.

Now: after the cosine-rank top-K is computed, optionally call
rerankCandidates(query, top-K) when a rerank model is configured. Re-
sort by relevance_score; gracefully fall back to cosine order in every
sad path (no model, no worker, network error, malformed response).

Strictly additive precision boost — the cosine-only ranking path is
unchanged when rerank isn't enabled OR returns null.

Two new tests: rerank actively reorders the top-K when scores are
returned, and the no-worker-online soft-degrade path preserves cosine
order. 12 tests in the file passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:20:29 +02:00
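
The fallback behavior in aa60821ec3 can be isolated from the network call. In the sketch below the rerank result is passed in as an array of relevance scores, or null for any sad path; that representation, the candidate shape, and the function name are assumptions, since the commit does not show what rerankCandidates returns beyond "null" on failure.

```typescript
interface RerankCandidateSketch {
  id: string;
  cosine: number;
}

// Re-sorts the cosine top-K by rerank relevance when usable scores are available;
// otherwise returns the cosine order untouched.
function applyRerankSketch(
  cosineRanked: RerankCandidateSketch[],
  relevanceScores: number[] | null,
): RerankCandidateSketch[] {
  if (relevanceScores === null) return cosineRanked;
  const scores: number[] = relevanceScores;
  // Treat a malformed response (wrong length, non-finite values) as a sad path.
  if (scores.length !== cosineRanked.length || scores.some((s) => !isFinite(s))) {
    return cosineRanked;
  }
  return cosineRanked
    .map((candidate, i) => ({ candidate, relevance: scores[i] ?? 0 }))
    .sort((a, b) => b.relevance - a.relevance)
    .map((entry) => entry.candidate);
}
```

Because every failure mode collapses to "return the input", enabling rerank cannot make the cosine-only path worse, which is the additive property the commit calls out.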