Token count now only triggers a warning when accompanied by a primary
signal (high tool calls, long elapsed time, or many changed files).
This prevents false positives on units doing real work with large
context models, where 25+ tool calls can legitimately burn 1M+ tokens.
Also renames 'session tokens' to 'unit tokens' in guard messages to
clarify that the metric is delta-from-unit-start, not cumulative.
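A minimal sketch of the gate, assuming hypothetical names and thresholds (the real signal thresholds live in the guard config, not here):

```typescript
// Hypothetical model of the token-count guard: tokens alone never warn,
// they must be corroborated by a primary signal. All names/thresholds
// here are illustrative, not the shipped API.
interface UnitStats {
  unitTokens: number;   // delta from unit start, not cumulative
  toolCalls: number;
  elapsedMs: number;
  changedFiles: number;
}

const TOKEN_THRESHOLD = 1_000_000;

function shouldWarnOnTokens(s: UnitStats): boolean {
  if (s.unitTokens < TOKEN_THRESHOLD) return false;
  const primarySignal =
    s.toolCalls >= 25 ||           // high tool-call count
    s.elapsedMs >= 30 * 60_000 ||  // long elapsed time
    s.changedFiles >= 20;          // many changed files
  return primarySignal;            // tokens alone never fire
}
```

A unit with 25+ tool calls and 1M+ tokens warns; a large-context unit burning 1M+ tokens over 3 tool calls does not.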
Fixes sf-moqewawp-ijwjjt
Pure-function tests for applyRelationBoost (55b14c3f7) cover the
math, but the wired-through path (createMemoryRelation → boost picked
up by getRelevantMemoriesRanked → reordered output) had no
end-to-end test.
New test:
1. Creates memories a, b, c with orthogonal embeddings
2. Mocks gateway to return a query vector aligned only with a
3. Wires a→b with related_to (confidence 1.0)
4. Asserts ranking: a (cosine top) > b (boost from a) > c (unrelated)
Locks the contract that the boost actually fires through the full
pipeline, not just the pure helper. 16 → 17 tests in the file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The header listed "artifact I/O, detection, flag flips, resolution" but
not the carry-forward injection (claimOverrideForInjection /
formatOverrideBlock) or the memory persistence calls now embedded in
both writeEscalationArtifact (continueWithDefault path, b9bff3762
sibling) and resolveEscalation (00c13bc5a). These are load-bearing
behaviors a contributor should know up front.
Also folded the "SF's local ADR-011 is 'Swarm Chat'" disambiguation
note into the header (matching the convention set by the rest of the
disambiguation sweep).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
memory-sleeper.ts had no file header and the "memory" prefix is
misleading — it's a runtime tool-output watchdog (detects repeated
bash failures, too-large tool results) that emits steers, completely
unrelated to memory-store / memory-relations / memory-embeddings.
A contributor reading directory listing top-down would reasonably
assume this file participates in the same pipeline as the other
memory-*.ts modules. Header now states the historical naming and
points readers in the right direction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous header had two stale references:
- "buildMemoryLLMCall pattern, prefers a dedicated embedding-capable
model" — describes a hook that actually returns null on every call
(the Pi SDK has no provider-neutral embedding API yet).
- "queryMemoriesRanked falls back to keyword-only scoring" —
function doesn't exist; the real consumer is
getRelevantMemoriesRanked, and the fallback is static (confidence
× hit_count), not keyword.
Updated to describe the actual three-stage read pipeline (cosine →
relation-boost → optional rerank) and the soft-degrade fallback to
static ranking when the gateway is offline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The file header described an aspirational design ("LINK actions
emitted by the memory extractor, or future /sf memory link CLI") that
never matched code reality. As of this session:
Writers shipped:
(a) applyMemoryActions auto-links co-extracted memories with
related_to (b9bff3762)
(b) /sf memory import loads explicit edges from JSON
Read consumers shipped:
(1) getRelevantMemoriesRanked graph-boost (55b14c3f7)
(2) sf_graph MCP tool (pre-existing)
Updated the header so a contributor reading top-down sees the
current data flow, not the original plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit (55b14c3f7) wired memory_relations into ranking, but
the table was empty — no writer added edges.
applyMemoryActions now links memories created in the same batch
pairwise with `related_to` edges (confidence 0.5 reflects "from same
extraction context" being weaker evidence than an explicit
human-authored relation). Pairwise O(n²) is fine for typical
extractor batches of 1–5 memories.
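The pairwise linkage can be sketched as a pure helper (shape is illustrative; the shipped code wires this through createMemoryRelation inside applyMemoryActions):

```typescript
// Sketch: pairwise related_to edges for a co-extracted batch.
// Field names are assumptions, not the real schema types.
interface RelationInput {
  fromId: string;
  toId: string;
  rel: string;
  confidence: number;
}

function pairwiseRelations(createdIds: string[]): RelationInput[] {
  const edges: RelationInput[] = [];
  // O(n^2) over the batch: fine for typical extractor batches of 1-5.
  for (let i = 0; i < createdIds.length; i++) {
    for (let j = i + 1; j < createdIds.length; j++) {
      edges.push({
        fromId: createdIds[i],
        toId: createdIds[j],
        rel: "related_to",
        // 0.5: "from same extraction context" is weaker evidence
        // than an explicit human-authored relation.
        confidence: 0.5,
      });
    }
  }
  return edges;
}
```

Three CREATEs yield C(3,2)=3 edges; a single CREATE yields none.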
Combined with 55b14c3f7's relation-boost ranker, the effect is:
extracting memories A, B, C from one slice transcript ⇒ when later a
query hits A, B and C get a small score bump (and vice versa). The
cohort surfaces together rather than fragmenting across categories.
UPDATE / REINFORCE / SUPERSEDE actions don't trigger linkage —
linkage is for new co-extracted context, not modifications of
existing memories.
Best-effort: relation creation failures don't roll back the memory
batch. 14 → 16 tests in memory-store.test.ts; new tests verify the
3-memory batch yields C(3,2)=3 edges and a single-CREATE batch yields 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
memory_relations was storage-only since 56ee89a94 / 23c5de38b. Now
getRelevantMemoriesRanked walks edges of cosine top-N memories and
applies a one-pass score-boost to neighbors:
combined += parent_score × edge_confidence × damping
where damping=0.4 by default. Both endpoints of an edge get the boost
symmetrically (memory A pulling B is equally evidence that B is
relevant to A's context).
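The one-pass symmetric boost can be modeled as follows (a sketch, not the real applyRelationBoost signature; field names are assumptions):

```typescript
// Model of the symmetric relation boost: each endpoint of an edge
// boosts the other by parent_score * edge_confidence * damping.
// Boosts read pre-boost scores (one pass, no cascading).
interface Ranked { id: string; score: number }
interface Edge { fromId: string; toId: string; confidence: number }

function applyRelationBoost(
  ranked: Ranked[],
  edges: Edge[],
  damping = 0.4,
): Ranked[] {
  const base = new Map(ranked.map((r) => [r.id, r.score]));
  const boosted = new Map(base);
  for (const e of edges) {
    const a = base.get(e.fromId);
    const b = base.get(e.toId);
    // Cross-pool edge (an endpoint outside the cosine pool): skip.
    if (a === undefined || b === undefined) continue;
    boosted.set(e.toId, boosted.get(e.toId)! + a * e.confidence * damping);
    boosted.set(e.fromId, boosted.get(e.fromId)! + b * e.confidence * damping);
  }
  return ranked
    .map((r) => ({ id: r.id, score: boosted.get(r.id)! }))
    .sort((x, y) => y.score - x.score);
}
```

With empty edges the input ordering is unchanged, which is the no-writer case described above.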
Pure helper `applyRelationBoost(ranked, edges, options)` lives in
memory-embeddings.ts so memory-store.ts doesn't take a direct
dependency on memory-relations.ts; the call site composes the two
modules. When memory_relations is empty (the case until a writer
adds edges — a future agent or hook), applyRelationBoost returns the
input unchanged → no behavior change today.
Intra-pool only: cross-pool edges (where one endpoint is outside the
50–200 cosine pool) are skipped to avoid pulling in low-static
memories on a hot edge alone. Pool expansion via relations would be
a separate, more invasive feature.
4 new tests cover empty edges, empty ranked, cross-pool edge skip,
and the canonical "low-but-related promoted above lone" case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit of all FROM/INTO/UPDATE clauses in the codebase against
CREATE TABLE statements found one missing index. memory_relations
PK is (from_id, to_id, rel) — covers from_id as leading column. But
memory-relations.ts:233 queries `WHERE to_id = :id` which would
full-scan once the relation count grows.
Added idx_memory_relations_to. Cheap insertion cost; avoids the
worst-case query as soon as a ranker consumer starts traversing
edges (the natural next-step from 23c5de38b).
Schema-gap audit (option 3 in the redirect): no other ghost-table
references found. unit_claims has its own .sf/unit-claims.db and
self-contained schema in unit-ownership.ts. active_decisions /
active_requirements / active_memories are CREATE VIEW IF NOT EXISTS,
properly created. "INTO worktree" was a JSDoc false positive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real semantic bug: getRelevantMemoriesRanked returns memories in
score-descending order (cosine + optional rerank), but
formatMemoriesForPrompt then re-grouped them by CATEGORY_PRIORITY
(gotcha=0 first, convention=1, ...). A high-relevance "convention"
memory got buried under low-relevance "gotcha" entries purely because
gotcha has higher category priority. The agent never saw the most
relevant items at the top.
formatMemoriesForPrompt gains a `preserveRankOrder` parameter (default
false for backward compat). When true:
- Renders bullets in input order
- Tags each line with [category] so the agent can still tell
gotchas from conventions
Wired auto-prompts.ts execute-task injection: when memoryQuery is
non-empty (i.e. query-aware ranker was used), pass true. Static-ranked
input keeps the historical category-grouped layout.
Tests verify both modes side-by-side using identical input — the
ordering flip is the load-bearing assertion.
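A simplified model of the two render modes (the real formatter lives in memory-store territory and knows the full CATEGORY_PRIORITY table; this sketch assumes a two-entry table):

```typescript
// Sketch of formatMemoriesForPrompt's two modes. Names follow the
// commit; the CATEGORY_PRIORITY subset here is illustrative.
interface Memory { content: string; category: string }

const CATEGORY_PRIORITY: Record<string, number> = {
  gotcha: 0,
  convention: 1,
};

function formatMemoriesForPrompt(
  memories: Memory[],
  preserveRankOrder = false, // default false for backward compat
): string[] {
  if (preserveRankOrder) {
    // Keep the ranker's order; tag category inline so the agent can
    // still tell gotchas from conventions.
    return memories.map((m) => `- [${m.category}] ${m.content}`);
  }
  // Historical layout: group by category priority.
  return [...memories]
    .sort(
      (a, b) =>
        (CATEGORY_PRIORITY[a.category] ?? 99) -
        (CATEGORY_PRIORITY[b.category] ?? 99),
    )
    .map((m) => `- ${m.content}`);
}
```

Identical input, flipped first line: that is the ordering assertion the tests lock.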
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same disambiguation as 45b669ac3 but for the source-file header
comment (a contributor reading commands-escalate.ts top-down sees the
same surface as `/sf escalate help`).
Comment-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 0f0aee5bf added the --all flag to /sf escalate list (showing
resolved entries in addition to active ones), but the usage() text
never advertised it. Operators discovered the flag only by reading
source. Adding it to the help line.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pool was Math.min(50, limit * 5). For default limit=10 this gives 50
(intended 5× oversample for rerank). But for limit=100 it gives 50 —
caller asking for 100 results would silently get at most 50.
Now: max(limit, limit * 5), capped at 200 to bound rerank latency on
huge requests. Default behavior unchanged for limit ≤ 10; large
requests now work correctly.
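The sizing rule in one function (a sketch of the expression described above, not the shipped call site):

```typescript
// Cosine-pool sizing: 5x oversample for rerank, never smaller than
// the request itself, capped at 200 to bound rerank latency.
// Old buggy form was Math.min(50, limit * 5), which silently capped
// a limit=100 request at 50.
function poolSize(limit: number): number {
  return Math.min(200, Math.max(limit, limit * 5));
}
```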
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new tests covering the symmetric write shipped in 7a5b12540:
1. writeEscalationArtifact with continueWithDefault=true → memory
created with "[escalation:T##]" prefix, "auto-applied default:"
rationale marker, and Fail option label (the recommendation).
2. writeEscalationArtifact with continueWithDefault=false → NO memory
at write time (pending entries defer persistence to resolveEscalation
per existing behavior).
Together with the resolve-time tests in 3b5e6588e, all three
escalation flows (resolved, auto-accepted, default-applied) have
locked memory-persistence contracts. 23 → 25 tests in the file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an agent escalates with continueWithDefault=true, it has already
proceeded with the recommendation — the artifact JSON captures the
audit trail but no other surface carries the rationale forward.
Downstream tasks running after this one would query memories and find
nothing about the choice.
resolveEscalation already writes a memory on the continueWithDefault=
false path (after operator resolves). This is the symmetric write for
the continueWithDefault=true path: same category="architecture",
same "[escalation:T##]" prefix, with the rationale prefixed
"auto-applied default: ..." so a journal scan can tell apart
continueWithDefault entries from operator-resolved ones.
Now a slice's full decision history (operator-resolved + auto-accepted
+ default-applied escalations) lives uniformly in the memory store and
flows into the cosine ranking for downstream prompts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The execute-task escalation guidance claimed the user "can review or
override later via /sf escalate". Commit c1ce9aac1 already made the
already-resolved message explicit that auto-accepted decisions can't
be retroactively undone — the carry-forward into downstream tasks
happens before any operator could intervene.
Updated the agent-facing guidance to match: auto-mode accepts +
persists as memory + carries forward; the operator gets the audit
trail via /sf escalate list --all but the executed work stands. This
shifts the agent's incentive toward thorough rationale capture (since
that's what survives) rather than the false comfort of "the user can
fix it later".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After aa60821ec wired the rerank pass, the search header still said
"(embedding-ranked)" even when SF_LLM_GATEWAY_RERANK_MODEL was set
and the worker was online. The user couldn't tell whether they were
seeing cosine-only or rerank-enhanced results.
Now the header has three states:
- "(embedding+rerank-ranked)" — both env vars set
- "(embedding-ranked)" — only SF_LLM_GATEWAY_KEY set
- "(static rank — set SF_LLM_GATEWAY_KEY for embeddings)" — neither
Header-only diff. The rerank can still soft-degrade silently if the
worker is offline (caller throttles the warning to once/min) — header
reports the configured state, not the realized state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new tests covering the embedding-cleanup paths shipped in
7bec2dc2d / 1b71ddd17 / 05a326a29:
1. updateMemoryContent → drops the existing memory_embeddings row
(next backfill re-embeds the new content).
2. supersedeMemory → drops the superseded memory's embedding while
preserving the live one's.
3. enforceMemoryCap → sweeps embeddings of newly-superseded memories
so memory_embeddings stays aligned with active memories after a
batch cap.
Without these, a regression in the cleanup paths would silently leave
orphaned vectors that loadAllEmbeddings's superseded_by filter masks
at query time but that bloat the table forever.
11 → 14 tests in memory-store.test.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 00c13bc5a added "createMemory on resolveEscalation" but the
behavior was untested — a regression that broke it would silently
disable the cross-session learning surface (the [escalation:T##]
memories are what carry agent rationales forward via getRelevantMemories
ranking).
Two new tests:
1. resolveEscalation with explicit user rationale → memory contains
the question, choice, and user rationale, category=architecture.
2. resolveEscalation with empty rationale → falls back to the
artifact's recommendationRationale (the formatEscalationMemoryContent
contract).
23 tests in the file now (was 21).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "already-resolved" branch returned a bare timestamp with no
guidance. Auto-accepted escalations especially leave the user wondering
what to do — the carry-forward was already injected into the next
task, so this command can't retroactively undo the choice.
Now the message distinguishes auto-accepted vs user-resolved and, for
the auto-accepted case, points to `/sf memory note "..."` as the
forward-looking corrective surface (it lands in memory_embeddings on
next backfill and influences future ranking).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When SF_LLM_GATEWAY_RERANK_MODEL is set but no rerank worker is online,
every memory query (per execute-task prompt assembly) would log
"[sf:memory-embeddings] WARN: llm-gateway /rerank unavailable (503)" —
several lines per turn, all redundant. The soft-degrade is expected in
this state.
Now the message logs at most once per 60s. Symmetric with the
runEmbeddingBackfill unavailable-throttle pattern. Both sad-path
loggers stay informative (the operator sees one line and knows the
worker is down) without drowning the journal.
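The throttle pattern itself, as a minimal factory (the real logger and message live in memory-embeddings.ts; the injectable clock here is for testability and is an assumption):

```typescript
// Once-per-window log throttle: the first message passes, repeats
// inside the window are dropped, the next one after the window passes.
function makeThrottledWarn(
  log: (msg: string) => void,
  windowMs = 60_000,
  now: () => number = Date.now,
): (msg: string) => void {
  let lastLoggedAt = -Infinity;
  return (msg) => {
    const t = now();
    if (t - lastLoggedAt < windowMs) return; // drop redundant repeats
    lastLoggedAt = t;
    log(msg);
  };
}
```

The operator still sees one line per minute and knows the worker is down; the journal stops drowning.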
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
runEmbeddingBackfill fires on every agent_end (per-turn). When the
gateway is online and a project produces memories, every turn would
write a "[sf:memory-embeddings] WARN: backfill: embedded N memories"
line — successes labeled as warnings, repeating on every cycle. That
both inflates the stderr stream and misleads grep-for-WARN diagnostics.
Successes are routine; the function's return value carries the count
when a caller cares. Failures still log (throttled to 60s) via the
existing path. Net effect: the embedding pipeline runs silently in the
happy path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same orphan-cleanup as 1b71ddd17 but for the batch path. enforceMemoryCap
calls supersedeLowestRankedMemories, which marks N lowest memories
superseded in one UPDATE — bypassing the per-memory supersede embedding
cleanup. The result was that capping a project at 50 memories left dead
embedding rows for everything that got demoted.
Now: a single DELETE-IN-SUBQUERY removes embedding rows for every
memory whose superseded_by is set — covering both the cap path and
any historical orphans from before the per-row cleanup landed.
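A pure model of which embedding rows survive the sweep (the real code does this in one `DELETE FROM memory_embeddings WHERE ... IN (SELECT ...)`; column/field names here are assumptions):

```typescript
// Model of the batch sweep: embedding rows for superseded memories
// are removed, rows for live memories survive.
interface MemRow { id: string; supersededBy: string | null }

function surviveSweep(embeddingIds: string[], memories: MemRow[]): string[] {
  const dead = new Set(
    memories.filter((m) => m.supersededBy !== null).map((m) => m.id),
  );
  return embeddingIds.filter((id) => !dead.has(id));
}
```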
Best-effort; cap enforcement is load-bearing, embedding cleanup is not.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
supersedeMemory soft-deleted via superseded_by but left the
memory_embeddings row in place. loadAllEmbeddings already filters
by superseded_by IS NULL, so the orphaned row is harmless functionally
— but it wastes storage, complicates manual SQL audits, and is
inconsistent with updateMemoryContent (which already invalidates the
embedding via 7bec2dc2d).
Best-effort delete; supersede still succeeds even if the embedding
delete raises. Symmetric with the update path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The gateway rerank surface was shipped dormant in 56ee89a94 — the
function existed but no consumer called it, so setting
SF_LLM_GATEWAY_RERANK_MODEL did nothing functional.
Now: after the cosine-rank top-K is computed, optionally call
rerankCandidates(query, top-K) when a rerank model is configured. Re-
sort by relevance_score; gracefully fall back to cosine order in every
sad path (no model, no worker, network error, malformed response).
Strictly additive precision boost — the cosine-only ranking path is
unchanged when rerank isn't enabled OR returns null.
Two new tests: rerank actively reorders the top-K when scores are
returned, and the no-worker-online soft-degrade path preserves cosine
order. 12 tests in the file passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same UX refinement as e104f17ad applied to /sf escalate show <slice>/<task>.
Auto-mode resolutions now display "Auto-accepted <ts> → choice=..." instead
of the generic "Resolved <ts>". The userRationale prefix "auto-mode:"
already disambiguates the source; surfacing the verb makes the show view
match the list view's status semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto-mode resolutions stamp the artifact with userRationale prefix
"auto-mode: ..." (set by auto-dispatch.ts when it auto-resolves an
escalation). The list view now shows "auto-accepted (accept)" for
those entries vs "resolved (option-id)" for user-resolved ones, so an
operator scanning `/sf escalate list --all` can tell at a glance which
decisions were autonomous and which had explicit human input.
The artifact JSON is unchanged — this is purely a list-formatter
refinement that surfaces information already recorded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last bare "ADR-011 P2" reference was in the user-facing /sf escalate
help description in commands/catalog.ts. The parallel session's
c481ede33 touched this file (added /sf reload) but left this line
untouched — fixing it now closes the disambiguation sweep across the
entire codebase outside test files.
Comment / string-literal only diff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final pass over the comment-only ambiguity. Every internal "ADR-011"
reference outside test files now reads "gsd-2 ADR-011" so the
source-of-truth lookup is unambiguous (SF's local ADR-011 is "Swarm
Chat and Debate Mode", which has nothing to do with progressive
planning or escalation).
Files: workflow-tool-executors.ts, bootstrap/db-tools.ts,
unit-context-manifest.ts, commands-escalate.ts, sf-db.ts (full sweep,
including remaining function docstrings), tools/plan-milestone.ts,
tools/plan-slice.ts.
Comment-only diff. The one bare "(ADR-011 P2)" left in
commands/catalog.ts:62 (the /sf escalate help text) belongs to the
parallel session's WIP edit on that file — leaving it for them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as df095b406 / f1fc8cc86, applied to the schema-comment
references in sf-db.ts (column comments + migration comments). Future
maintainers reading SQL definitions like:
is_sketch INTEGER NOT NULL DEFAULT 0, -- ADR-011: 1 = slice is a sketch
would otherwise look up SF's local ADR-011 ("Swarm Chat") and find
nothing about sketches. Now reads "gsd-2 ADR-011" so the source-of-
truth is unambiguous.
Comment-only diff. The 5 remaining "(gsd-2)" parenthetical references
already disambiguate clearly enough; left intact to avoid churn.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as df095b406 but for the user-facing PREFERENCES.md template
that ships in /sf init projects. Reading "ADR-011 P2: mid-execution
escalation" without the gsd-2 prefix sends operators to SF's local
ADR-011 ("Swarm Chat and Debate Mode") which has nothing to do with
escalation.
Markdown-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A future maintainer reading "ADR-011 Phase 2" in escalation.ts would
look up SF's local docs/dev/ADR-011 and find "Swarm Chat and Debate
Mode" — totally unrelated. The escalation + progressive-planning work
ports gsd-2's ADR-011 (Progressive Planning + Escalation), which
happens to share the number with our local ADR-011.
Prefixed every internal comment that referenced the gsd-2 ADR with
"gsd-2 ADR-011" so the source-of-truth lookup is unambiguous. Comment-
only diff — no compilation, runtime, or test surface affected.
Files: types.ts, auto-prompts.ts, auto-dispatch.ts, escalation.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The autonomous-mode footer in refine-slice.md was the short version
("Document assumptions in the plan") while plan-slice / execute-task /
complete-slice all carry the full explanation: agents are in auto-mode,
no human is available, document assumptions in the artifact, note
human-input-required decisions in the relevant artifact and proceed
with the best available option.
Refine-slice gets sketches refined into full plans — same autonomy
contract as plan-slice. Aligning the language so an agent reading any
of these prompts gets the same self-help instructions about
ask_user_questions / secure_env_collect.
Markdown-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These gateway env vars are runtime-only settings (not YAML keys); the
previous template mentioned only the YAML phase toggles. Operators
discovering the
embedding/rerank surface had to read source. Adding a clear table at the
bottom of PREFERENCES.md so the env-var contract is documented next to
the rest of the skill prefs.
Documents: SF_LLM_GATEWAY_KEY, SF_LLM_GATEWAY_URL,
SF_LLM_GATEWAY_EMBED_MODEL, SF_LLM_GATEWAY_RERANK_MODEL — including the
silent-fallback semantics and the agent_end backfill cadence.
Markdown-only; no recompile needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
auto-session-encapsulation invariant: the parallel session refactored
auto.ts to use the getAutoSession() factory; the test still expected
`new AutoSession()` literally. Updated the regex + the allowedPatterns
list to accept both shapes — the invariant is "exactly one module-level
binding for the AutoSession instance", not which constructor expression
yields it.
silent-catch-diagnostics #3348: auto-supervisor.ts:53 swallowed signal-
handler exceptions silently. Added logWarning("session", ...) — the
intent stays the same (signal handler must not throw), but cleanup-path
errors are now visible in the journal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
updateMemoryContent rewrote the row but left the existing memory_embeddings
vector in place — that vector was computed against the old content, so the
next cosine query would score the memory by what it used to say, not what
it says now.
Now drop the embedding row on update; the next runEmbeddingBackfill
(agent_end hook) re-embeds. Best-effort: a missing embedding is the
silent-fallback case the ranker already handles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Schema-version assertions hadn't been bumped past 21 in three places
(complete-task/complete-slice/md-importer); manifest coverage tests caught
the project-scoped unit types added for the deep planning gate (ADR-011)
that weren't yet registered in either KNOWN_UNIT_TYPES table; workflow-
templates registry test rejected docs-sync.yaml because the assertion was
.md-only.
- preferences-types.ts: KNOWN_UNIT_TYPES gains refine-slice, discuss-project,
discuss-requirements, research-project, workflow-preferences.
- unit-context-manifest.ts: same five types added to its local
KNOWN_UNIT_TYPES + UNIT_MANIFESTS (TOOLS_PLANNING, scoped/full knowledge,
COMMON_BUDGET_MEDIUM/LARGE).
- complete-task / complete-slice / md-importer test: schema_version
expectation 21 → 25.
- workflow-templates test: file extension can be .md OR .yaml (docs-sync is
intentionally yaml-step iteration).
6 test files / 81 tests now green that were red.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New subcommand: /sf memory search "<query>". Routes through
getRelevantMemoriesRanked, so when SF_LLM_GATEWAY_KEY is set the gateway
embeds the query and ranks memories by cosine + static blend; without
the key, gracefully degrades to static ranking. Header text indicates
which path was taken so users know whether embeddings are live.
This makes the embedding pipeline operator-discoverable — previously the
only consumer was the silent execute-task injection path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit populated memory_embeddings rows but no consumer read
them — the read path (getActiveMemoriesRanked) used pure static score
(confidence × hit_count). Embeddings were silent.
This wires the read side:
- rankMemoriesByEmbedding (pure, in memory-embeddings.ts) blends static
score with cosine similarity: combined = static * (1 + α * cosine).
Defaults α=0.6 — a perfect-static + zero-similarity hit ties roughly
with a low-static + perfect-similarity hit, so semantically relevant
cold memories can surface above stale-but-popular ones.
- embedQueryViaGateway + loadEmbeddingMap — supporting helpers.
- getRelevantMemoriesRanked (memory-store.ts) — async query-aware ranker.
Oversamples the static pool 5×, embeds the query, blends, returns top-K.
Falls back cleanly to static ranking when:
- query empty
- no SF_LLM_GATEWAY_KEY (gateway not configured)
- gateway request fails (500/network)
- no embeddings exist yet (fresh DB / worker offline)
- auto-prompts.ts: execute-task injection now uses sliceTitle + taskTitle
as the query so memories relevant to the current work surface first.
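The blend can be sketched as a pure ranker (a model of the math above, not the real rankMemoriesByEmbedding signature; field names are assumptions):

```typescript
// combined = static * (1 + alpha * cosine), alpha = 0.6 by default.
// An empty cosine map degenerates to pure static order, which is the
// fallback chain's behavior.
interface Mem { id: string; staticScore: number }

function rankByEmbedding(
  pool: Mem[],
  cosineById: Map<string, number>,
  alpha = 0.6,
): Mem[] {
  const score = (m: Mem) =>
    m.staticScore * (1 + alpha * (cosineById.get(m.id) ?? 0));
  return [...pool].sort((a, b) => score(b) - score(a));
}
```

With alpha=0.6, a cold memory at static 0.7 and cosine 1.0 scores 1.12 and surfaces above a stale-but-popular memory at static 1.0 and cosine 0.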
10 new tests lock the contract — pure ranker math, fallback chain, and
the gateway-mocked promotion case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an opt-in embedding path against `https://llm-gateway.centralcloud.com/v1`
using qwen/qwen3-embedding-4b. Activated by exporting SF_LLM_GATEWAY_KEY;
URL/model overridable via SF_LLM_GATEWAY_URL and SF_LLM_GATEWAY_EMBED_MODEL.
Rerank surface present (SF_LLM_GATEWAY_RERANK_MODEL) but degrades to null
when no rerank worker is online — current gateway has none, so it stays
dormant until one comes up.
- memory-embeddings-llm-gateway.ts: createGatewayEmbedFn + rerankCandidates
speaking the OpenAI-shaped /v1/embeddings and /v1/rerank protocols.
- memory-embeddings.ts: listUnembeddedMemoryIds + runEmbeddingBackfill —
best-effort sweep, in-flight-guarded, bounded, throttled "unavailable"
log. Wired into agent_end so every turn opportunistically embeds new
memories when the gateway is reachable.
- sf-db.ts: pre-existing bug fix — memory_embeddings, memory_relations,
and memory_sources were referenced everywhere but never CREATE-d in the
schema. Adding them as IF NOT EXISTS with proper FK + PK so fresh DBs
actually work.
- 16 new tests covering env config, embed fn shape, rerank degradation,
backfill happy/sad/bounded paths.
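The OpenAI-shaped request the embed fn sends can be sketched as a pure request builder (auth header and the actual fetch are omitted; the default model string is from this commit, the function name is hypothetical):

```typescript
// Builds the /v1/embeddings request body. Response is assumed to be
// the OpenAI shape: { data: [{ embedding: number[] }, ...] }.
interface EmbedRequest { model: string; input: string[] }

function buildEmbedRequest(
  texts: string[],
  env: Record<string, string | undefined>,
): EmbedRequest {
  return {
    model: env.SF_LLM_GATEWAY_EMBED_MODEL ?? "qwen/qwen3-embedding-4b",
    input: texts,
  };
}
```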
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolveEscalation gains an optional `source: "user" | "auto-mode"`
parameter (default "user"). Auto-dispatch passes "auto-mode" when it
auto-accepts. The UOK audit event type now flips between
"escalation-user-responded" and "escalation-auto-accepted", and the
payload includes a typed `resolvedBy` field.
Why: a journal grep for user actions shouldn't return auto-mode events.
Audit/observability tools can now filter cleanly without string-matching
the rationale prefix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an escalation is resolved (auto-mode accept or user override), write
the choice + rationale into the memories table with category="architecture".
The "[escalation:<task>] <question>. Chose: <option>. Rationale: ..."
prefix mirrors the decisions->memories backfill format so search and
de-duplication work the same way.
Why: getActiveMemoriesRanked auto-injects top memories into every
execute-task prompt, so a resolved escalation now travels forward as
implicit context across the whole project — not just the immediate
carry-forward into the next task. The artifact JSON stays as the audit
trail; the memory is the discoverable, semantically-ranked surface.
Best-effort write — never blocks resolution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When sf_task_complete's escalation payload was rejected (validation error)
or silently dropped (feature flag off), the agent saw a clean "Completed
task" response and assumed the issue was raised — but no carry-forward
override was created, so the next executor saw nothing.
Now the response text explicitly says:
- "WARNING: escalation payload was REJECTED (<error>); the next executor
will NOT see your decision" — when buildEscalationArtifact throws
- "note: escalation payload was DROPPED because phases.mid_execution_escalation
is disabled" — when feature flag is off
Task completion is still never blocked by escalation issues — additive,
auditable, agent-actionable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The global skill hardcoded `.sf/milestones/M008/bugs/bug-registry.json`
and `M008-specific:` rules — when M008 closes the skill goes stale and
misleads agents on every other milestone.
Reframed as "Milestone Bug Registry Guidance": the rules apply to any
milestone that ships a `bug-registry.json` + `triage-protocol.md` pair,
with M008 cited as the canonical example for the registry test. When no
registry exists, the section is skipped — agents follow the normal
evidence/repro/fix flow.
triage-protocol-registry test (31 tests) still passes — keeps the
literal `bug-registry.json` reference and HIGH/MEDIUM/LOW + cluster +
update-after-fix assertions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The escalation feature was invisible to agents — the prompt didn't say it
existed, so agents made silent assumptions instead of surfacing genuine
tradeoffs. Now, when phases.mid_execution_escalation is on, execute-task
includes a guidance block showing the escalation payload shape and noting
auto-mode auto-accepts the recommendation by default. When the feature is
off the field is silently dropped, so the guidance is omitted entirely to
avoid misleading the agent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto is autonomous, so the escalating-task dispatch rule shouldn't halt
the loop. Default: accept the agent's recommendation, record the choice
with `auto-mode: ...` rationale, and let the next dispatch cycle pick up
the carry-forward override. Users can review or override via
`/sf escalate list --all` later.
Set `phases.escalation_auto_accept: false` to keep gsd-2's pause-and-ask
behavior (loop halts until the user runs `/sf escalate resolve`).
- types.ts: add escalation_auto_accept (default true)
- preferences-validation.ts: allowlist + warn on unknown phase keys
- auto-dispatch.ts: rename rule to "auto-accept-or-pause"; on auto-accept
resolve via resolveEscalation("accept", ...) and return action:"skip"
so the next cycle re-reads state cleanly
- PREFERENCES.md: surface the toggle with the autonomy rationale
- tests/escalation-auto-accept.test.ts: 4 cases — default accept, explicit
true, explicit false (preserves pause), non-escalating phase no-op
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small DB helpers from gsd-2 that SF was missing, plus a UX
improvement to /sf escalate list that uses one of them.
PDD spec:
setSliceSketchFlag(milestoneId, sliceId, isSketch) — generalized
sketch-flag setter. Replaces my narrower clearSliceSketch (which
remains as a thin wrapper for callers that only zero). Use this
when a re-plan flow wants to revert a slice back to sketch state.
autoHealSketchFlags(milestoneId, hasPlanFile) — safety net for
progressive planning. Predicate-based: the caller passes a function
that resolves whether a PLAN file exists for a slice, and the helper
flips is_sketch=0 for any slice that has both is_sketch=1 AND a
plan file. Catches DB-FS drift after crashes/manual edits.
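The heal logic, modeled purely (the real helper runs UPDATEs against sf-db; the in-memory slice shape here is an assumption):

```typescript
// Predicate-based sketch heal: DB says sketch, FS says a full PLAN
// exists => drift; flip the flag and report which slices were healed.
interface Slice { sliceId: string; isSketch: boolean }

function autoHealSketchFlags(
  slices: Slice[],
  hasPlanFile: (sliceId: string) => boolean,
): string[] {
  const healed: string[] = [];
  for (const s of slices) {
    if (s.isSketch && hasPlanFile(s.sliceId)) {
      s.isSketch = false;
      healed.push(s.sliceId);
    }
  }
  return healed;
}
```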
listEscalationArtifacts(milestoneId, includeResolved=false) —
cross-slice DB-side filter for /sf escalate list. Replaces my
hand-rolled inner-loop over getMilestoneSlices() + getSliceTasks()
+ filter — single SQL query, sorted by sequence, faster.
UX improvement to commands-escalate.ts:
- /sf escalate list: now uses listEscalationArtifacts; shows
PENDING / awaiting-review / resolved status badges per entry.
- /sf escalate list --all: includes resolved entries (audit trail).
- Better hint message when none active: 'Use --all to include
resolved'.
Verified:
- typecheck clean (one parallel-session-introduced error in
self-feedback-drain.ts is unrelated — they import a missing
utils/error.ts; will land when their commit does).
- escalation-feature.test.ts (21 tests) + sf-db.test.ts (16
tests) still pass — no regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>