Commit graph

3960 commits

Mikael Hugo
e2e708fc11 test(sf): lock continueWithDefault memory persistence contract
Two new tests covering the symmetric write shipped in 7a5b12540:

1. writeEscalationArtifact with continueWithDefault=true → memory
   created with "[escalation:T##]" prefix, "auto-applied default:"
   rationale marker, and Fail option label (the recommendation).
2. writeEscalationArtifact with continueWithDefault=false → NO memory
   at write time (pending entries defer persistence to resolveEscalation
   per existing behavior).

Together with the resolve-time tests in 3b5e6588e, all three
escalation flows (resolved, auto-accepted, default-applied) have
locked memory-persistence contracts. 23 → 25 tests in the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:47:08 +02:00
Mikael Hugo
7a5b125405 feat(sf): persist continueWithDefault escalations as memories too
When an agent escalates with continueWithDefault=true, it has already
proceeded with the recommendation — the artifact JSON captures the
audit trail but no other surface carries the rationale forward.
Downstream tasks running after this one would query memories and find
nothing about the choice.

resolveEscalation already writes a memory on the continueWithDefault=
false path (after operator resolves). This is the symmetric write for
the continueWithDefault=true path: same category="architecture",
same "[escalation:T##]" prefix, with the rationale prefixed
"auto-applied default: ..." so a journal scan can distinguish
continueWithDefault entries from operator-resolved ones.

Now a slice's full decision history (operator-resolved + auto-accepted
+ default-applied escalations) lives uniformly in the memory store and
flows into the cosine ranking for downstream prompts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:46:07 +02:00
Mikael Hugo
fec6c293bf docs(sf): align agent escalation guidance with already-resolved reality
The execute-task escalation guidance claimed the user "can review or
override later via /sf escalate". Commit c1ce9aac1 already made the
already-resolved message explicit that auto-accepted decisions can't
be retroactively undone — the carry-forward into downstream tasks
happens before any operator could intervene.

Updated the agent-facing guidance to match: auto-mode accepts +
persists as memory + carries forward; the operator gets the audit
trail via /sf escalate list --all but the executed work stands. This
shifts the agent's incentive toward thorough rationale capture (since
that's what survives) rather than the false comfort of "the user can
fix it later".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:43:18 +02:00
Mikael Hugo
5cc2522646 feat(sf): /sf memory search header reports rerank state too
After aa60821ec wired the rerank pass, the search header still said
"(embedding-ranked)" even when SF_LLM_GATEWAY_RERANK_MODEL was set
and the worker was online. The user couldn't tell whether they were
seeing cosine-only or rerank-enhanced results.

Now the header has three states:
- "(embedding+rerank-ranked)" — both env vars set
- "(embedding-ranked)" — only SF_LLM_GATEWAY_KEY set
- "(static rank — set SF_LLM_GATEWAY_KEY for embeddings)" — neither

Header-only diff. The rerank can still soft-degrade silently if the
worker is offline (caller throttles the warning to once/min) — header
reports the configured state, not the realized state.
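The three-state decision above can be sketched as a pure function over the environment; only the env-var names and label strings come from this commit, and the function name is illustrative:

```typescript
// Illustrative sketch of the search-header state logic. The env-var names
// and the three label strings are from the commit; the function name is not
// SF's actual API.
type Env = Record<string, string | undefined>;

function searchHeaderLabel(env: Env): string {
  const hasKey = Boolean(env.SF_LLM_GATEWAY_KEY);
  const hasRerank = Boolean(env.SF_LLM_GATEWAY_RERANK_MODEL);
  if (hasKey && hasRerank) return "(embedding+rerank-ranked)";
  if (hasKey) return "(embedding-ranked)";
  return "(static rank — set SF_LLM_GATEWAY_KEY for embeddings)";
}
```

Deriving the label purely from env vars (never probing the worker) is what makes it report configured state rather than realized state.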

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:39:28 +02:00
Mikael Hugo
54f27bd02c test(sf): lock embedding lifecycle hygiene contract
Three new tests covering the embedding-cleanup paths shipped in
7bec2dc2d / 1b71ddd17 / 05a326a29:

1. updateMemoryContent → drops the existing memory_embeddings row
   (next backfill re-embeds the new content).
2. supersedeMemory → drops the superseded memory's embedding while
   preserving the live one's.
3. enforceMemoryCap → sweeps embeddings of newly-superseded memories
   so memory_embeddings stays aligned with active memories after a
   batch cap.

Without these, a regression in the cleanup paths would silently leave
orphaned vectors that loadAllEmbeddings's superseded_by filter masks
at query time but that bloat the table forever.

11 → 14 tests in memory-store.test.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:35:15 +02:00
Mikael Hugo
3b5e6588e9 test(sf): lock escalation→memory persistence contract
Commit 00c13bc5a added "createMemory on resolveEscalation" but the
behavior was untested — a regression that broke it would silently
disable the cross-session learning surface (the [escalation:T##]
memories are what carry agent rationales forward via getRelevantMemories
ranking).

Two new tests:
1. resolveEscalation with explicit user rationale → memory contains
   the question, choice, and user rationale, category=architecture.
2. resolveEscalation with empty rationale → falls back to the
   artifact's recommendationRationale (the formatEscalationMemoryContent
   contract).

23 tests in the file now (was 21).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:33:18 +02:00
Mikael Hugo
c1ce9aac15 docs(sf): better message when /sf escalate resolve hits an already-resolved entry
The "already-resolved" branch returned a bare timestamp with no
guidance. Auto-accepted escalations especially leave the user wondering
what to do — the carry-forward was already injected into the next
task, so this command can't retroactively undo the choice.

Now the message distinguishes auto-accepted vs user-resolved and, for
the auto-accepted case, points to `/sf memory note "..."` as the
forward-looking corrective surface (it lands in memory_embeddings on
next backfill and influences future ranking).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:32:01 +02:00
Mikael Hugo
daa192a572 docs: list memory-* modules in architecture.md
The repo's architecture file listed only `memory-extractor.ts` and
`memory-store.ts` — the rest of the memory subsystem
(`memory-embeddings.ts`, `memory-embeddings-llm-gateway.ts`,
`memory-relations.ts`, `memory-source-store.ts`) had no entry, so a
new contributor reading the file would miss them entirely.

Added one-line descriptions for each, including the gateway adapter's
opt-in env-var contract (`SF_LLM_GATEWAY_KEY`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:29:03 +02:00
Mikael Hugo
5fda99bfae chore(sf): throttle rerank-unavailable warnings to once per minute
When SF_LLM_GATEWAY_RERANK_MODEL is set but no rerank worker is online,
every memory query (per execute-task prompt assembly) would log
"[sf:memory-embeddings] WARN: llm-gateway /rerank unavailable (503)" —
several lines per turn, all redundant. The soft-degrade is expected in
this state.

Now the message logs at most once per 60s. Symmetric with the
runEmbeddingBackfill unavailable-throttle pattern. Both sad-path
loggers stay informative (the operator sees one line and knows the
worker is down) without drowning the journal.
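A once-per-interval throttle of this shape can be sketched as follows; this is an illustrative model, not SF's actual logger code:

```typescript
// Minimal once-per-60s warning throttle. Names are illustrative; the real
// implementation lives in memory-embeddings.ts.
function makeThrottledWarn(log: (msg: string) => void, intervalMs = 60_000) {
  let lastWarnAt = -Infinity;
  return (msg: string, now = Date.now()) => {
    if (now - lastWarnAt >= intervalMs) {
      lastWarnAt = now;
      log(msg); // first warning in any 60s window gets through
    }
    // warnings inside the window are dropped silently
  };
}
```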

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:27:57 +02:00
Mikael Hugo
0ee94f21be chore(sf): drop chatty backfill success log
runEmbeddingBackfill fires on every agent_end (per-turn). When the
gateway is online and a project produces memories, every turn would
write a "[sf:memory-embeddings] WARN: backfill: embedded N memories"
line — successes labeled as warnings, repeating on every cycle. That
both inflates the stderr stream and misleads grep-for-WARN diagnostics.

Successes are routine; the function's return value carries the count
when a caller cares. Failures still log (throttled to 60s) via the
existing path. Net effect: the embedding pipeline runs silently in the
happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:25:35 +02:00
Mikael Hugo
05a326a294 fix(sf): enforceMemoryCap sweeps orphaned embeddings too
Same orphan-cleanup as 1b71ddd17 but for the batch path. enforceMemoryCap
calls supersedeLowestRankedMemories, which marks N lowest memories
superseded in one UPDATE — bypassing the per-memory supersede embedding
cleanup. The result was that capping a project at 50 memories left dead
embedding rows for everything that got demoted.

Now: a single DELETE-IN-SUBQUERY removes embedding rows for any memory
whose superseded_by is no longer NULL — covers both the cap path
and any historical orphans from before the per-row cleanup landed.
Best-effort; cap enforcement is load-bearing, embedding cleanup is not.
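The semantics of the sweep can be modeled in memory like this; the real change is a single SQL DELETE with an IN-subquery, and the types and names here are illustrative:

```typescript
// In-memory model of the orphan sweep: drop the embedding of any memory
// whose superseded_by is set. Illustrative only — the shipped fix is one
// SQL DELETE against memory_embeddings.
interface Memory { id: number; superseded_by: number | null }

function sweepOrphanedEmbeddings(
  memories: Memory[],
  embeddings: Map<number, number[]>,
): void {
  for (const m of memories) {
    if (m.superseded_by !== null) embeddings.delete(m.id);
  }
}
```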

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:23:37 +02:00
Mikael Hugo
1b71ddd178 fix(sf): drop embedding row when memory is superseded
supersedeMemory soft-deleted via superseded_by but left the
memory_embeddings row in place. loadAllEmbeddings already filters
by superseded_by IS NULL, so the orphaned row is harmless functionally
— but it wastes storage, complicates manual SQL audits, and is
inconsistent with updateMemoryContent (which already invalidates the
embedding via 7bec2dc2d).

Best-effort delete; supersede still succeeds even if the embedding
delete raises. Symmetric with the update path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:21:57 +02:00
Mikael Hugo
aa60821ec3 feat(sf): wire rerank pass into getRelevantMemoriesRanked
The gateway rerank surface was shipped dormant in 56ee89a94 — the
function existed but no consumer called it, so setting
SF_LLM_GATEWAY_RERANK_MODEL did nothing functional.

Now: after the cosine-rank top-K is computed, optionally call
rerankCandidates(query, top-K) when a rerank model is configured. Re-
sort by relevance_score; gracefully fall back to cosine order in every
sad path (no model, no worker, network error, malformed response).

Strictly additive precision boost — the cosine-only ranking path is
unchanged when rerank isn't enabled OR returns null.
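A minimal sketch of the wiring, assuming a synchronous reranker for brevity (SF's real call is async and its signatures may differ):

```typescript
// Sketch: re-sort the cosine top-K by rerank score, falling back to cosine
// order in every sad path. Names are illustrative, not SF's signatures.
interface Candidate { id: number; text: string }
type RerankFn = (query: string, docs: string[]) => number[] | null;

function rerankOrKeep(
  query: string,
  topK: Candidate[],
  rerank: RerankFn | null,
): Candidate[] {
  if (!rerank) return topK; // no model configured
  let scores: number[] | null;
  try {
    scores = rerank(query, topK.map((c) => c.text));
  } catch {
    return topK; // network error / malformed response
  }
  if (!scores || scores.length !== topK.length) return topK; // soft-degrade
  const sc = scores;
  return topK
    .map((c, i) => ({ c, s: sc[i] }))
    .sort((a, b) => b.s - a.s)
    .map((x) => x.c);
}
```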

Two new tests: rerank actively reorders the top-K when scores are
returned, and the no-worker-online soft-degrade path preserves cosine
order. 12 tests in the file passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:20:29 +02:00
Mikael Hugo
083a7d5eb6 feat(sf): /sf escalate show also distinguishes auto-accepted
Same UX refinement as e104f17ad applied to /sf escalate show <slice>/<task>.
Auto-mode resolutions now display "Auto-accepted <ts> → choice=..." instead
of the generic "Resolved <ts>". The userRationale prefix "auto-mode:"
already disambiguates the source; surfacing the verb makes the show view
match the list view's status semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:16:41 +02:00
Mikael Hugo
e104f17ad2 feat(sf): /sf escalate list distinguishes auto-accepted from user-resolved
Auto-mode resolutions stamp the artifact with userRationale prefix
"auto-mode: ..." (set by auto-dispatch.ts when it auto-resolves an
escalation). The list view now shows "auto-accepted (accept)" for
those entries vs "resolved (option-id)" for user-resolved ones, so an
operator scanning `/sf escalate list --all` can tell at a glance which
decisions were autonomous and which had explicit human input.

The artifact JSON is unchanged — this is purely a list-formatter
refinement that surfaces information already recorded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:15:20 +02:00
Mikael Hugo
4fb3476912 docs(sf): final ADR-011 leak — /sf escalate help text
Last bare "ADR-011 P2" reference was in the user-facing /sf escalate
help description in commands/catalog.ts. The parallel session's
c481ede33 touched this file (added /sf reload) but left this line
untouched — fixing it now closes the disambiguation sweep across the
entire codebase outside test files.

Comment / string-literal only diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:13:11 +02:00
Mikael Hugo
c481ede338 fix(sf): supervise dev reload path 2026-05-02 23:11:20 +02:00
Mikael Hugo
ef82fbf2c6 docs(sf): finish ADR-011 disambiguation across remaining .ts files
Final pass over the comment-only ambiguity. Every internal "ADR-011"
reference outside test files now reads "gsd-2 ADR-011" so the
source-of-truth lookup is unambiguous (SF's local ADR-011 is "Swarm
Chat and Debate Mode", which has nothing to do with progressive
planning or escalation).

Files: workflow-tool-executors.ts, bootstrap/db-tools.ts,
unit-context-manifest.ts, commands-escalate.ts, sf-db.ts (full sweep,
including remaining function docstrings), tools/plan-milestone.ts,
tools/plan-slice.ts.

Comment-only diff. The one bare "(ADR-011 P2)" left in
commands/catalog.ts:62 (the /sf escalate help text) belongs to the
parallel session's WIP edit on that file — leaving it for them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:11:16 +02:00
Mikael Hugo
f5dabf1857 docs(sf): disambiguate ADR-011 in sf-db.ts schema comments too
Same fix as df095b406 / f1fc8cc86, applied to the schema-comment
references in sf-db.ts (column comments + migration comments). Future
maintainers reading SQL definitions like:

  is_sketch INTEGER NOT NULL DEFAULT 0, -- ADR-011: 1 = slice is a sketch

would otherwise look up SF's local ADR-011 ("Swarm Chat") and find
nothing about sketches. Now reads "gsd-2 ADR-011" so the source-of-
truth is unambiguous.

Comment-only diff. The 5 remaining "(gsd-2)" parenthetical references
already disambiguate clearly enough; left intact to avoid churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:09:34 +02:00
Mikael Hugo
f1fc8cc86b docs(sf): disambiguate ADR-011 in PREFERENCES.md template too
Same fix as df095b406 but for the user-facing PREFERENCES.md template
that ships in /sf init projects. Reading "ADR-011 P2: mid-execution
escalation" without the gsd-2 prefix sends operators to SF's local
ADR-011 ("Swarm Chat and Debate Mode") which has nothing to do with
escalation.

Markdown-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:07:13 +02:00
Mikael Hugo
df095b406a docs(sf): disambiguate "ADR-011" — comments now say "gsd-2 ADR-011"
A future maintainer reading "ADR-011 Phase 2" in escalation.ts would
look up SF's local docs/dev/ADR-011 and find "Swarm Chat and Debate
Mode" — totally unrelated. The escalation + progressive-planning work
ports gsd-2's ADR-011 (Progressive Planning + Escalation), which
happens to share the number with our local ADR-011.

Prefixed every internal comment that referenced the gsd-2 ADR with
"gsd-2 ADR-011" so the source-of-truth lookup is unambiguous. Comment-
only diff — no compilation, runtime, or test surface affected.

Files: types.ts, auto-prompts.ts, auto-dispatch.ts, escalation.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:06:34 +02:00
Mikael Hugo
b6bdbe586a docs(sf): align refine-slice "Autonomous execution" footer with siblings
The autonomous-mode footer in refine-slice.md was the short version
("Document assumptions in the plan") while plan-slice / execute-task /
complete-slice all carry the full explanation: agents are in auto-mode,
no human is available, document assumptions in the artifact, note
human-input-required decisions in the relevant artifact and proceed
with the best available option.

Refine-slice gets sketches refined into full plans — same autonomy
contract as plan-slice. Aligning the language so an agent reading any
of these prompts gets the same self-help instructions about
ask_user_questions / secure_env_collect.

Markdown-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:01:44 +02:00
Mikael Hugo
16cf479781 docs(sf): surface SF_LLM_GATEWAY_* env vars in PREFERENCES template
These are runtime-only settings (not YAML keys), and the previous template
mentioned only the YAML phase toggles. Operators discovering the
embedding/rerank surface had to read source. Added a clear table at the
bottom of PREFERENCES.md so the env-var contract is documented next to
the rest of the skill prefs.

Documents: SF_LLM_GATEWAY_KEY, SF_LLM_GATEWAY_URL,
SF_LLM_GATEWAY_EMBED_MODEL, SF_LLM_GATEWAY_RERANK_MODEL — including the
silent-fallback semantics and the agent_end backfill cadence.

Markdown-only; no recompile needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:00:15 +02:00
Mikael Hugo
8299c7ac2b fix(sf): clear last 2 stale failures from gsd-2 compat sweep
auto-session-encapsulation invariant: the parallel session refactored
auto.ts to use the getAutoSession() factory; the test still expected
`new AutoSession()` literally. Updated the regex + the allowedPatterns
list to accept both shapes — the invariant is "exactly one module-level
binding for the AutoSession instance", not which constructor expression
yields it.

silent-catch-diagnostics #3348: auto-supervisor.ts:53 swallowed signal-
handler exceptions silently. Added logWarning("session", ...) — the
intent stays the same (signal handler must not throw), but cleanup-path
errors are now visible in the journal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:51:42 +02:00
Mikael Hugo
3e8c5b192f fix(sf): add sf-dev batch server command 2026-05-02 22:44:14 +02:00
Mikael Hugo
c9609459e4 fix(daemon): --verbose actually lowers log level + reports effective level
--verbose was wired only to the stderr-mirror path. Debug entries got
filtered by Logger.level (default 'info' from config) before reaching
the mirror — so passing --verbose produced almost no extra output, which
made it look broken on a fresh start.

Now --verbose lowers the level to 'debug' AND mirrors. Logger exposes
`effectiveLevel` so the "daemon started" banner reports what the logger
is actually using, not what was in the config file.
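The level resolution can be modeled minimally; names are illustrative, not the daemon's actual Logger API:

```typescript
// Minimal model of the fix: --verbose forces the level to 'debug' regardless
// of the configured level, and effectiveLevel reports what is actually used.
type Level = "debug" | "info" | "warn" | "error";

function effectiveLevel(configLevel: Level, verbose: boolean): Level {
  return verbose ? "debug" : configLevel;
}
```

The banner reporting `effectiveLevel` instead of the config value is what keeps the startup message honest when flags override the file.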

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:41:48 +02:00
Mikael Hugo
7bec2dc2d0 fix(sf): invalidate stale embedding when memory content is updated
updateMemoryContent rewrote the row but left the existing memory_embeddings
vector in place — that vector was computed against the old content, so the
next cosine query would score the memory by what it used to say, not what
it says now.

Now drop the embedding row on update; the next runEmbeddingBackfill
(agent_end hook) re-embeds. Best-effort: a missing embedding is the
silent-fallback case the ranker already handles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:38:24 +02:00
Mikael Hugo
a3c000de26 fix(sf): close 6 stale test failures from gsd-2 compat sweep
Schema-version assertions hadn't been bumped past 21 in three places
(complete-task/complete-slice/md-importer); manifest coverage tests caught
the project-scoped unit types added for the deep planning gate (ADR-011)
that weren't yet registered in either KNOWN_UNIT_TYPES table; workflow-
templates registry test rejected docs-sync.yaml because the assertion was
.md-only.

- preferences-types.ts: KNOWN_UNIT_TYPES gains refine-slice, discuss-project,
  discuss-requirements, research-project, workflow-preferences.
- unit-context-manifest.ts: same five types added to its local
  KNOWN_UNIT_TYPES + UNIT_MANIFESTS (TOOLS_PLANNING, scoped/full knowledge,
  COMMON_BUDGET_MEDIUM/LARGE).
- complete-task / complete-slice / md-importer test: schema_version
  expectation 21 → 25.
- workflow-templates test: file extension can be .md OR .yaml (docs-sync is
  intentionally yaml-step iteration).

6 test files / 81 tests now green that were red.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:35:26 +02:00
Mikael Hugo
3f213f3131 fix(sf): run sf-server from source in dev 2026-05-02 22:34:42 +02:00
Mikael Hugo
974d8e4b6d fix(sf): expose daemon as sf-server 2026-05-02 22:25:24 +02:00
Mikael Hugo
e5787794f3 feat(sf): /sf memory search — embedding-ranked memory query
New subcommand: /sf memory search "<query>". Routes through
getRelevantMemoriesRanked, so when SF_LLM_GATEWAY_KEY is set the gateway
embeds the query and ranks memories by cosine + static blend; without
the key, gracefully degrades to static ranking. Header text indicates
which path was taken so users know whether embeddings are live.

This makes the embedding pipeline operator-discoverable — previously the
only consumer was the silent execute-task injection path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:22:33 +02:00
Mikael Hugo
eb5f7ef7b6 feat(sf): query-aware memory ranking — embeddings now actually matter
Previous commit populated memory_embeddings rows but no consumer read
them — the read path (getActiveMemoriesRanked) used pure static score
(confidence × hit_count). Embeddings were silent.

This wires the read side:
- rankMemoriesByEmbedding (pure, in memory-embeddings.ts) blends static
  score with cosine similarity: combined = static * (1 + α * cosine).
  Defaults α=0.6 — a perfect-static + zero-similarity hit ties roughly
  with a low-static + perfect-similarity hit, so semantically relevant
  cold memories can surface above stale-but-popular ones.
- embedQueryViaGateway + loadEmbeddingMap — supporting helpers.
- getRelevantMemoriesRanked (memory-store.ts) — async query-aware ranker.
  Oversamples the static pool 5×, embeds the query, blends, returns top-K.
  Falls back cleanly to static ranking when:
    - query empty
    - no SF_LLM_GATEWAY_KEY (gateway not configured)
    - gateway request fails (500/network)
    - no embeddings exist yet (fresh DB / worker offline)
- auto-prompts.ts: execute-task injection now uses sliceTitle + taskTitle
  as the query so memories relevant to the current work surface first.
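The blend formula above, written out as a pure function (α defaulting to 0.6 per this commit; the function name is illustrative):

```typescript
// combined = static * (1 + α * cosine), α defaults to 0.6.
function blendScore(staticScore: number, cosine: number, alpha = 0.6): number {
  return staticScore * (1 + alpha * cosine);
}
```

With α=0.6, a perfect-static hit with zero similarity scores 1.0 × 1 = 1.0, while a 0.625-static hit with perfect similarity scores 0.625 × 1.6 = 1.0: the rough tie the design aims for.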

10 new tests lock the contract — pure ranker math, fallback chain, and
the gateway-mocked promotion case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:18:45 +02:00
Mikael Hugo
56ee89a946 feat(sf): live embeddings via inference-fabric llm-gateway + auto-backfill
Adds an opt-in embedding path against `https://llm-gateway.centralcloud.com/v1`
using qwen/qwen3-embedding-4b. Activated by exporting SF_LLM_GATEWAY_KEY;
URL/model overridable via SF_LLM_GATEWAY_URL and SF_LLM_GATEWAY_EMBED_MODEL.
Rerank surface present (SF_LLM_GATEWAY_RERANK_MODEL) but degrades to null
when no rerank worker is online — current gateway has none, so it stays
dormant until one comes up.

- memory-embeddings-llm-gateway.ts: createGatewayEmbedFn + rerankCandidates
  speaking the OpenAI-shaped /v1/embeddings and /v1/rerank protocols.
- memory-embeddings.ts: listUnembeddedMemoryIds + runEmbeddingBackfill —
  best-effort sweep, in-flight-guarded, bounded, throttled "unavailable"
  log. Wired into agent_end so every turn opportunistically embeds new
  memories when the gateway is reachable.
- sf-db.ts: pre-existing bug fix — memory_embeddings, memory_relations,
  and memory_sources were referenced everywhere but never CREATE-d in the
  schema. Adding them as IF NOT EXISTS with proper FK + PK so fresh DBs
  actually work.
- 16 new tests covering env config, embed fn shape, rerank degradation,
  backfill happy/sad/bounded paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:13:23 +02:00
Mikael Hugo
dd126ddc8b fix(sf): recover model routes and self-feedback 2026-05-02 22:07:10 +02:00
Mikael Hugo
c308a492d7 chore(sf): differentiate auto-accepted vs user-resolved escalations in audit
resolveEscalation gains an optional `source: "user" | "auto-mode"`
parameter (default "user"). Auto-dispatch passes "auto-mode" when it
auto-accepts. The UOK audit event type now flips between
"escalation-user-responded" and "escalation-auto-accepted", and the
payload includes a typed `resolvedBy` field.

Why: a journal grep for user actions shouldn't return auto-mode events.
Audit/observability tools can now filter cleanly without string-matching
the rationale prefix.
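The event-type flip can be sketched as follows; the two event strings and the `resolvedBy` field are from the commit, while the function name is hypothetical:

```typescript
// Sketch of the audit-event selection: default source is "user", and the
// payload carries a typed resolvedBy so tools filter without string-matching.
type ResolvedBy = "user" | "auto-mode";

function escalationAuditEvent(source: ResolvedBy = "user") {
  return {
    type: source === "auto-mode"
      ? "escalation-auto-accepted"
      : "escalation-user-responded",
    resolvedBy: source,
  };
}
```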

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:59:38 +02:00
Mikael Hugo
00c13bc5a1 feat(sf): persist escalation resolutions as durable memories
When an escalation is resolved (auto-mode accept or user override), write
the choice + rationale into the memories table with category="architecture".
The "[escalation:<task>] <question>. Chose: <option>. Rationale: ..."
prefix mirrors the decisions->memories backfill format so search and
de-duplication work the same way.

Why: getActiveMemoriesRanked auto-injects top memories into every
execute-task prompt, so a resolved escalation now travels forward as
implicit context across the whole project — not just the immediate
carry-forward into the next task. The artifact JSON stays as the audit
trail; the memory is the discoverable, semantically-ranked surface.

Best-effort write — never blocks resolution.
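A hypothetical rendering of the quoted memory format; the real formatting helper's name and signature may differ:

```typescript
// Illustrative rendering of the "[escalation:<task>] <question>. Chose:
// <option>. Rationale: ..." format described above.
function formatEscalationMemory(
  taskId: string,
  question: string,
  option: string,
  rationale: string,
): string {
  return `[escalation:${taskId}] ${question}. Chose: ${option}. Rationale: ${rationale}`;
}
```

Mirroring the decisions-to-memories backfill prefix is what lets search and de-duplication treat both entry kinds uniformly.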

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:53:56 +02:00
Mikael Hugo
7c6140517e fix(sf): surface escalation write failures back to the agent
When sf_task_complete's escalation payload was rejected (validation error)
or silently dropped (feature flag off), the agent saw a clean "Completed
task" response and assumed the issue was raised — but no carry-forward
override was created, so the next executor saw nothing.

Now the response text explicitly says:
- "WARNING: escalation payload was REJECTED (<error>); the next executor
  will NOT see your decision" — when buildEscalationArtifact throws
- "note: escalation payload was DROPPED because phases.mid_execution_escalation
  is disabled" — when feature flag is off

Task completion is still never blocked by escalation issues — additive,
auditable, agent-actionable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:48:35 +02:00
Mikael Hugo
b79ebbf10a fix(sf): generalize M008 leak in systematic-debugging skill
The global skill hardcoded `.sf/milestones/M008/bugs/bug-registry.json`
and `M008-specific:` rules — when M008 closes the skill goes stale and
misleads agents on every other milestone.

Reframed as "Milestone Bug Registry Guidance": the rules apply to any
milestone that ships a `bug-registry.json` + `triage-protocol.md` pair,
with M008 cited as the canonical example for the registry test. When no
registry exists, the section is skipped — agents follow the normal
evidence/repro/fix flow.

triage-protocol-registry test (31 tests) still passes — keeps the
literal `bug-registry.json` reference and HIGH/MEDIUM/LOW + cluster +
update-after-fix assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:44:08 +02:00
Mikael Hugo
08859624f8 feat(sf): teach executor about the escalation field on sf_task_complete
The escalation feature was invisible to agents — the prompt didn't say it
existed, so agents made silent assumptions instead of surfacing genuine
tradeoffs. Now, when phases.mid_execution_escalation is on, execute-task
includes a guidance block showing the escalation payload shape and noting
auto-mode auto-accepts the recommendation by default. When the feature is
off the field is silently dropped, so the guidance is omitted entirely to
avoid misleading the agent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:41:38 +02:00
Mikael Hugo
3895ae2cd3 feat(sf): auto-mode is autonomous — escalations auto-accept by default
Auto is autonomous, so the escalating-task dispatch rule shouldn't halt
the loop. Default: accept the agent's recommendation, record the choice
with `auto-mode: ...` rationale, and let the next dispatch cycle pick up
the carry-forward override. Users can review or override via
`/sf escalate list --all` later.

Set `phases.escalation_auto_accept: false` to keep gsd-2's pause-and-ask
behavior (loop halts until the user runs `/sf escalate resolve`).

- types.ts: add escalation_auto_accept (default true)
- preferences-validation.ts: allowlist + warn on unknown phase keys
- auto-dispatch.ts: rename rule to "auto-accept-or-pause"; on auto-accept
  resolve via resolveEscalation("accept", ...) and return action:"skip"
  so the next cycle re-reads state cleanly
- PREFERENCES.md: surface the toggle with the autonomy rationale
- tests/escalation-auto-accept.test.ts: 4 cases — default accept, explicit
  true, explicit false (preserves pause), non-escalating phase no-op

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:36:15 +02:00
Mikael Hugo
0f0aee5bf0 feat(sf): port 3 gsd-2 DB helpers + improve /sf escalate list
Three small DB helpers from gsd-2 that SF was missing, plus a UX
improvement to /sf escalate list that uses one of them.

PDD spec:

setSliceSketchFlag(milestoneId, sliceId, isSketch) — generalized
  sketch-flag setter. Replaces my narrower clearSliceSketch (which
  remains as a thin wrapper for callers that only zero). Use this
  when a re-plan flow wants to revert a slice back to sketch state.

autoHealSketchFlags(milestoneId, hasPlanFile) — safety net for
  progressive planning. Predicate-based: the caller passes a function
  that resolves whether a PLAN file exists for a slice, and the helper
  flips is_sketch=0 for any slice that has both is_sketch=1 AND a
  plan file. Catches DB-FS drift after crashes/manual edits.

listEscalationArtifacts(milestoneId, includeResolved=false) —
  cross-slice DB-side filter for /sf escalate list. Replaces my
  hand-rolled inner-loop over getMilestoneSlices() + getSliceTasks()
  + filter — single SQL query, sorted by sequence, faster.

UX improvement to commands-escalate.ts:
  - /sf escalate list: now uses listEscalationArtifacts; shows
    PENDING / awaiting-review / resolved status badges per entry.
  - /sf escalate list --all: includes resolved entries (audit trail).
  - Better hint message when none active: 'Use --all to include
    resolved'.

Verified:
  - typecheck clean (one parallel-session-introduced error in
    self-feedback-drain.ts is unrelated — they import a missing
    utils/error.ts; will land when their commit does).
  - escalation-feature.test.ts (21 tests) + sf-db.test.ts (16
    tests) still pass — no regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:22:02 +02:00
Mikael Hugo
82633b6f5e feat(sf): sf-audit-traces workflow for slow self-improvement loop
A standalone agent prompt that reads SF's observability sources
(self-feedback / journal / activity / judgments / forensics) and
files AT MOST 3 recurring-pattern findings via sf_self_report so
they enter the existing triage flow.

PDD spec:

Purpose: continuous self-improvement loop. SF already has the data
  sources (self-feedback.jsonl, journal/, activity/, judgments/) and
  the consumer pattern (triage-self-feedback → requirement-promoter).
  What was missing: a standalone prompt that pulls those sources
  together for a scheduled run.
Consumer: agents invoked via '/schedule every morning sf-audit-traces'
  (cloud) or '/sf workflow run sf-audit-traces' (manual).
Contract:
  1. Snapshot the trace volumes (file counts + line counts) into
     evidence so reports are concrete, not prose.
  2. Bar = 3+ occurrences. Single events go to operator eyeballs,
     not permanent self-feedback entries.
  3. Hard cap of 3 entries per run. The whole point is slow
     iteration — the triage queue is human-paced, not a firehose.
  4. NEVER auto-apply. Even if the fix looks one-line obvious, file
     and stop. The triage flow decides what becomes work.
  5. Zero findings is a successful run when the system is healthy.
Failure boundary: missing source files → skip silently. Read errors
  → handle gracefully. Never block on absence.
Evidence (verified during scan before writing):
  - 181 self-feedback entries (55 open, 126 resolved)
  - Top open kinds: runaway-guard-hard-pause (4), git-stage-failure
    (2), context-injection-gap (2), orphan-prompt (2)
  - Journal: 6-233 events per active day
  - Activity logs: per-unit JSONL transcripts present
  - All sources accessible via plain file reads — no special tools.
Non-goals:
  - ML training on traces
  - Cross-project trace aggregation
  - Auto-applying fixes (triage flow already does that)
  - Fast iteration (deliberately slow — 3/run cap means at most 21
    new triage items per week even with daily runs)
Invariants:
  - Safety: agent never edits code/prompts/templates/docs.
  - Liveness: zero findings is a valid output. The agent doesn't
    fabricate patterns to justify a run.

Discovery verified: 28 total workflow templates after this commit
(was 27); plugins.get('sf-audit-traces') returns the plugin from
the bundled source.

Pairs with: triage-self-feedback (reads what this workflow files),
requirement-promoter (auto-promotes recurring kinds to requirements),
self-feedback-drain (session-start drain into repair turns). The
audit is the IN end of that pipeline; the rest of SF was already
the OUT end.
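Contract items 2 and 3 reduce to one small filter. A minimal sketch, assuming trace events are already parsed into kind-tagged records — names and shapes here are illustrative, not SF's actual API:

```typescript
interface TraceEvent { kind: string }
interface Finding { kind: string; occurrences: number }

// Group events by kind, keep only kinds seen bar+ times, cap at `cap`.
function selectFindings(events: TraceEvent[], bar = 3, cap = 3): Finding[] {
  const counts = new Map<string, number>();
  for (const e of events) counts.set(e.kind, (counts.get(e.kind) ?? 0) + 1);
  return [...counts.entries()]
    .filter(([, n]) => n >= bar)   // single events stay with operator eyeballs
    .sort((a, b) => b[1] - a[1])   // most frequent patterns first
    .slice(0, cap)                 // hard cap: triage stays human-paced
    .map(([kind, occurrences]) => ({ kind, occurrences }));
}
```

An empty result is a valid result, matching the liveness invariant: zero findings on a healthy system.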

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:15:13 +02:00
Mikael Hugo
e381e3c8ad fix(sf): bump SCHEMA_VERSION to 25 + update sf-db.test.ts assertion
The migrate gate `if (currentVersion >= SCHEMA_VERSION) return;` was
short-circuiting at 23, leaving the v24 (escalation_awaiting_review)
and v25 (escalation_override_applied) migrations unreached on fresh
databases. The test caught it: 'fresh DB schema init (memory)' expected
MAX(version)=23, then 25 after my test bump; both runs kept returning
23 because the migrate function bailed before the new ensureColumn
calls.

Three-line fix:
- sf-db.ts:133  SCHEMA_VERSION 23 → 25
- sf-db.test.ts:88 + :222  expected version 23 → 25

Now fresh DBs run all migrations through v25 and end at the latest
version. Existing databases with version 24 still get v25 applied
because currentVersion < SCHEMA_VERSION (24 < 25).
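The gate-and-step pattern, reduced to a sketch (the real sf-db.ts steps are ensureColumn calls against SQLite; these shapes are illustrative):

```typescript
// Version-gated migration sketch. The bug above: SCHEMA_VERSION was
// still 23, so the gate returned before the v24/v25 steps ran.
const SCHEMA_VERSION = 25;

type Db = { version: number; columns: Set<string> };

const migrations: Array<{ to: number; apply: (db: Db) => void }> = [
  { to: 24, apply: (db) => db.columns.add("escalation_awaiting_review") },
  { to: 25, apply: (db) => db.columns.add("escalation_override_applied") },
];

function migrate(db: Db): void {
  if (db.version >= SCHEMA_VERSION) return; // short-circuit gate
  for (const m of migrations) {
    if (db.version < m.to) {
      m.apply(db); // an ensureColumn call in the real code
      db.version = m.to;
    }
  }
}
```

A DB at version 24 skips the v24 step and still picks up v25, which is the existing-database case described above.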

37/37 tests pass (sf-db + escalation-feature suites). No regression
in the broader 127-test smoke suite that ran before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:05:06 +02:00
Mikael Hugo
aa67c1453c test(sf): full lifecycle coverage for ADR-011 P2 escalation feature
21 vitest tests covering the entire escalation chain shipped this
session. Each contract claim from prior PDD specs gets at least one
verifying test:

buildEscalationArtifact validation (4)
  - option count outside [2,4] → throws
  - duplicate option ids → throws
  - recommendation referencing unknown id → throws
  - happy path → version=1, taskId set, ISO createdAt

writeEscalationArtifact + DB flag flips (3)
  - continueWithDefault=false → escalation_pending=1
  - continueWithDefault=true → escalation_awaiting_review=1
  - two writes flip the pair atomically (mutually exclusive)

detectPendingEscalation (4)
  - empty slice → null
  - paused task → returns task id
  - awaiting_review tasks DO NOT pause
  - resolved (respondedAt set) tasks DO NOT pause

resolveEscalation (5)
  - 'accept' selects recommendation
  - explicit option id resolves with userRationale persisted
  - invalid choice → status=invalid-choice with valid list
  - re-resolve → already-resolved
  - unknown task → not-found

claimOverrideForInjection carry-forward (5)
  - no escalation → null
  - pending (unresolved) → null
  - resolved → returns block + sourceTaskId + sets DB flag=1
  - second claim → null (race-safe idempotent)
  - clearTaskEscalationFlags preserves artifact path (audit trail)

Provides regression protection for the full producer→consumer→
resolution→carry-forward path. All 21 pass against current head.
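The four validation claims can be sketched as a single guard function — option shape and error messages here are assumptions, not SF's actual types:

```typescript
interface EscalationOption { id: string; label: string; tradeoff?: string }
interface EscalationInput {
  question: string;
  options: EscalationOption[];
  recommendation: string; // must reference one of the option ids
}

// Mirrors the buildEscalationArtifact validation contract tested above.
function validateEscalation(input: EscalationInput): void {
  const n = input.options.length;
  if (n < 2 || n > 4) throw new Error(`option count ${n} outside [2,4]`);
  const ids = new Set(input.options.map((o) => o.id));
  if (ids.size !== n) throw new Error("duplicate option ids");
  if (!ids.has(input.recommendation))
    throw new Error(`recommendation references unknown id '${input.recommendation}'`);
}
```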

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:56:12 +02:00
Mikael Hugo
125496ce36 docs(sf): surface ADR-011 toggles in PREFERENCES.md template
Three new options got wired this session but the bundled template
didn't mention them, so users had no discoverable way to know they
existed. Adds them as commented hint fields:

- phases.progressive_planning — sketch→refine slice planning
- phases.mid_execution_escalation — task agents can pause for user
  decision via sf_task_complete escalation payload + /sf escalate
- planning_depth (top-level) — 'deep' enables project-level
  discussion gate before any milestone work

All three default off (commented out / unset) so existing users see
zero behavior change from this template update; enabling any of them
is a single uncomment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:53:40 +02:00
Mikael Hugo
4b6eb86b84 feat(sf): carry-forward injection — final piece of escalation feature (PDD)
Replaces the claimOverrideForInjection stub with a real race-safe
implementation. With this commit, the full escalation loop is wired:
agent escalates → user pauses → user resolves → next executor in the
slice sees the user's choice as a hard constraint in its prompt.

The buildExecuteTaskPrompt call site at auto-prompts.ts:2452-2469
already invoked claimOverrideForInjection (gated on
phases.mid_execution_escalation). Before this commit it was a no-op
because the function returned null unconditionally. Now it actually
delivers the override block.

PDD spec for this change:

Purpose: complete the loop. Without carry-forward, the loop 'continues'
  but the next executor re-encounters the same ambiguity that
  triggered the escalation.
Consumer: buildExecuteTaskPrompt in auto-prompts.ts (already wired).
Contract:
  1. No resolved-but-unapplied override in this slice → returns null.
     Existing behavior preserved when no escalation pending. Verified.
  2. Pending escalation (no respondedAt) → returns null. Caller's
     pause-detection layer handles those. Verified.
  3. Resolved escalation (respondedAt + userChoice set) →
     atomically marks escalation_override_applied=1 (race-safe via
     UPDATE … WHERE applied=0) and returns formatted markdown block
     with sourceTaskId. Verified.
  4. Second claim on the same override → null (race loser or
     already-applied). Verified.
  5. Missing/malformed artifact → logWarning + null without claiming
     (so the row isn't silently swallowed by an applied=1 flip).
Failure boundary:
  - claimEscalationOverride is the atomic boundary. Either you claim
    it and it's yours forever, or someone else did and you skip.
  - Validation BEFORE claim — bad artifact never marks the row applied.
  - DB unavailable in claimEscalationOverride → returns false → caller
    treats as race-loser → null. Safe.
Evidence:
  - Smoke test exercises 4 contract conditions:
    no-override → null
    pending-only → null
    resolved-then-claim → returns block + sets DB flag
    second-claim → null (idempotent)
  - Typecheck clean.
  - All 62 existing preferences tests still pass (no regression in
    the related plumbing).
Non-goals:
  - reject-blocker carry-forward (gsd-2 has it; needs blocker_source
    DB column SF doesn't have).
  - Cross-slice override carry-forward (current scope: per-slice).
  - Override-applied audit event (gsd-2 emits one; can add later).
Invariants:
  - Safety: applied flag is set BEFORE the prompt is built — so a
    crash mid-build never re-injects on retry.
  - Liveness: any task in the slice with a resolved override gets
    surfaced in sequence order (lowest sequence first via
    findUnappliedEscalationOverride's ORDER BY).
  - Race-safety: SQL UPDATE … WHERE applied=0 returns changes>0 only
    for the winner. Tested with sequential claims; both winners and
    losers behave correctly.
DB schema: tasks.escalation_override_applied (INTEGER NOT NULL
DEFAULT 0), migration v25.
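The atomic boundary in contracts 3-4 is a compare-and-set on the applied flag. A sketch against an in-memory row with simplified shapes (in SF it is the SQL UPDATE … WHERE applied=0, with changes>0 deciding the winner):

```typescript
interface TaskRow { id: string; overrideApplied: 0 | 1 }
interface Artifact { respondedAt?: string; userChoice?: string }

// Decision order sketched from the spec: validate the artifact first,
// then claim; the claim flips the flag exactly once.
function claimOverrideForInjection(row: TaskRow, artifact: Artifact): string | null {
  if (!artifact.respondedAt || !artifact.userChoice) return null; // pending → pause layer handles it
  if (row.overrideApplied !== 0) return null; // race loser / already applied
  row.overrideApplied = 1; // atomic in the real code: UPDATE … WHERE applied=0
  return `Operator decision (task ${row.id}): apply option '${artifact.userChoice}' as a hard constraint.`;
}
```

Because validation precedes the flag flip, a malformed artifact never marks the row applied — the same ordering contract 5 pins down.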

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:51:56 +02:00
Mikael Hugo
2c044f340f feat(sf): auto-fill empty model fallbacks from benchmark picker (PDD)
Closes the gap that left the user's session paused on a quota error
with no fallback to switch to. Before this commit:
  - User pins models.execution: { model: gemini-3-flash-preview }
  - No fallbacks array → resolveModelWithFallbacksForUnit returns
    { primary, fallbacks: [] }
  - agent-end-recovery.ts line 348 checks fallbacks.length > 0 → false
  - Loop pauses on the first rate-limit, even though the user has
    other API-keyed providers available.

After: an empty/missing fallbacks array auto-fills from
resolveAutoBenchmarkPickForUnit (which picks API-keyed candidates
ranked by benchmark scores), excluding the user's pinned primary so
we never get a no-op switch to the same model.

PDD spec:

Purpose: out-of-the-box auto-switch to fallback models when a user
  pins only a primary. Matches user expectation that 'the system
  selects models automatically' when keys are available.
Consumer: agent-end-recovery.ts model-fallback flow on rate-limit.
Contract:
  1. models.<unit>: '<id>' (string, no fallbacks) → primary plus
     auto-filled fallbacks. Unchanged primary, fallbacks excluding
     primary.
  2. models.<unit>: { model: '<id>', fallbacks: ['a', 'b'] } (explicit
     non-empty) → unchanged. User intent respected.
  3. models.<unit>: { model: '<id>' } (object, no fallbacks) → auto-
     fill from benchmark picker.
  4. models.<unit>: { model: '<id>', fallbacks: [] } (explicit empty)
     → auto-fill (treat empty same as missing).
  5. No models config at all → unchanged behavior — full auto-pick.
Failure boundary:
  - resolveAutoBenchmarkPickForUnit returns undefined when no
    API-keyed providers exist → fallbacks stays empty (no candidates
    to switch to anyway).
  - autoBenchmark option still honored — set to false to opt out.
Evidence:
  - Smoke test: pinned 'gemini-3-flash-preview' with empty fallbacks +
    OPENROUTER_API_KEY + GEMINI_API_KEY in env → returns 4 fallbacks
    starting with minimax/MiniMax-M2.7. Primary not in fallbacks.
  - Existing 62 preferences tests + 5 rate-limit-model-fallback tests
    still pass — no regression.
Non-goals:
  - Cross-phase inheritance (planning falls back to execution config).
  - Persisting auto-filled fallbacks to PREFERENCES.md.
  - Mid-tool-call rate-limit recovery (different code path through
    pi-coding-agent's RetryHandler).
Invariants:
  - Safety: explicit non-empty user fallbacks NEVER overwritten —
    line userFallbacks.length > 0 short-circuits before auto-fill.
  - Liveness: empty arrays trigger auto-fill, so callers get a chain
    if any keys are configured.
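The five contract cases collapse to one decision. A sketch with hypothetical names — benchmarkPick stands in for resolveAutoBenchmarkPickForUnit's output:

```typescript
type UnitModelConfig = string | { model: string; fallbacks?: string[] };

function resolveFallbacks(
  config: UnitModelConfig | undefined,
  benchmarkPick: string[],
): { primary?: string; fallbacks: string[] } {
  if (config === undefined) return { fallbacks: benchmarkPick }; // case 5: full auto-pick
  const primary = typeof config === "string" ? config : config.model;
  const userFallbacks = typeof config === "string" ? [] : config.fallbacks ?? [];
  if (userFallbacks.length > 0) return { primary, fallbacks: userFallbacks }; // case 2: never overwritten
  // cases 1/3/4: missing or explicitly empty → auto-fill, excluding the pinned primary
  return { primary, fallbacks: benchmarkPick.filter((m) => m !== primary) };
}
```

The primary-exclusion filter is what prevents the no-op switch to the same model on a rate-limit hit.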

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:43:28 +02:00
Mikael Hugo
e4a86ddf6f fix(sf): classify 'exhausted your capacity / quota will reset after Ns' as rate-limit
Real failure caught from a user session: provider returned
'Error: You have exhausted your capacity on this model. Your quota
will reset after 51s.' SF's classifier didn't match it (no 'rate
limit', no '429', no 'limit resets'), so it fell through to unknown
→ no auto-resume → loop paused indefinitely until manual /sf
autonomous restart.

PDD spec:

Purpose: every legitimately transient quota error should auto-resume
  after the named cooldown, not pause indefinitely.
Consumer: classifyError() callers, ultimately the auto-loop.
Contract:
  - 'exhausted your|the (quota|capacity|usage)' → rate-limit
  - 'quota will reset' → rate-limit (paired with the above)
  - 'will reset after Ns' / 'will reset in Ns' → retryAfterMs = N*1000
Failure boundary: parse failure → 60s default (preserved).
Evidence: smoke test with 6 inputs:
   'exhausted your capacity ... will reset after 51s' → rate-limit/51000
   'rate limit exceeded' → rate-limit/60000 (unchanged)
   'Internal server error' → server/30000 (unchanged)
   '429 too many requests' → rate-limit/60000 (unchanged)
   'Invalid API key' → permanent (unchanged — still manual)
   'exhausted the usage. Will reset in 30s.' → rate-limit/30000
Non-goals: model-fallback-on-rate-limit (separate change — the
  provider-error-pause module currently waits and retries the same
  model; switching to the configured fallback model after the first
  rate-limit hit is a richer policy change).
Invariants:
  - Permanent classification still wins when no rate-limit pattern is
    present (auth/billing/invalid-key untouched).
  - Default 60s delay preserved when reset-time can't be parsed.
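The contract's patterns, reconstructed as regexes — a sketch of the matching logic, not SF's exact classifier source (the real classifyError also handles 429/server/permanent branches):

```typescript
interface Classification { kind: "rate-limit" | "unknown"; retryAfterMs?: number }

function classifyQuotaError(message: string): Classification {
  const exhausted = /exhausted (?:your|the) (?:quota|capacity|usage)/i;
  const quotaReset = /quota will reset/i;
  if (!exhausted.test(message) && !quotaReset.test(message)) return { kind: "unknown" };
  // 'will reset after Ns' / 'will reset in Ns' → retryAfterMs = N*1000
  const m = /will reset (?:after|in) (\d+)\s*s/i.exec(message);
  return { kind: "rate-limit", retryAfterMs: m ? Number(m[1]) * 1000 : 60_000 };
}
```

Parse failure on the reset time falls back to the preserved 60s default, per the failure boundary above.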

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:35:55 +02:00
Mikael Hugo
f757a18417 feat(sf): /sf escalate user command + resolveEscalation (PDD)
Closes the user-facing loop for ADR-011 P2. The full escalation
end-to-end now works: agent files → loop pauses → user resolves
via /sf escalate → loop continues.

PDD spec for this change:

Purpose: let the user resolve a paused task escalation. Without this,
  escalation_pending=1 has no exit ramp other than manual SQL.
Consumer: users at the prompt — '/sf escalate list', '/sf escalate
  show <slice>/<task>', '/sf escalate resolve <slice>/<task> <choice>
  [-- <rationale>]'.
Contract:
  1. /sf escalate list → enumerate pending escalations in the active
     milestone, showing slice/task, question, options, recommendation.
  2. /sf escalate show <slice>/<task> → print the artifact's question
     + options with tradeoffs + recommendation + resolution status
     (resolved or unresolved).
  3. /sf escalate resolve <slice>/<task> <option-id> [-- <rationale>]
     → resolveEscalation in escalation.ts:
       - 'accept' selects the recommended option
       - any option id from the artifact is also valid
       - invalid choice → returns 'invalid-choice' with valid list
       - already resolved → 'already-resolved' with prior timestamp
       - not found → 'not-found' with the task path
     On success: artifact gains respondedAt/userChoice/userRationale,
     DB flags cleared, UOK audit event 'escalation-user-responded'
     emitted.
Failure boundary:
  - DB unavailable → 'SF database is not available. Run /sf doctor.'
  - Active milestone missing → 'No active milestone — nothing to list.'
  - Malformed artifact path → readEscalationArtifact returns null →
    handler returns 'not-found'.
  - clearTaskEscalationFlags called inside the resolver — never
    leaves the row in a half-resolved state.
Evidence: smoke test exercises 4 contract conditions end-to-end:
  invalid-choice, accept→resolved (chosen option = recommendation),
  already-resolved on re-run, not-found for unknown task. Typecheck
  clean.
Non-goals:
  - reject-blocker choice (gsd-2 has it; needs a blocker_source DB
    column SF doesn't have)
  - Carry-forward injection (claimEscalationOverride —
    findUnappliedEscalationOverride flow). The override is logged in
    the artifact for the user; agent context injection lands when
    the executor's prompt builder is wired to read it.
  - Cross-milestone listing (current implementation: active milestone
    only — matches /sf escalate list's most useful default behavior).
Invariants:
  - Safety: invalid-choice and not-found return without writing —
    no half-state.
  - Safety: clearTaskEscalationFlags zeros pending+awaiting in one
    UPDATE — reader can never see half-cleared state.
  - Liveness: after resolve, next state derivation cycle sees
    escalation_pending=0 → phase != 'escalating-task' → dispatch
    routes normally.
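The resolve statuses in contract 3 suggest a discriminated result union; a sketch with hypothetical shapes (the real resolveEscalation also writes DB flags and emits the audit event):

```typescript
type ResolveResult =
  | { status: "resolved"; choice: string }
  | { status: "invalid-choice"; valid: string[] }
  | { status: "already-resolved"; respondedAt: string };

interface ArtifactState {
  options: { id: string }[];
  recommendation: string;
  respondedAt?: string;
  userChoice?: string;
}

// 'accept' maps to the recommendation; any option id is valid; a
// second resolve sees the prior timestamp instead of rewriting.
function resolveChoice(artifact: ArtifactState, choice: string): ResolveResult {
  if (artifact.respondedAt)
    return { status: "already-resolved", respondedAt: artifact.respondedAt };
  const selected = choice === "accept" ? artifact.recommendation : choice;
  if (!artifact.options.some((o) => o.id === selected))
    return { status: "invalid-choice", valid: artifact.options.map((o) => o.id) };
  artifact.userChoice = selected;
  artifact.respondedAt = new Date().toISOString();
  return { status: "resolved", choice: selected };
}
```

Note the safety invariant holds here too: the invalid-choice branch returns before any write.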

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:31:45 +02:00
Mikael Hugo
2bf6c51fde feat(sf): expose escalation via sf_task_complete (PDD)
Closes the agent surface for ADR-011 P2. Task agents can now include
an optional 'escalation' payload on sf_task_complete, gated by
phases.mid_execution_escalation. When the preference is on and the
field is present, the executor builds and writes the artifact, which
flips tasks.escalation_pending or escalation_awaiting_review based
on continueWithDefault. The producer chain from 14efcd773 is now
agent-callable.

PDD spec for this change:

Purpose: give task agents a way to file a mid-execution escalation
  through the same tool they already call to record completion. No
  new tool surface — escalation rides as an optional field on
  sf_task_complete (matches gsd-2's design intent).
Consumer: task agents (execute-task) when they hit ambiguity that
  requires user judgment.
Contract:
  1. phases.mid_execution_escalation !== true → escalation field
     silently ignored, current behavior preserved. Verified.
  2. preference on + escalation field → buildEscalationArtifact
     validates, writeEscalationArtifact persists, DB flag set,
     result text + details report path + status. Verified.
  3. continueWithDefault=false → status='pending' (loop pauses).
     continueWithDefault=true → status='awaiting-review' (no pause).
  4. Escalation write failures are caught — task completion never
     blocks on an escalation error (logged via logError).
Failure boundary:
  - Validation errors from buildEscalationArtifact propagate as
    caught try/catch in the executor → logged → task still completes.
  - Preference loader fails → behaves as if preference is off.
  - DB write failures fall through; the task is already recorded.
Evidence: smoke test exercises both preference states (on writes
  artifact + sets flag; off silently ignores). Typecheck clean.
  Existing sf_task_complete callers without an escalation field
  see zero change in result shape or behavior.
Non-goals:
  - resolveEscalation (apply user's choice → carry forward as
    override) — bigger flow, later fire.
  - listActionableEscalations / listAllEscalations — for /sf
    escalate list, later fire.
  - /sf escalate user command (later fire).
Invariants:
  - Safety: escalation field is Optional in the schema; no caller
    is forced to migrate.
  - Liveness: build+write happen synchronously after handleCompleteTask
    returns; on success, the next state-derivation cycle picks up
    pending=1 and pauses.
Schema additions to preferences-validation.ts:
  - mid_execution_escalation, progressive_planning recognized as
    valid phases keys (previously typed in PhaseSkipPreferences but
    silently stripped by the validator).
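The gating in contracts 1-3 amounts to a two-step check — field and preference names come from the message above, the rest is a sketch:

```typescript
interface EscalationPayload { question: string; continueWithDefault: boolean }
interface CompletePayload { taskId: string; escalation?: EscalationPayload }

// Preference off, or no escalation field → current behavior preserved;
// otherwise continueWithDefault picks which DB flag gets set.
function escalationStatus(
  payload: CompletePayload,
  midExecutionEscalation: boolean,
): "ignored" | "pending" | "awaiting-review" {
  if (!midExecutionEscalation || !payload.escalation) return "ignored";
  return payload.escalation.continueWithDefault ? "awaiting-review" : "pending";
}
```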

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:24:04 +02:00