Commit graph

4608 commits

Author SHA1 Message Date
Mikael Hugo
365c6bbc3b chore: formatter / linter touch-up (230 files)
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Pure formatting / lint-fix pass that ran during `npm run build:core`
in the session that landed the agent-runner / quota / coverage /
phase-2 routing work. No logic changes — indentation, trailing
commas, import sort, etc. Captured separately so the actual feature
commits stay scoped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:19:53 +02:00
Mikael Hugo
d80060fec5 fix(headless-triage): 60s no-output watchdog to cap session.prompt hang
When session.prompt() hits the deadlock seam sf-mp8e02m1-zpk903 (Promise
never resolves pre-LLM-dispatch, 0 syscall activity, blocks until outer
abort), the previous triage call had noOutputTimeoutMs=0 — meaning no
fast-fail path. The full 8-minute timeoutMs would burn before the
parent abort fired, wasting 8 minutes of subscription window per stuck
triage attempt.

This adds a 60s no-output watchdog: if no meaningful subagent event
fires for 60s, abort the prompt. Combined with the diagnostic logs in
subagent-runner.ts (commit 67e5ac9db) the operator gets:

  [subagent:triage-decider] phase=session.prompt-entered ...
  [subagent:triage-decider] STUCK phase=session.prompt 10001ms ...
  [forge] triage] apply blocked: triage-decider produced no output for 60000ms
                                                                       ↑ 60s, not 480s

Triage failure stays non-fatal (per the existing handleTriage error
catch in headless.ts:auto-triage path) — the autonomous loop continues
to its main milestone dispatch. Net effect: when the triage deadlock
fires, each stuck attempt now burns 60s instead of 480s (8× less).

Doesn't fix the underlying Promise deadlock (still tracked in
sf-mp8e02m1-zpk903 and the new sf-mpmpXXX-... follow-up). This is a
"unblock the autonomous loop now, fix the deadlock later" patch.
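The fast-fail path is essentially a timer that every progress event re-arms and that aborts the prompt when it fires. Below is a minimal TypeScript sketch. It assumes a hypothetical `withNoOutputWatchdog` wrapper and an `onEvent` callback standing in for "meaningful subagent events". It illustrates the idea and is not the subagent-runner implementation:

```typescript
// Hypothetical sketch: abort a prompt-like task when it emits no progress
// event for noOutputMs. Calling onEvent() re-arms the timer.
function withNoOutputWatchdog<T>(
  task: (signal: AbortSignal, onEvent: () => void) => Promise<T>,
  noOutputMs: number,
  label: string,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  let settled = false;

  return new Promise<T>((resolve, reject) => {
    const finish = () => {
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
    };
    const arm = () => {
      if (settled) return; // late events after settling must not re-arm
      if (timer !== undefined) clearTimeout(timer);
      timer = setTimeout(() => {
        const err = new Error(`${label} produced no output for ${noOutputMs}ms`);
        finish();
        controller.abort(err); // let a cooperative task observe cancellation
        reject(err); // fail fast even if the task itself never settles
      }, noOutputMs);
    };

    arm();
    task(controller.signal, arm).then(
      (value) => {
        finish();
        resolve(value);
      },
      (err: unknown) => {
        finish();
        reject(err);
      },
    );
  });
}
```

The wrapper rejects on its own instead of waiting for the aborted task to settle. That is what makes it useful against this deadlock: a Promise that never resolves still cannot hold the caller past `noOutputMs`.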

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:58:07 +02:00
Mikael Hugo
67e5ac9db1 diag(subagent-runner): per-phase timing + stuck-watchdog for sf-mp8e02m1-zpk903
Adds visible diagnostics to runSubagent so the next time the
"session initialized but no LLM call" bug fires, the log identifies
which setup phase hangs.

Phases instrumented:
  - resourceLoader.reload()
  - createAgentSession()
  - bindExtensions(runLifecycle=...)
  - session.prompt() entry → return

Output format (stderr, prefixed with [subagent:<name>]):
  phase=resourceLoader.reload 23ms
  phase=createAgentSession 142ms
  phase=bindExtensions 89ms runLifecycle=true
  phase=session.prompt-entered taskLen=8421 timeoutMs=480000 noOutputMs=180000
  phase=session.prompt-returned 16234ms          ← normal completion
  STUCK phase=<X> 10000ms (no completion signal ...)   ← when watchdog fires

Each phase has a soft 10s watchdog that emits a STUCK line if the
await doesn't complete in time. The watchdog never aborts — just
surfaces visibility. Existing timeoutMs / noOutputTimeoutMs handle
actual termination.
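The instrumentation pattern times each awaited phase and prints a STUCK line when a phase overruns a soft limit, without ever aborting it. A sketch follows. `timePhase` and its parameters are hypothetical, and only the log format follows the sample output above:

```typescript
// Hypothetical helper: time one awaited setup phase and report (never abort)
// when it runs past a soft limit.
async function timePhase<T>(
  agentName: string,
  phase: string,
  fn: () => Promise<T>,
  log: (line: string) => void = (line) => console.error(line),
  softMs = 10_000,
): Promise<T> {
  const prefix = `[subagent:${agentName}]`;
  const start = Date.now();
  // Soft watchdog: only emits a STUCK line; the awaited work keeps running.
  const stuck = setTimeout(() => {
    log(`${prefix} STUCK phase=${phase} ${Date.now() - start}ms (no completion signal)`);
  }, softMs);
  try {
    return await fn();
  } finally {
    clearTimeout(stuck);
    log(`${prefix} phase=${phase} ${Date.now() - start}ms`);
  }
}
```

An instrumented call would then look like `await timePhase(name, "createAgentSession", () => createAgentSession(opts))`, where `opts` stands in for the real arguments. Actual termination stays with timeoutMs / noOutputTimeoutMs.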

This is investigation infrastructure for the third prompt-never-sent
seam (coding-agent/subagent-runner). The agent-runner.js seam
(sf-mp8g4rcd-w01tkh) was fixed in commit 8ee4d8358 with bounded
retries. This commit doesn't fix the underlying bug — it makes the
bug self-reporting next time it fires so operator and autonomous
loop both get actionable signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:40:17 +02:00
Mikael Hugo
8ee4d83581 fix(agent-runner): retry inbox refresh + throw loud on missing message
Closes sf-mp8g4rcd-w01tkh (FINAL prompt-never-sent root cause) — the
agent-runner.js:182 silent early-return that has been causing 59+
runaway-loop:idle-halt feedback entries and the recurring "Autonomous
loop stuck — no heartbeat" cascade.

Root cause: when swarm-dispatch's bus delivers a message and SF
kernel marks the unit as dispatched, the consumer agent's inbox
sometimes doesn't see the message immediately (different MessageBus
instance, SQLite read-cache lag). Previous code returned
{turnsProcessed:0, response:null} silently; the caller (swarm-dispatch
dispatchAndWait) treated that as "no work", so the LLM never ran and
the unit appeared cancelled with no diagnostic.

Fix: bounded retry on missing-message with exponential backoff:
50, 100, 200, 400, 800 ms (1.55s total max). If target message
appears during retry → log recovery event, proceed normally. If still
missing after the last retry → throw a loud error with full inbox
state in the message. The caller wraps in try/catch and surfaces it
as turnResult.error, so the autonomous loop sees a real failure
instead of phantom forward progress.
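The bounded retry can be sketched as follows. `waitForMessage`, the `InboxMessage` shape, and the injected `readInbox` / `sleep` functions are hypothetical stand-ins for the MessageBus-backed inbox. Only the delay schedule and the throw-on-exhaustion contract come from this commit:

```typescript
interface InboxMessage {
  id: string;
  body: string;
}

// Hypothetical sketch: re-read the inbox with exponential backoff until the
// target message appears, then throw loudly with the inbox state.
async function waitForMessage(
  readInbox: () => Promise<InboxMessage[]>,
  messageId: string,
  delaysMs: readonly number[] = [50, 100, 200, 400, 800],
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<InboxMessage> {
  let inbox = await readInbox();
  for (let attempt = 0; ; attempt++) {
    const found = inbox.find((m) => m.id === messageId);
    if (found) {
      if (attempt > 0) {
        console.error(`[agent-runner] message ${messageId} recovered after ${attempt} retries`);
      }
      return found;
    }
    if (attempt >= delaysMs.length) break; // retries exhausted
    await sleep(delaysMs[attempt]);
    inbox = await readInbox();
  }
  throw new Error(
    `message ${messageId} missing after ${delaysMs.length} retries; ` +
      `inbox=${JSON.stringify(inbox.map((m) => m.id))}`,
  );
}
```

Injecting `sleep` keeps the schedule testable without real waits. With the defaults, the function sleeps at most 1550 ms in total before throwing, which matches the 1.55s bound above.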

What this resolves:
- Earlier today: `sf headless triage --apply` timed out at 480000ms
  because triage-decider subagent hit this bug. With retries, the
  triage-decider has 1.55s of latency tolerance to receive its prompt.
- The 59 backlogged runaway-loop:idle-halt entries are symptoms of
  the same root cause. Future occurrences will surface as loud errors,
  not phantom "stuck" units — operator/auto-supervisor can react.

Validated:
- 578 tests pass (49 files) including agent-runner / swarm-dispatch /
  inbox tests.
- runAgentTurn callers (auto/loop.js, agent-swarm.js, swarm-dispatch
  dispatchAndWait) all already handle thrown errors via try/catch
  with explicit error surfacing — the contract change is safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:30:58 +02:00
Mikael Hugo
7e882b56d0 fix(engine): auto-remediate reassess-roadmap ASSESSMENT artifact (breaks loop)
The rogue-write detector in auto-post-unit.js:detectRogueFileWrites
checks for an `artifacts` table row with artifact_type='ASSESSMENT'
after a reassess-roadmap unit writes the assessment file. Other unit
types (execute-task, complete-slice) had auto-remediation paths that
sync the DB to the filesystem when state is stale. reassess-roadmap
did not.

Effect: the reassess_roadmap MCP tool writes the assessment file but
nothing registers it in the artifacts table. EVERY successful
iteration gets flagged rogue post-hoc; SF re-dispatches the same
unit; same thing happens; infinite loop until --timeout SIGTERM.

Empirically observed today (filed as sf-mpmp8min68-yoy2pa):
  Run 1: success $0.012, 16709 tokens → rogue → redispatch
  Run 2: success $0.017, 18925 tokens → rogue → redispatch
  Run 3: started → SIGTERM at --timeout 480000ms

Each iteration is real work product (real assessment content,
verdict: roadmap-confirmed) — the model is doing its job correctly,
the engine just doesn't recognize completion.

Fix: when assessment file exists on disk and artifacts row is
missing, INSERT into artifacts table via insertArtifact (parallel to
updateTaskStatus / updateSliceStatus auto-remediate in the same
function). Falls back to flagging rogue only if the insert fails.
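The reconciliation rule can be sketched as below. `ArtifactStore`, `checkAssessmentArtifact`, and the verdict names are hypothetical. The real logic lives inside detectRogueFileWrites and uses insertArtifact from the SF DB layer:

```typescript
// Hypothetical dependency surface standing in for the sf-db helpers.
interface ArtifactStore {
  hasArtifact(unitId: string, artifactType: string): boolean;
  insertArtifact(row: { unitId: string; type: string; path: string }): void;
}

type AssessmentVerdict = "registered" | "remediated" | "rogue" | "no-file";

function checkAssessmentArtifact(
  unitId: string,
  assessmentPath: string,
  fileExists: (path: string) => boolean,
  store: ArtifactStore,
): AssessmentVerdict {
  if (store.hasArtifact(unitId, "ASSESSMENT")) return "registered";
  if (!fileExists(assessmentPath)) return "no-file"; // nothing written, nothing to sync
  try {
    // File on disk, row missing: sync the DB instead of flagging rogue.
    store.insertArtifact({ unitId, type: "ASSESSMENT", path: assessmentPath });
    return "remediated";
  } catch {
    return "rogue"; // previous behaviour, kept only for a failed insert
  }
}
```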

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:35:44 +02:00
Mikael Hugo
7217a24c28 fix(quota/openrouter): suppress usedFraction since SF policy is :free-only
OpenRouter's credit-balance (total_usage / total_credits) was being
used as a quota signal in phase 2's quotaHeadroomMultiplier, demoting
openrouter once credit usage got high (e.g., 80% used → 0.5 multiplier).

But SF's built-in policy (preferences-models.js:123-131
isModelAllowedByBuiltInProviderPolicy) hard-restricts every OpenRouter
route to `:free` + zero-cost models for ALL SF users — there's no
opt-in, no way to bypass it. Therefore SF dispatches NEVER consume
OpenRouter credits, and the credit balance is purely historical noise.

Fix: stop emitting `usedFraction` for OpenRouter's credit window. The
window is still reported (so `sf headless usage` shows credits state
for awareness) but quotaHeadroomMultiplier now treats OpenRouter as
"no quota signal" → neutral 1.0 — no spurious demotion.

Affects only the routing layer (selector). Display layer unchanged
beyond the label tweak ("info only — SF routes :free").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:26:16 +02:00
Mikael Hugo
f7cd01df0a feat(cli): sf --maintain drains self-feedback triage queue too
Extends the --maintain command (catalog refresh + quota refresh +
coverage audit) to also drain the self-feedback triage queue with
max=10 candidates per invocation. Combined with the daemon's 6h
maintenance timer that spawns `sf --maintain` in every configured
repo, this gives unattended cross-repo triage:

  Repo                       What gets triaged
  ────────────────────────── ─────────────────────────────────
  ~/code/singularity-forge   SF's own backlog (prompt-never-sent,
                             architecture defects, the 3
                             enhancement entries from today)
  ~/code/dr-repo             dr-repo's backlog (M005 flow
                             failures, agent friction, etc.)
  ~/code/centralcloud/*      whatever each subproject accrues

Both --maintain and `headless autonomous` use process.cwd() so they
target the right repo automatically. Interactive mode (plain `sf`)
deliberately does NOT auto-triage — that would spawn subagents while
the user is working in the same session, risking lock contention.

Triage failures stay non-fatal: catalog/quota/coverage work still
completes even if triage subagent dispatch hits the prompt-never-sent
bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:04:03 +02:00
Mikael Hugo
5a57549591 feat(headless): autonomous mode auto-drains self-feedback triage queue first
Before this change, `sf headless autonomous` only dispatched units for
the active milestone — never touched .sf/self-feedback.jsonl. The
existing `sf headless triage --apply` was a manual operator step that
had to run before self-feedback could become actionable work. That
defeats the "SF self-heals" thesis: 146 entries can sit in the queue
indefinitely while the autonomous loop happily cranks on M005.

Now: at autonomous startup (not on resume, not on initial bootstrap)
SF calls handleTriage({ apply: true, max: 5 }) to drain the top-5
candidates from the triage queue before entering the dispatch loop.
Capping at max=5 keeps the upfront cost bounded; remaining items are
processed on the next session_start.

The comment on the existing triage handler in headless.ts:917-921
explicitly acknowledged the gap — autonomous-loop followUp delivery
was broken (sf-mp4rxkwb-l4baga). Wiring the deterministic triage
path BEFORE the dispatch loop closes that gap.

Opt-out: pass --skip-triage on the autonomous command (e.g. when
debugging a specific milestone without backlog churn).

Triage failures are non-fatal — they log a warning and the
autonomous loop continues with its existing milestone dispatch.
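The startup ordering and the non-fatal contract can be sketched as follows. The `startAutonomous` / `AutonomousDeps` names are hypothetical, and the triage handler and dispatch loop are injected as functions:

```typescript
interface TriageOptions {
  apply: boolean;
  max: number;
}

interface AutonomousDeps {
  handleTriage: (opts: TriageOptions) => Promise<void>;
  runDispatchLoop: () => Promise<void>;
  warn: (msg: string) => void;
}

// Hypothetical sketch: drain up to 5 triage candidates on a fresh
// autonomous start, then always proceed to milestone dispatch.
async function startAutonomous(
  flags: { skipTriage: boolean; isResume: boolean; isBootstrap: boolean },
  deps: AutonomousDeps,
): Promise<void> {
  const shouldTriage = !flags.skipTriage && !flags.isResume && !flags.isBootstrap;
  if (shouldTriage) {
    try {
      await deps.handleTriage({ apply: true, max: 5 });
    } catch (err) {
      // Non-fatal: log a warning and fall through to the dispatch loop.
      deps.warn(`self-feedback triage failed: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  await deps.runDispatchLoop();
}
```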

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:01:50 +02:00
Mikael Hugo
8d0f41436b feat(benchmark-selector): phase 2 — quota-aware routing weight
Bias dispatch toward under-used subscriptions ("spend the subs") and
de-prioritize near-exhausted ones (avoid 429 walls). Multiplier is
applied to the benchmark score before sort, so it only re-orders
within the existing score → cost → coverage → preference ladder.
Unknown quota state stays neutral 1.0 — never punish a provider for
having no public quota API.

Curve, keyed on max(usedFraction) across all windows:
  < 0.20 → 1.15  (boost — lots of headroom, prefer to use it)
  < 0.50 → 1.00  (neutral)
  < 0.70 → 0.92  (slight steer away)
  < 0.90 → 0.50  (strong de-prioritize)
  < 0.95 → 0.20  (near-exhaustion)
  ≥ 0.95 → 0.05  (effectively skip)

Max-across-windows means kimi-coding's 5h-rolling window (tighter)
binds the decision even when the weekly is fresh.

New exported helper quotaHeadroomMultiplier(providerKey, getQuotaState?)
takes the resolver as optional dep for testability; defaults to
getProviderQuotaState from provider-quota-cache.js.
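Here is a sketch of the curve. It assumes a quota state shaped as a list of windows that may or may not carry `usedFraction`; the real provider-quota-cache.js shape may differ. Unlike the real helper, this version makes the resolver a required parameter, because getProviderQuotaState is not reproduced here:

```typescript
// Assumed shapes; not the actual provider-quota-cache.js types.
interface QuotaWindow {
  label: string;
  usedFraction?: number;
}
interface QuotaState {
  windows: QuotaWindow[];
}
type QuotaResolver = (providerKey: string) => QuotaState | undefined;

function quotaHeadroomMultiplier(providerKey: string, getQuotaState: QuotaResolver): number {
  const fractions = (getQuotaState(providerKey)?.windows ?? [])
    .map((w) => w.usedFraction)
    .filter((f): f is number => typeof f === "number" && Number.isFinite(f));
  if (fractions.length === 0) return 1.0; // unknown quota state stays neutral
  const used = Math.max(...fractions); // the tightest window binds
  if (used < 0.2) return 1.15;
  if (used < 0.5) return 1.0;
  if (used < 0.7) return 0.92;
  if (used < 0.9) return 0.5;
  if (used < 0.95) return 0.2;
  return 0.05;
}
```

The benchmark score is multiplied by this value before sorting, so a provider at 0.05 still wins when it is the only candidate. That matches "effectively skip" rather than "forbid".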

16 new tests cover the curve and the selectByBenchmarks integration
(unknown quota → unchanged, demoted high-usage provider, boosted
under-used provider, near-exhausted skipped when alternatives exist).

Closes SF backlog item sf-mpmp8ie6xf-z4cxhg, which was filed earlier
for this work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:09:33 +02:00
Mikael Hugo
b39cf3387e fix(quota/zai): use raw key auth (no Bearer) + correct response shape
Cross-referenced the vbgate/opencode-mystatus reference implementation
and found two real bugs in the zai fetcher, plus one smaller gap:

1. Auth header: zai's monitor endpoint expects `Authorization: <key>`
   with NO `Bearer ` prefix. Using Bearer caused the server to treat
   the call as unauthenticated and return the generic "no coding
   plan" response even for active coding-plan users.

2. Response shape: real envelope is
     { code, msg, success, data: { limits: [
       { type: "TOKENS_LIMIT"|"TIME_LIMIT", usage, currentValue,
         percentage, nextResetTime? } ] } }
   Was looking for `data: [...]` directly and using `limit`/`used`
   fields. Now parses `data.data.limits[].usage` / `.currentValue`.

3. Added User-Agent header to match the reference tool.
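Both fixes can be sketched against the envelope described above. The interface names, the `code` type, and the reading of `usage` as the window cap and `currentValue` as the amount consumed are assumptions, since the commit only names the fields. The failure branch surfaces the vendor `msg` instead of returning silently:

```typescript
// Assumed typing of the envelope fields listed in this commit.
interface ZaiLimit {
  type: "TOKENS_LIMIT" | "TIME_LIMIT";
  usage: number; // assumed: window cap
  currentValue: number; // assumed: amount consumed
  percentage: number;
  nextResetTime?: number;
}
interface ZaiEnvelope {
  code: number;
  msg: string;
  success: boolean;
  data?: { limits: ZaiLimit[] };
}
interface ZaiWindow {
  label: string;
  used: number;
  limit: number;
  usedFraction?: number;
}

// The monitor endpoint expects the raw key, with no "Bearer " prefix.
function zaiHeaders(apiKey: string, userAgent: string): Record<string, string> {
  return { Authorization: apiKey, "User-Agent": userAgent };
}

function parseZaiLimits(body: ZaiEnvelope): { windows: ZaiWindow[]; error?: string } {
  if (!body.success || !body.data) return { windows: [], error: body.msg };
  const windows = body.data.limits.map((l) => ({
    label: l.type,
    used: l.currentValue,
    limit: l.usage,
    usedFraction: l.usage > 0 ? l.currentValue / l.usage : undefined,
  }));
  return { windows };
}
```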

Live probe finding: this user's z.ai key works fine for inference
(/api/coding/paas/v4/models returns 200 with the full model list)
but the monitor endpoint reports "no coding plan" — meaning their
account uses the regular pay-as-you-go z.ai/zhipu tier, not the
separately-billed "Coding Plan" subscription that the monitor
endpoint serves. The 429s they observe during inference are
rate-limit RPM/TPM errors, not coding-plan window exhaustion.
Code change is correct; the error message is now accurate and
actionable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:04:12 +02:00
Mikael Hugo
8fa9a4b8fa fix(quota): match real API shapes for kimi-coding / minimax / zai
Dogfooded `sf headless usage` against live APIs and discovered three
shape mismatches in the phase-1 fetchers:

- kimi-coding returns numeric fields as STRINGS ("limit": "100") and
  uses camelCase `resetTime`. Added toNum() coercion + reset hint
  extraction. Now reports Weekly + 5h rolling windows correctly.

- minimax response is `{ model_remains: [{ model_name,
  current_interval_total_count, current_interval_usage_count,
  current_weekly_total_count, current_weekly_usage_count, end_time,
  weekly_end_time, ...}] }` — per-model rolling + weekly windows, not
  the flat `remaining_tokens`/`total_tokens` shape I had assumed.
  Rewrote parser to emit one window per model entry.

- zai uses a `{ code, msg, success, data }` envelope. When
  `success: false` (e.g. user lacks an active coding plan), parser
  now surfaces vendor msg as the entry error instead of silently
  emitting no windows.
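The string coercion and the per-model minimax parsing can be sketched as below. `toNum` is named in the commit, but this body is an assumption, as is treating the `*_usage_count` fields as consumed units; the commit lists only the field names. This sketch emits a rolling and a weekly window per entry, and the real parser may label or group them differently:

```typescript
// kimi-coding sends numbers as strings ("limit": "100"); coerce defensively.
function toNum(value: unknown): number | undefined {
  if (typeof value === "number") return Number.isFinite(value) ? value : undefined;
  if (typeof value === "string" && value.trim() !== "") {
    const n = Number(value);
    return Number.isFinite(n) ? n : undefined;
  }
  return undefined;
}

// Field names from the commit; their semantics here are assumptions.
interface MinimaxModelRemain {
  model_name: string;
  current_interval_total_count: number | string;
  current_interval_usage_count: number | string;
  current_weekly_total_count?: number | string;
  current_weekly_usage_count?: number | string;
}

interface MinimaxWindow {
  label: string;
  usedFraction: number;
}

function parseMinimax(body: { model_remains?: MinimaxModelRemain[] }): MinimaxWindow[] {
  const windows: MinimaxWindow[] = [];
  for (const m of body.model_remains ?? []) {
    const spans: [string, unknown, unknown][] = [
      ["rolling", m.current_interval_total_count, m.current_interval_usage_count],
      ["weekly", m.current_weekly_total_count, m.current_weekly_usage_count],
    ];
    for (const [span, totalRaw, usedRaw] of spans) {
      const total = toNum(totalRaw);
      const used = toNum(usedRaw);
      // Skip windows the entry does not report (or reports with a zero cap).
      if (total !== undefined && total > 0 && used !== undefined) {
        windows.push({ label: `${m.model_name} ${span}`, usedFraction: used / total });
      }
    }
  }
  return windows;
}
```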

Tests updated to mirror real shapes; added one for zai's failure
envelope. 12 tests pass (was 11).

Live result from re-running `sf headless usage`:
  - openrouter: 80.7% used, $7.71 remaining (real signal — watch this)
  - kimi-coding: Weekly 32%, 5h 4%
  - minimax: MiniMax-M* 5h 1.4% + coding-plan-vlm/search 1.4%
  - gemini-cli: 0.0-0.4% across all models (clean)
  - zai: surfaces "user does not have a coding plan" — may need a
    different endpoint or scope depending on the user's account setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:59:53 +02:00
Mikael Hugo
c0d089f9ca feat(catalog/quota): global model catalog, benchmark coverage audit, provider quota visibility
Phase-1 work, shipped as one feature. Earlier auto-snapshots already
split most of it across several commits; this commit captures the
leftover type declarations, the new provider-quota-cache test suite,
and the last register-hooks / cli wiring.

Highlights now in tree:

- Model catalog moved from per-project to global `~/.sf/model-catalog/`
  via `sfHome()` (one cache shared by all repos; no more 9-dir
  duplication).

- `benchmark-coverage.js` audits the dispatchable model set against
  `learning/data/model-benchmarks.json` at session_start, writes
  `~/.sf/benchmark-coverage.json`, notifies on change.

- `provider-quota-cache.js` introduces phase-1 subscription quota
  visibility for the 5 providers with documented APIs:
  kimi-coding (/coding/v1/usages), openrouter (/api/v1/credits),
  minimax (/v1/token_plan/remains), zai (/api/monitor/usage/quota/limit),
  google-gemini-cli (existing snapshotGeminiCliAccount). 15-min TTL,
  global cache.

- `sf --maintain` CLI flag refreshes catalogs + quotas + coverage audit
  in one idempotent pass. Daemon spawns it every 6h.

- `sf headless usage` rewritten to display all providers from the
  unified cache, with explicit "no public API" notes for mistral,
  ollama-cloud, opencode, opencode-go, xiaomi.

- Awaitable `runXIfStale` variants for model-catalog, gemini-catalog,
  openai-codex-catalog (the schedule* variants now wrap them in
  setImmediate).

- TypeScript declarations added for the new JS modules so the
  dist-redirect pipeline type-checks cleanly.

Phase 2 (quota-aware routing in benchmark-selector) is filed as SF
self-feedback for the backlog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:37:20 +02:00
Mikael Hugo
effbb75f83 sf snapshot: uncommitted changes after 30m inactivity 2026-05-16 17:30:35 +02:00
Mikael Hugo
b5764af27b sf snapshot: uncommitted changes after 33m inactivity 2026-05-16 17:00:13 +02:00
Mikael Hugo
b73e386090 sf snapshot: uncommitted changes after 56m inactivity 2026-05-16 16:26:40 +02:00
Mikael Hugo
4442400d11 sf snapshot: uncommitted changes after 30m inactivity 2026-05-16 15:29:43 +02:00
Mikael Hugo
da0c41d375 sf snapshot: uncommitted changes after 56m inactivity 2026-05-16 14:59:40 +02:00
Mikael Hugo
6071a9207c sf snapshot: uncommitted changes after 53m inactivity 2026-05-16 14:03:34 +02:00
Mikael Hugo
af2f86c3c2 sf snapshot: uncommitted changes after 827m inactivity 2026-05-16 13:10:15 +02:00
Mikael Hugo
6f32f9287a sf snapshot: pre-dispatch, uncommitted changes after 30m inactivity 2026-05-15 23:22:58 +02:00
Mikael Hugo
e2c3d6542c sf snapshot: uncommitted changes after 97m inactivity 2026-05-15 22:52:58 +02:00
Mikael Hugo
a2a6ab767c fix(auto): replan inactive milestones with DB context 2026-05-15 21:15:04 +02:00
Mikael Hugo
ecf6af92e8 fix(auto): avoid resuming blocked no-unit sessions 2026-05-15 20:56:15 +02:00
Mikael Hugo
03f6d4990f fix(auto): set solver iteration default to 25 2026-05-15 20:35:10 +02:00
Mikael Hugo
8e85a6e673 fix(state): skip cancelled slices during dispatch 2026-05-15 20:24:11 +02:00
Mikael Hugo
62d63f111e fix(auto): bound solver iteration defaults 2026-05-15 20:14:30 +02:00
Mikael Hugo
0b187b9f62 fix(headless): remove legacy v1 fallback path 2026-05-15 20:12:00 +02:00
Mikael Hugo
e2e096c5c7 feat(rpc): configurable RPC init timeout via SF_RPC_INIT_TIMEOUT_MS
Add resolveRpcInitTimeoutMs() helper and wire it into RpcClient.init().
Default init timeout increased from 30s to 120s. Override via env var.
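Here is a sketch of the resolver. It assumes that non-numeric or non-positive values fall back to the 120s default; the commit does not say how invalid input is handled. The environment is passed in so the function stays pure, and in Node it would be called with `process.env`:

```typescript
const DEFAULT_RPC_INIT_TIMEOUT_MS = 120_000;

// Hypothetical body for resolveRpcInitTimeoutMs(); invalid input handling is assumed.
function resolveRpcInitTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env["SF_RPC_INIT_TIMEOUT_MS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_RPC_INIT_TIMEOUT_MS;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : DEFAULT_RPC_INIT_TIMEOUT_MS;
}
```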

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 20:00:26 +02:00
Mikael Hugo
ced90e84a8 test(headless): update v2 migration tests for fatal-by-default fallback policy
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 20:00:02 +02:00
Mikael Hugo
5b831e587b feat(headless): gate v1 string-matching fallback behind env var
Require SF_HEADLESS_ALLOW_V1_FALLBACK=1 to use legacy v1 fallback.
Default behavior now exits with error when v2 init fails, preventing
silent degradation to less reliable protocol matching.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 19:59:03 +02:00
Mikael Hugo
15ae3d02b7 fix(sf-db): write snapshots atomically 2026-05-15 19:49:04 +02:00
Mikael Hugo
a8a28bd7c0 docs(specs): add sf-prompt-modularization.md operator guide
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 19:47:20 +02:00
Mikael Hugo
92ff8186ba feat(prompts): add v2 migration regression tests + fix template variable drift
- Migrate all remaining v1 builders (research-milestone, complete-slice,
  run-uat, reassess-roadmap, deploy, smoke-production, release, rollback,
  challenge) from composeInlinedContext to composeUnitContext v2.
- Remove unused composeInlinedContext import from auto-prompts.js.
- Add 7 regression tests in auto-prompts-v2-migration.test.mjs covering
  all migrated builders.
- Fix template variable drift: deploy.md expected {{releaseVersion}} and
  release.md expected {{newVersion}} — neither builder provided them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 19:46:13 +02:00
Mikael Hugo
bd27f61da7 feat(prompts): migrate remaining builders to composeUnitContext 2026-05-15 19:34:53 +02:00
Mikael Hugo
3e1e177466 fix(uok): auto-repair runtime projection mismatch 2026-05-15 19:34:42 +02:00
Mikael Hugo
c02c71f216 chore(gitignore): ignore sf session todo state 2026-05-15 19:25:14 +02:00
Mikael Hugo
ce169ddc22 test(manifest): cover computed context declarations 2026-05-15 19:10:17 +02:00
Mikael Hugo
30f1cca984 feat(manifest): add knowledge/graph computed artifacts to remaining unit types
M004 S01: Update manifests to support knowledge and graph artifacts.

Adds computed: ["knowledge", "graph"] to manifests that did not yet
declare them, matching the actual behavior of their prompt builders:

- execute-task, reactive-execute
- discuss-project, discuss-requirements, research-project
- workflow-preferences (knowledge only — no graph scope)

These unit types already inline knowledge/graph via their builder
functions in auto-prompts.js; the manifest declarations were missing.
This brings the manifest schema into sync with real dispatch behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 19:09:47 +02:00
Mikael Hugo
c718bd605b chore(gitignore): ignore sf active runtime projections 2026-05-15 19:07:58 +02:00
Mikael Hugo
c6b8815ad9 feat(verification): wire auto-defer into finalize phase
M003 S05 continuation: phases-finalize.js now handles "continue" from
runPostUnitVerification as an auto-defer path (low-risk findings).

- phases-finalize.js: added `verificationResult === "continue"` branch
  after pause/retry checks — logs "verification-deferred" and falls
  through to post-verification processing instead of breaking/retrying
- uok/auto-verification.js: defer decision runs before retry logic,
  returns "continue" without consuming retry attempts
- verification-evidence.js: forwards deferred fields in evidence JSON

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:59:07 +02:00
Mikael Hugo
f48a4cc7c5 feat(verification): auto-defer confidence policy for low-risk findings
Implements M003 S05: auto-deferral policy for low-risk validation findings.

- New verification-defer-policy.js: classifyCheck, computeDeferConfidence,
  decideAutoDefer — classifies failed checks as deferrable/blocking/unknown
- Patterns: style/format/deprecation-only → deferrable; error/fail/crash/fatal
  → blocking (always wins)
- Confidence scoring: 0.9 all-deferrable, 0.7 mixed, 0.5 unknown, 0.0 blocking
- Threshold preference: verification_auto_defer_threshold (default 0.75)
- Integration in uok/auto-verification.js: checks defer before retry/pause,
  does not consume retry attempts, writes deferred: true + reasons to evidence JSON
- verification-evidence.js: forwards deferred/deferredReasons/deferConfidence fields
- Preferences wired: validation, types, serializer
- Tests: 6 unit tests for classification, confidence, threshold, blocking dominance
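The scoring rules listed above can be sketched as three pure functions. The names mirror the commit. The regexes, the message-based classification, and the handling of an empty list are assumptions; the real verification-defer-policy.js may match on different fields:

```typescript
type CheckClass = "deferrable" | "blocking" | "unknown";

// Assumed patterns; word boundaries avoid matching e.g. "information".
const BLOCKING = /\b(errors?|fail\w*|crash\w*|fatal)\b/i;
const DEFERRABLE = /\b(style|format\w*|deprecat\w*)\b/i;

function classifyCheck(message: string): CheckClass {
  if (BLOCKING.test(message)) return "blocking"; // blocking always wins
  if (DEFERRABLE.test(message)) return "deferrable";
  return "unknown";
}

function computeDeferConfidence(classes: CheckClass[]): number {
  if (classes.length === 0) return 0.0; // nothing to defer
  if (classes.some((c) => c === "blocking")) return 0.0;
  if (classes.every((c) => c === "deferrable")) return 0.9;
  if (classes.every((c) => c === "unknown")) return 0.5;
  return 0.7; // mix of deferrable and unknown
}

function decideAutoDefer(
  messages: string[],
  threshold = 0.75, // verification_auto_defer_threshold default
): { defer: boolean; confidence: number; reasons: string[] } {
  const classes = messages.map(classifyCheck);
  const confidence = computeDeferConfidence(classes);
  const reasons = messages.map((m, i) => `${classes[i]}: ${m}`);
  return { defer: confidence >= threshold, confidence, reasons };
}
```

With the default 0.75 threshold, only an all-deferrable set (0.9) auto-defers. A mix that includes unknown findings (0.7) still goes through the normal retry/pause path.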

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:55:26 +02:00
Mikael Hugo
1b3dba6e51 test(uok): align purpose-coherence fixtures with merged P2/P3 schema
After cherry-picking P2 (v72: slices.traces_vision_fragment) and P3
(v70: tasks.purpose_trace) onto main, the schema migration ladder
now adds those columns automatically on every openDatabase. The P4
test fixtures, which were authored when those migrations were still
in their own worktree branches, manually ALTER'd the columns —
which throws "duplicate column name" post-merge.

Two changes, both purely about exercising the same gate paths under
the new ground truth:

- makeForwardDb no longer manually ALTERs — the migration ladder
  already provides the columns. The "trace value NULL" branch is
  exercised by inserting rows with explicit NULL instead of relying
  on the column being absent.
- The "legacy DB" test no longer expects the warning to mention the
  column name (the column always exists post-migration). The
  underlying SqliteError catch in evaluatePurposeCoherence remains
  for the genuinely-legacy DB case where someone is running against
  a fixture that predates the migration; the test now exercises the
  NULL-value warn path which is the real-world signal operators see.

All 17 uok-purpose-coherence tests pass; full 5-pillar sweep
(P1+P2+P3+P4+P5 + migration) 53/53 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:54:59 +02:00
Mikael Hugo
008f0c685d fix(schema): reorder ADR-0000 migrations so v70/v71/v72 run in order
After cherry-picking the swarm commits the migration file had v72
declared before v70/v71 — when applied to a v69 DB the loop ran v72
first, set appliedVersion=72, and the v70/v71 guards
`if (appliedVersion < 70)` and `< 71` short-circuited, so neither
ALTER ran on legacy DBs. Reordered so the file flows v70 → v71 → v72
matching version numbers; idempotent column probes on fresh DBs
still pass.

Verified: full sf-db-migration suite 13/13 green, including the
v52-and-v27 legacy-fixture paths that exercise the migration ladder
end-to-end.
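The commit fixes this by reordering the declarations. A defensive alternative is to sort migrations by version before applying them, so that file order cannot skip a step. The sketch below is hypothetical and is not what the commit does:

```typescript
interface Migration {
  version: number;
  apply: () => void;
}

// Hypothetical ladder: apply every migration newer than the DB's version,
// in ascending version order regardless of declaration order.
function runMigrations(currentVersion: number, migrations: Migration[]): number {
  let appliedVersion = currentVersion;
  const ordered = [...migrations].sort((a, b) => a.version - b.version);
  for (const m of ordered) {
    if (appliedVersion < m.version) {
      m.apply();
      appliedVersion = m.version;
    }
  }
  return appliedVersion;
}
```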

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:53:04 +02:00
Mikael Hugo
725affd126 feat(self-feedback): purpose_anchor on entries (ADR-0000 restoration, v71)
SF is a purpose-to-software compiler — every self_feedback row must name
the milestone vision or slice goal it's filed against, so triage can
prioritize against purpose rather than treating each row as floating.

  - Schema v71 ALTERs self_feedback ADD COLUMN purpose_anchor TEXT.
    NULL allowed for legacy rows; fresh-DB CREATE includes the column.
  - sf-db-self-feedback.js: insertSelfFeedbackEntry accepts purposeAnchor
    (camelCase), stored as :purpose_anchor; listSelfFeedbackEntries({purpose})
    pushes a LIKE %fragment% filter into the DB layer so triage doesn't
    have to pull the full table.
  - rowToSelfFeedback exposes purposeAnchor, falling back to the JSON
    projection for legacy rows where the column is NULL.
  - headless-feedback CLI: `feedback add --purpose <fragment>` persists
    the anchor; `feedback list --purpose <fragment>` filters by it.
    Omission stays valid — restoration is additive, not breaking.
  - help-text + migration test updated; new vitest covers add/list
    round-trip, NULL-on-omit legacy compat, substring match, and the
    help-text documentation contract.

Restores the doctrine in docs/adr/0000-purpose-to-software-compiler.md:
"non-trivial artifacts must name their purpose and consumer."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:51:52 +02:00
Mikael Hugo
3c416162b2 feat(sf-db): record per-task purpose_trace at complete_task (ADR-0000)
Restore the purpose-to-software doctrine at the slice gate: every task
the executor closes must name the slice-goal sentence or clause it
served. complete-slice now refuses to flip a slice to complete while
any of its tasks has a NULL purpose_trace, making "did all tasks
actually serve the slice goal" a mechanical check instead of a vibe.

Schema migration v70 adds a nullable purpose_trace TEXT to tasks
(legacy rows stay valid). complete_task refuses without it and quotes
slice.goal in the error so the agent can anchor. insertTask /
updateTaskStatus accept the new field, rowToTask exposes it, and a
new updateTaskPurposeTrace helper covers later corrections.

Restoration of doctrine — see docs/adr/0000-purpose-to-software-compiler.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:50:27 +02:00
Mikael Hugo
fa657f2523 feat(plan-milestone): per-slice vision trace (schema v69, ADR-0000 P2)
Restoration of doctrine: plan-milestone now emits a literal milestone.vision
clause per slice (traces_vision_fragment) so validate-milestone has structured
grounds for assessment instead of re-reading the vision through the LLM every
time. Schema v69 adds the column (NULL allowed for legacy rows); the prompt and
plan_milestone tool start requiring it for new slices, rejecting fragments that
do not appear verbatim in milestone.vision. See docs/adr/0000-purpose-to-software-compiler.md.
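The verbatim check can be sketched as follows. `validateVisionFragment` and its error strings are hypothetical, and the real tool may normalise whitespace or casing differently:

```typescript
// Hypothetical validator: returns an error message, or null when the
// fragment appears verbatim (case- and whitespace-sensitive) in the vision.
function validateVisionFragment(vision: string, fragment: string | null | undefined): string | null {
  if (fragment == null || fragment.trim() === "") {
    return "traces_vision_fragment is required for new slices";
  }
  if (!vision.includes(fragment)) {
    return `fragment not found verbatim in milestone.vision: "${fragment}"`;
  }
  return null;
}
```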

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:48:50 +02:00
Mikael Hugo
3b83f0898b merge(P4): purpose-coherence-gate before every dispatch (ADR-0000) 2026-05-15 18:45:17 +02:00
Mikael Hugo
5e2c7a7166 merge(P1): vision quality gate on sf new-milestone (ADR-0000) 2026-05-15 18:45:08 +02:00
Mikael Hugo
59fbaf4b0f test(doctor): refactor purpose-gate test to use insertMilestone/insertSlice helpers
Cleans up doctor-purpose-gate.test.mjs:
- Uses insertMilestone/insertSlice helpers instead of raw SQL
- Removes redundant test from doctor-plan-dir-normalization.test.mjs
- Adds module-level JSDoc purpose comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:44:09 +02:00
Mikael Hugo
a303b5db29 feat(uok): add purpose-coherence-gate pre-dispatch gate (ADR-0000)
Restores the eight-PDD purpose gate at the autonomous-loop boundary
required by ADR-0000 (SF is a purpose-to-software compiler). The gate
walks milestone vision -> slice.traces_vision_fragment ->
task.purpose_trace before every dispatch and refuses to proceed when
the purpose chain is broken at the vision root (degraded-vision).

- New uok/purpose-coherence.js with a pure verdict function and a
  DB-backed adapter. Reads vision/trace columns directly via SQL so
  pre-P2/P3 schema migrations are tolerated.
- Wired into auto/phases-pre-dispatch.js alongside
  resource-version-guard, pre-dispatch-health-gate, and
  planning-flow-gate. Fires on every pre-dispatch turn and emits to the
  existing trace JSONL.
- Outcome ladder: fail (vision missing -> pause loop), warn (trace
  columns missing or NULL -> surface but allow dispatch so legacy DBs
  don't hard-break on day one), pass (full chain).
- Tests in tests/uok-purpose-coherence.test.mjs cover the four
  contracted states plus the column-missing downgrade path on a
  pre-migration schema.
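The outcome ladder can be sketched as a pure verdict function. The row shapes and the `traceColumnsPresent` flag, which stands in for the SqliteError fallback on pre-migration schemas, are assumptions and not the uok/purpose-coherence.js API:

```typescript
type Outcome = "pass" | "warn" | "fail";

// Assumed row shapes for the vision -> slice -> task chain.
interface TaskTrace {
  id: string;
  purposeTrace: string | null;
}
interface SliceTraceRow {
  id: string;
  tracesVisionFragment: string | null;
  tasks: TaskTrace[];
}

function purposeCoherenceVerdict(
  vision: string | null,
  slices: SliceTraceRow[],
  traceColumnsPresent: boolean,
): { outcome: Outcome; reason: string } {
  if (vision === null || vision.trim() === "") {
    return { outcome: "fail", reason: "degraded-vision: milestone vision missing" }; // pause loop
  }
  if (!traceColumnsPresent) {
    return { outcome: "warn", reason: "trace columns missing (pre-migration schema)" };
  }
  const untraced: string[] = [];
  for (const slice of slices) {
    if (slice.tracesVisionFragment === null) untraced.push(slice.id);
    for (const task of slice.tasks) {
      if (task.purposeTrace === null) untraced.push(`${slice.id}/${task.id}`);
    }
  }
  if (untraced.length > 0) {
    // Legacy rows: surface the gap but still allow dispatch.
    return { outcome: "warn", reason: `NULL purpose trace on ${untraced.join(", ")}` };
  }
  return { outcome: "pass", reason: "full purpose chain" };
}
```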

Refs: ADR-0000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:42:55 +02:00