Before this change, `sf headless autonomous` only dispatched units for
the active milestone and never touched .sf/self-feedback.jsonl. The
existing `sf headless triage --apply` was a manual operator path, so
self-feedback only became actionable work when an operator ran it by
hand. That defeats the "SF self-heals" thesis: 146 entries can sit in
the queue indefinitely while the autonomous loop happily cranks on M005.
Now: at autonomous startup (not on resume, not on initial bootstrap)
SF calls handleTriage({ apply: true, max: 5 }) to drain the top-5
candidates from the triage queue before entering the dispatch loop.
Capping at max=5 keeps the upfront cost predictable; remaining items
are processed on the next session_start.
The comment on the existing triage handler in headless.ts:917-921
explicitly acknowledged the gap — autonomous-loop followUp delivery
was broken (sf-mp4rxkwb-l4baga). Wiring the deterministic triage
path BEFORE the dispatch loop closes that gap.
Opt-out: pass --skip-triage on the autonomous command (e.g. when
debugging a specific milestone without backlog churn).
Triage failures are non-fatal — they log a warning and the
autonomous loop continues with its existing milestone dispatch.
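The wiring can be sketched as follows — startAutonomous/runDispatchLoop are illustrative names; only handleTriage and its { apply: true, max: 5 } call come from the change itself, and the real code paths are async (synchronous here for brevity):

```javascript
function startAutonomous(opts, deps) {
  if (!opts.skipTriage) {
    try {
      // Drain up to 5 self-feedback candidates before the dispatch loop.
      deps.handleTriage({ apply: true, max: 5 });
    } catch (err) {
      // Non-fatal: warn and fall through to milestone dispatch.
      console.warn(`triage failed, continuing: ${err.message}`);
    }
  }
  return deps.runDispatchLoop(opts);
}
```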
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bias dispatch toward under-used subscriptions ("spend the subs") and
de-prioritize near-exhausted ones (avoid 429 walls). The multiplier is
applied to the benchmark score before sorting, so it only re-orders
within the existing score → cost → coverage → preference ladder.
Unknown quota state stays neutral 1.0 — never punish a provider for
having no public quota API.
Curve, keyed on max(usedFraction) across all windows:
< 0.20 → 1.15 (boost — lots of headroom, prefer to use it)
< 0.50 → 1.00 (neutral)
< 0.70 → 0.92 (slight steer away)
< 0.90 → 0.50 (strong de-prioritize)
< 0.95 → 0.20 (near-exhaustion)
≥ 0.95 → 0.05 (effectively skip)
Max-across-windows means kimi-coding's 5h-rolling window (tighter)
binds the decision even when the weekly is fresh.
New exported helper quotaHeadroomMultiplier(providerKey, getQuotaState?)
takes the resolver as optional dep for testability; defaults to
getProviderQuotaState from provider-quota-cache.js.
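The curve alone (without the providerKey/resolver plumbing of the real helper) reduces to a pure function of the max used fraction — a sketch:

```javascript
// Thresholds mirror the curve table above. usedFraction is
// max(usedFraction) across all quota windows; undefined means the
// provider has no public quota API and stays neutral.
function quotaHeadroomMultiplier(usedFraction) {
  if (usedFraction === undefined || usedFraction === null) return 1.0;
  if (usedFraction < 0.20) return 1.15; // boost: lots of headroom
  if (usedFraction < 0.50) return 1.00; // neutral
  if (usedFraction < 0.70) return 0.92; // slight steer away
  if (usedFraction < 0.90) return 0.50; // strong de-prioritize
  if (usedFraction < 0.95) return 0.20; // near-exhaustion
  return 0.05;                          // effectively skip
}
```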
16 new tests cover the curve and the selectByBenchmarks integration
(unknown quota → unchanged, demoted high-usage provider, boosted
under-used provider, near-exhausted skipped when alternatives exist).
Previously filed as SF backlog item sf-mpmp8ie6xf-z4cxhg; this commit
closes that loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-referenced the vbgate/opencode-mystatus reference implementation
and found two real bugs in the zai fetcher, plus one hygiene fix:
1. Auth header: zai's monitor endpoint expects `Authorization: <key>`
with NO `Bearer ` prefix. Using Bearer caused the server to treat
the call as unauthenticated and return the generic "no coding
plan" response even for active coding-plan users.
2. Response shape: real envelope is
{ code, msg, success, data: { limits: [
{ type: "TOKENS_LIMIT"|"TIME_LIMIT", usage, currentValue,
percentage, nextResetTime? } ] } }
The parser previously looked for `data: [...]` directly and used
`limit`/`used` fields; it now parses `data.data.limits[].usage` /
`.currentValue`.
3. Added User-Agent header to match the reference tool.
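A sketch of the corrected fetcher pieces — field names follow the envelope quoted above, while the User-Agent value and error text are illustrative:

```javascript
// Bug 1: zai's monitor endpoint wants the raw key, NOT "Bearer <key>".
// The User-Agent value here is illustrative.
const zaiHeaders = (apiKey) => ({
  Authorization: apiKey,
  "User-Agent": "sf-quota-probe/1.0",
});

// Bug 2: parse the { code, msg, success, data: { limits: [...] } }
// envelope, surfacing the vendor msg on failure.
function parseZaiUsage(envelope) {
  if (!envelope.success) {
    throw new Error(envelope.msg || `zai error code ${envelope.code}`);
  }
  const limits = (envelope.data && envelope.data.limits) || [];
  return limits.map((l) => ({
    kind: l.type, // "TOKENS_LIMIT" | "TIME_LIMIT"
    used: l.usage,
    total: l.currentValue,
    usedFraction: (l.percentage ?? 0) / 100,
    resetsAt: l.nextResetTime, // optional
  }));
}
```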
Live probe finding: this user's z.ai key works fine for inference
(/api/coding/paas/v4/models returns 200 with the full model list)
but the monitor endpoint reports "no coding plan" — meaning their
account uses the regular pay-as-you-go z.ai/zhipu tier, not the
separately-billed "Coding Plan" subscription that the monitor
endpoint serves. The 429s they observe during inference are
rate-limit RPM/TPM errors, not coding-plan window exhaustion.
The code change is correct; the error message is now accurate and
actionable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dogfooded `sf headless usage` against live APIs and discovered three
shape mismatches in the phase-1 fetchers:
- kimi-coding returns numeric fields as STRINGS ("limit": "100") and
uses camelCase `resetTime`. Added toNum() coercion + reset hint
extraction. Now reports Weekly + 5h rolling windows correctly.
- minimax response is `{ model_remains: [{ model_name,
current_interval_total_count, current_interval_usage_count,
current_weekly_total_count, current_weekly_usage_count, end_time,
weekly_end_time, ...}] }` — per-model rolling + weekly windows, not
the flat `remaining_tokens`/`total_tokens` shape I had assumed.
Rewrote parser to emit one window per model entry.
- zai uses a `{ code, msg, success, data }` envelope. When
`success: false` (e.g. user lacks an active coding plan), parser
now surfaces vendor msg as the entry error instead of silently
emitting no windows.
Tests updated to mirror real shapes; added one for zai's failure
envelope. 12 tests pass (was 11).
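The kimi-coding coercion can be sketched as follows (toNum is the helper named above; the exact edge-case handling is assumed):

```javascript
// Coerce kimi-coding's stringified numerics ("limit": "100") to
// numbers; absent or unparseable values stay undefined so the
// window simply omits the field.
function toNum(v) {
  if (v === undefined || v === null || v === "") return undefined;
  const n = Number(v);
  return Number.isFinite(n) ? n : undefined;
}
```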
Live result from re-running `sf headless usage`:
- openrouter: 80.7% used, $7.71 remaining (real signal — watch this)
- kimi-coding: Weekly 32%, 5h 4%
- minimax: MiniMax-M* 5h 1.4% + coding-plan-vlm/search 1.4%
- gemini-cli: 0.0-0.4% across all models (clean)
- zai: surfaces "user does not have a coding plan" — may need a
different endpoint or scope depending on the user's account setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase-1 work ships together here because prior auto-snapshots split it across
several commits. This commit captures the leftover type declarations,
the new provider-quota-cache test suite, and the last register-hooks /
cli wiring.
Highlights now in tree:
- Model catalog moved from per-project to global `~/.sf/model-catalog/`
via `sfHome()` (one cache shared by all repos; no more 9-dir
duplication).
- `benchmark-coverage.js` audits the dispatchable model set against
`learning/data/model-benchmarks.json` at session_start, writes
`~/.sf/benchmark-coverage.json`, notifies on change.
- `provider-quota-cache.js` introduces phase-1 subscription quota
visibility for the 5 providers with documented APIs:
kimi-coding (/coding/v1/usages), openrouter (/api/v1/credits),
minimax (/v1/token_plan/remains), zai (/api/monitor/usage/quota/limit),
google-gemini-cli (existing snapshotGeminiCliAccount). 15-min TTL,
global cache.
- `sf --maintain` CLI flag refreshes catalogs + quotas + coverage audit
in one idempotent pass. Daemon spawns it every 6h.
- `sf headless usage` rewritten to display all providers from the
unified cache, with explicit "no public API" notes for mistral,
ollama-cloud, opencode, opencode-go, xiaomi.
- Awaitable `runXIfStale` variants for model-catalog, gemini-catalog,
openai-codex-catalog (the schedule* variants now wrap them in
setImmediate).
- TypeScript declarations added for the new JS modules so the
dist-redirect pipeline type-checks cleanly.
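The 15-min TTL pattern, sketched in-memory (the real provider-quota-cache.js caches globally on disk, and its fetchers are async — synchronous here for brevity):

```javascript
const TTL_MS = 15 * 60 * 1000; // 15-minute TTL as described above

// fetchQuota and the now() clock are injectable for testability.
function makeQuotaCache(fetchQuota, now = Date.now) {
  const entries = new Map(); // providerKey -> { at, value }
  return function get(providerKey) {
    const hit = entries.get(providerKey);
    if (hit && now() - hit.at < TTL_MS) return hit.value; // fresh: no refetch
    const value = fetchQuota(providerKey);
    entries.set(providerKey, { at: now(), value });
    return value;
  };
}
```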
Phase 2 (quota-aware routing in benchmark-selector) is filed as SF
self-feedback for the backlog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add resolveRpcInitTimeoutMs() helper and wire it into RpcClient.init().
Default init timeout increased from 30s to 120s. Override via env var.
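A sketch of the resolver; the env-var name below is hypothetical, since the commit does not quote the real one:

```javascript
const DEFAULT_RPC_INIT_TIMEOUT_MS = 120_000; // raised from 30s to 120s

// SF_RPC_INIT_TIMEOUT_MS is a hypothetical variable name for the
// override described above; env is injectable for testing.
function resolveRpcInitTimeoutMs(env = process.env) {
  const n = Number(env.SF_RPC_INIT_TIMEOUT_MS);
  return Number.isFinite(n) && n > 0 ? n : DEFAULT_RPC_INIT_TIMEOUT_MS;
}
```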
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Require SF_HEADLESS_ALLOW_V1_FALLBACK=1 to use legacy v1 fallback.
Default behavior now exits with error when v2 init fails, preventing
silent degradation to less reliable protocol matching.
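A sketch of the gate (function names and error text are illustrative; only the env flag comes from the change):

```javascript
function shouldFallBackToV1(env = process.env) {
  return env.SF_HEADLESS_ALLOW_V1_FALLBACK === "1";
}

// Called when v2 init fails; exit is injectable for testing.
function onV2InitFailure(env = process.env, exit = process.exit) {
  if (shouldFallBackToV1(env)) return "v1"; // explicit opt-in only
  console.error(
    "v2 init failed; set SF_HEADLESS_ALLOW_V1_FALLBACK=1 to allow legacy fallback"
  );
  exit(1);
}
```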
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
M004 S01: Update manifests to support knowledge and graph artifacts.
Adds computed: ["knowledge", "graph"] to manifests that did not yet
declare them, matching the actual behavior of their prompt builders:
- execute-task, reactive-execute
- discuss-project, discuss-requirements, research-project
- workflow-preferences (knowledge only — no graph scope)
These unit types already inline knowledge/graph via their builder
functions in auto-prompts.js; the manifest declarations were missing.
This brings the manifest schema into sync with real dispatch behavior.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After cherry-picking P2 (v72: slices.traces_vision_fragment) and P3
(v70: tasks.purpose_trace) onto main, the schema migration ladder
now adds those columns automatically on every openDatabase. The P4
test fixtures, which were authored when those migrations were still
in their own worktree branches, manually ALTER'd the columns —
which now throws "duplicate column name" post-merge.
Two changes, both purely about exercising the same gate paths under
the new ground truth:
- makeForwardDb no longer manually ALTERs — the migration ladder
already provides the columns. The "trace value NULL" branch is
exercised by inserting rows with explicit NULL instead of relying
on the column being absent.
- The "legacy DB" test no longer expects the warning to mention the
column name (the column always exists post-migration). The
underlying SqliteError catch in evaluatePurposeCoherence remains
for the genuinely-legacy DB case where someone is running against
a fixture that predates the migration; the test now exercises the
NULL-value warn path which is the real-world signal operators see.
All 17 uok-purpose-coherence tests pass; full 5-pillar sweep
(P1+P2+P3+P4+P5 + migration) 53/53 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After cherry-picking the swarm commits, the migration file had v72
declared before v70/v71 — when applied to a v69 DB, the loop ran v72
first, set appliedVersion=72, and the v70/v71 guards `if
(appliedVersion < 70)` then `< 71` short-circuited, so neither
ALTER ran on legacy DBs. Reordered so the file flows v70 → v71 → v72,
matching version numbers; idempotent column probes on fresh DBs
still pass.
Verified: full sf-db-migration suite 13/13 green, including the
v52-and-v27 legacy-fixture paths that exercise the migration ladder
end-to-end.
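A toy reproduction of the ordering bug with illustrative names — guards keyed on appliedVersion short-circuit when a later migration is declared first:

```javascript
// Mimics the ladder: each migration runs only when the DB is older
// than its version, then bumps appliedVersion.
function runMigrations(db, migrations) {
  for (const m of migrations) {
    if (db.appliedVersion < m.version) {
      m.up(db);
      db.appliedVersion = m.version;
    }
  }
}
```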
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SF is a purpose-to-software compiler — every self_feedback row must name
the milestone vision or slice goal it's filed against, so triage can
prioritize against purpose rather than treating each row as floating.
- Schema v71 ALTERs self_feedback ADD COLUMN purpose_anchor TEXT.
NULL allowed for legacy rows; fresh-DB CREATE includes the column.
- sf-db-self-feedback.js: insertSelfFeedbackEntry accepts purposeAnchor
(camelCase), stored as :purpose_anchor; listSelfFeedbackEntries({purpose})
pushes a LIKE %fragment% filter into the DB layer so triage doesn't
have to pull the full table.
- rowToSelfFeedback exposes purposeAnchor, falling back to the JSON
projection for legacy rows where the column is NULL.
- headless-feedback CLI: `feedback add --purpose <fragment>` persists
the anchor; `feedback list --purpose <fragment>` filters by it.
Omission stays valid — restoration is additive, not breaking.
- help-text + migration test updated; new vitest covers add/list
round-trip, NULL-on-omit legacy compat, substring match, and the
help-text documentation contract.
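The filter push-down can be sketched as follows (buildListQuery is an illustrative name; the table/column names follow the commit):

```javascript
// Build the list query, pushing the LIKE %fragment% substring match
// into the SQL so triage never pulls the full table into JS.
function buildListQuery({ purpose } = {}) {
  const params = {};
  let sql = "SELECT * FROM self_feedback";
  if (purpose) {
    sql += " WHERE purpose_anchor LIKE :purpose";
    params.purpose = `%${purpose}%`;
  }
  return { sql, params };
}
```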
Restores the doctrine in docs/adr/0000-purpose-to-software-compiler.md:
"non-trivial artifacts must name their purpose and consumer."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore the purpose-to-software doctrine at the slice gate: every task
the executor closes must name the slice-goal sentence or clause it
served. complete-slice now refuses to flip a slice to complete while
any of its tasks has a NULL purpose_trace, making "did all tasks
actually serve the slice goal" a mechanical check instead of a vibe.
Schema migration v70 adds a nullable purpose_trace TEXT to tasks
(legacy rows stay valid). complete_task refuses without it and quotes
slice.goal in the error so the agent can anchor. insertTask /
updateTaskStatus accept the new field, rowToTask exposes it, and a
new updateTaskPurposeTrace helper covers later corrections.
Restoration of doctrine — see docs/adr/0000-purpose-to-software-compiler.md.
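The gate can be sketched as follows (assertPurposeTrace is an illustrative name, and so is the error text; the goal-quoting behavior is from the commit):

```javascript
// Refuse task completion without a purpose_trace, quoting slice.goal
// in the error so the agent can anchor its correction.
function assertPurposeTrace(task, slice) {
  if (!task.purpose_trace) {
    throw new Error(
      `complete_task refused: task ${task.id} has no purpose_trace; ` +
      `name the clause of the slice goal it served: "${slice.goal}"`
    );
  }
}
```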
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restoration of doctrine: plan-milestone now emits a literal milestone.vision
clause per slice (traces_vision_fragment) so validate-milestone has structured
grounds for assessment instead of re-reading the vision through the LLM every
time. Schema v69 adds the column (NULL allowed for legacy rows); the prompt and
plan_milestone tool start requiring it for new slices, rejecting fragments that
do not appear verbatim in milestone.vision. See docs/adr/0000-purpose-to-software-compiler.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restores the eight-PDD purpose gate at the autonomous-loop boundary
required by ADR-0000 (SF is a purpose-to-software compiler). The gate
walks milestone vision -> slice.traces_vision_fragment ->
task.purpose_trace before every dispatch and refuses to proceed when
the purpose chain is broken at the vision root (degraded-vision).
- New uok/purpose-coherence.js with a pure verdict function and a
DB-backed adapter. Reads vision/trace columns directly via SQL so
pre-P2/P3 schema migrations are tolerated.
- Wired into auto/phases-pre-dispatch.js alongside resource-version-
guard, pre-dispatch-health-gate, and planning-flow-gate. Fires on
every pre-dispatch turn and emits to the existing trace JSONL.
- Outcome ladder: fail (vision missing -> pause loop), warn (trace
columns missing or NULL -> surface but allow dispatch so legacy DBs
don't hard-break on day one), pass (full chain).
- Tests in tests/uok-purpose-coherence.test.mjs cover the four
contracted states plus the column-missing downgrade path on a
pre-migration schema.
Refs: ADR-0000.
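The outcome ladder as a pure verdict function — a sketch with an illustrative input shape:

```javascript
// vision comes from the milestone row; sliceTrace/taskTrace are the
// traces_vision_fragment and purpose_trace columns (null when the
// column is missing or the row pre-dates the migration).
function purposeCoherenceVerdict({ vision, sliceTrace, taskTrace }) {
  if (!vision) return { outcome: "fail", reason: "degraded-vision" }; // pause the loop
  if (sliceTrace == null || taskTrace == null) {
    return { outcome: "warn", reason: "trace-missing" }; // legacy DBs still dispatch
  }
  return { outcome: "pass" };
}
```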
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two new doctor checks to checkEngineHealth():
- db_milestone_missing_vision: error when a milestone has no vision
(the WHY/purpose field per ADR-0000)
- db_slice_missing_goal: error when a slice has no goal
(the WHAT/purpose field per ADR-0000)
Both checks are non-fixable (the operator must define purpose).
This aligns with ADR-0000 §Enforcement: "Non-trivial milestones,
slices, tasks, ADRs, specs, tests, and exported symbols must name
their purpose and consumer."
Tests: 2 cases — milestone without vision flagged, slice without
goal flagged.
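The two checks can be sketched as follows (the finding shape is illustrative; check names are from the commit):

```javascript
// Flag milestones without a vision (WHY) and slices without a goal
// (WHAT); both are non-fixable — the operator must define purpose.
function checkPurposeFields({ milestones, slices }) {
  const findings = [];
  for (const m of milestones) {
    if (!m.vision) findings.push({
      check: "db_milestone_missing_vision", id: m.id, level: "error", fixable: false,
    });
  }
  for (const s of slices) {
    if (!s.goal) findings.push({
      check: "db_slice_missing_goal", id: s.id, level: "error", fixable: false,
    });
  }
  return findings;
}
```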
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restoration of forgotten doctrine: ADR-0000 declares the eight PDD
fields (Purpose, Consumer, Contract, Failure boundary, Evidence,
Non-goals, Invariants, Assumptions) as the purpose gate, but
`sf headless new-milestone --context <file>` was accepting any
context including empty or trivially-thin seed docs. This wires a
pre-create check that refuses the run when fields are missing or
too thin, naming exactly which ones so the operator can fix the
seed doc and retry.
- new src/resources/extensions/sf/headless-pdd-check.js: scans
context for the eight fields (heading and inline-label forms) and
reports missing/sparse, plus a minimum-spine check (Purpose +
Consumer + Contract + Evidence-or-Falsifier).
- src/headless.ts calls the check after loadContext, before
bootstrapping .sf/. Refusal exits 1 with formatPddRefusal text.
- --skip-pdd-check is the migration escape hatch (warning printed,
PDD gate bypassed) for milestones that pre-date the gate.
- SF-internal auto-bootstrap (autonomous→new-milestone fallback)
is exempted because the seed is SF-generated, not operator-PDD.
- vitest test covers missing-Purpose, missing-Consumer, all-8,
sparse, inline-label form, Falsifier-as-Evidence spine, and the
doctrine field order.
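The field scan can be sketched as follows (heading and inline-label detection shown; the real sparseness threshold and minimum-spine check are omitted):

```javascript
const PDD_FIELDS = [
  "Purpose", "Consumer", "Contract", "Failure boundary",
  "Evidence", "Non-goals", "Invariants", "Assumptions",
];

// Report which of the eight fields the seed doc never mentions,
// accepting "## Purpose" headings or "Purpose:" inline labels.
function scanPddFields(context) {
  const missing = [];
  for (const field of PDD_FIELDS) {
    const re = new RegExp(`^(#+\\s*${field}\\b|${field}\\s*:)`, "im");
    if (!re.test(context)) missing.push(field);
  }
  return missing;
}
```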
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: dr-repo M003 had all 8 owning requirements (UNI-01..05,
PIL-01..03) marked Status: complete in .sf/REQUIREMENTS.md, but
the milestone row was still active because its only slice was a
post-migration skipped placeholder. After the previous fix routed
all-skipped milestones to pre-planning, SF ran roadmap-meeting +
plan-milestone and wrote 3 new slices on a milestone whose
contract-level work was already done — burned ~4 LLM turns on
plausibly-adjacent but unwanted re-decomposition.
Root cause: deriveStateFromDb's milestone-completion gate consults
only slice statuses (and indirectly the milestone row's own status
field). It never reads REQUIREMENTS.md to check whether the
contract is already satisfied. The slice-based view collapsed the
real signal.
Fix:
- New parseRequirementsByMilestone(content) helper in files.js:
parses REQUIREMENTS.md, groups entries by their `Primary owning
milestone` field, returns Map<id, {complete, incomplete}>.
- handleAllSlicesDone now reads REQUIREMENTS.md before its
slice-based real-work check. If a milestone has at least one
owning requirement and zero of them are incomplete, route to
completing-milestone with nextAction naming the requirement count
(so the operator can see *why* the milestone is being closed
without manually opening REQUIREMENTS.md).
- Best-effort: REQUIREMENTS.md parse failure falls through to the
existing slice-based rule. Missing file likewise — no regression
for projects that don't keep a requirements file.
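A sketch of the parser under an assumed REQUIREMENTS.md layout (one `Status:` and one `Primary owning milestone:` line per heading-delimited entry; the real file format may differ):

```javascript
// Group requirement entries by their owning milestone and tally
// complete vs incomplete, returning Map<id, {complete, incomplete}>.
function parseRequirementsByMilestone(content) {
  const byMilestone = new Map();
  for (const block of content.split(/^#+ /m).slice(1)) {
    const milestone = /Primary owning milestone:\s*(\S+)/.exec(block)?.[1];
    if (!milestone) continue;
    const status = /Status:\s*(\S+)/.exec(block)?.[1];
    const bucket = byMilestone.get(milestone) ?? { complete: 0, incomplete: 0 };
    if (status === "complete") bucket.complete++;
    else bucket.incomplete++;
    byMilestone.set(milestone, bucket);
  }
  return byMilestone;
}
```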
Resolves sf-mp74hftw-zud6ba filed via the headless feedback CLI.
End-to-end verified by re-running sf headless query on dr-repo
M003: now reports phase=completing-milestone with the right
requirement-count message.
Tests: 5 new cases — all complete + slice skipped → completing,
some active → pre-planning, zero owning requirements falls through,
missing file falls through, all complete + real slice work still
completes. Existing 4 all-skipped-replan cases still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T01: Added integration test auto-halt-self-feedback.test.mjs that proves:
- HaltWatchdog.check() creates a self-feedback DB entry with
kind=runaway-loop:idle-halt, severity=high, blocking=true
- Markdown projection (.sf/SELF-FEEDBACK.md) is regenerated
- Deduplication works (one entry per idle period)
- New heartbeat resets and creates a new entry for the next idle period
T02: Enhanced evidence string to include elapsedMs, iteration, and
thresholdMs explicitly (R003 actionable context requirement).
Tests: 36/36 pass across auto-halt-self-feedback,
auto-halt-watchdog-notify, and self-feedback-db suites.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>