Commit graph

4577 commits

Author SHA1 Message Date
Mikael Hugo
a8a28bd7c0 docs(specs): add sf-prompt-modularization.md operator guide
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 19:47:20 +02:00
Mikael Hugo
92ff8186ba feat(prompts): add v2 migration regression tests + fix template variable drift
- Migrate all remaining v1 builders (research-milestone, complete-slice,
  run-uat, reassess-roadmap, deploy, smoke-production, release, rollback,
  challenge) from composeInlinedContext to composeUnitContext v2.
- Remove unused composeInlinedContext import from auto-prompts.js.
- Add 7 regression tests in auto-prompts-v2-migration.test.mjs covering
  all migrated builders.
- Fix template variable drift: deploy.md expected {{releaseVersion}} and
  release.md expected {{newVersion}} — neither builder provided them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 19:46:13 +02:00
Mikael Hugo
bd27f61da7 feat(prompts): migrate remaining builders to composeUnitContext 2026-05-15 19:34:53 +02:00
Mikael Hugo
3e1e177466 fix(uok): auto-repair runtime projection mismatch 2026-05-15 19:34:42 +02:00
Mikael Hugo
c02c71f216 chore(gitignore): ignore sf session todo state 2026-05-15 19:25:14 +02:00
Mikael Hugo
ce169ddc22 test(manifest): cover computed context declarations 2026-05-15 19:10:17 +02:00
Mikael Hugo
30f1cca984 feat(manifest): add knowledge/graph computed artifacts to remaining unit types
M004 S01: Update manifests to support knowledge and graph artifacts.

Adds computed: ["knowledge", "graph"] to manifests that did not yet
declare them, matching the actual behavior of their prompt builders:

- execute-task, reactive-execute
- discuss-project, discuss-requirements, research-project
- workflow-preferences (knowledge only — no graph scope)

These unit types already inline knowledge/graph via their builder
functions in auto-prompts.js; the manifest declarations were missing.
This brings the manifest schema into sync with real dispatch behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 19:09:47 +02:00
Mikael Hugo
c718bd605b chore(gitignore): ignore sf active runtime projections 2026-05-15 19:07:58 +02:00
Mikael Hugo
c6b8815ad9 feat(verification): wire auto-defer into finalize phase
M003 S05 continuation: phases-finalize.js now handles "continue" from
runPostUnitVerification as an auto-defer path (low-risk findings).

- phases-finalize.js: added `verificationResult === "continue"` branch
  after pause/retry checks — logs "verification-deferred" and falls
  through to post-verification processing instead of breaking/retrying
- uok/auto-verification.js: defer decision runs before retry logic,
  returns "continue" without consuming retry attempts
- verification-evidence.js: forwards deferred fields in evidence JSON

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:59:07 +02:00
Mikael Hugo
f48a4cc7c5 feat(verification): auto-defer confidence policy for low-risk findings
Implements M003 S05: auto-deferral policy for low-risk validation findings.

- New verification-defer-policy.js: classifyCheck, computeDeferConfidence,
  decideAutoDefer — classifies failed checks as deferrable/blocking/unknown
- Patterns: style/format/deprecation-only → deferrable; error/fail/crash/fatal
  → blocking (always wins)
- Confidence scoring: 0.9 all-deferrable, 0.7 mixed, 0.5 unknown, 0.0 blocking
- Threshold preference: verification_auto_defer_threshold (default 0.75)
- Integration in uok/auto-verification.js: checks defer before retry/pause,
  does not consume retry attempts, writes deferred: true + reasons to evidence JSON
- verification-evidence.js: forwards deferred/deferredReasons/deferConfidence fields
- Preferences wired: validation, types, serializer
- Tests: 6 unit tests for classification, confidence, threshold, blocking dominance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:55:26 +02:00
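The defer-policy described in this commit can be sketched roughly as follows. The function names (classifyCheck, computeDeferConfidence, decideAutoDefer), the pattern classes, the confidence ladder (0.9/0.7/0.5/0.0), and the 0.75 default threshold come from the commit message; the exact patterns and signatures are assumptions, not the real verification-defer-policy.js.

```javascript
// Classify a failed check as deferrable, blocking, or unknown.
// Blocking patterns always win over deferrable ones.
function classifyCheck(message) {
  const text = String(message).toLowerCase();
  if (/\b(error|fail|crash|fatal)\b/.test(text)) return "blocking";
  if (/\b(style|format|deprecat)/.test(text)) return "deferrable";
  return "unknown";
}

// Confidence ladder from the commit: 0.9 all-deferrable, 0.7 mixed,
// 0.5 unknown-only, 0.0 any blocking.
function computeDeferConfidence(classes) {
  if (classes.length === 0 || classes.includes("blocking")) return 0.0;
  if (classes.every((c) => c === "deferrable")) return 0.9;
  if (classes.some((c) => c === "deferrable")) return 0.7;
  return 0.5;
}

// Defer only when confidence clears the operator-set threshold.
function decideAutoDefer(failedChecks, threshold = 0.75) {
  const classes = failedChecks.map(classifyCheck);
  const confidence = computeDeferConfidence(classes);
  return { defer: confidence >= threshold, confidence, classes };
}
```

With this shape, a mixed deferrable/unknown set scores 0.7 and stays below the 0.75 default, so only clean all-deferrable findings are auto-deferred.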
Mikael Hugo
1b3dba6e51 test(uok): align purpose-coherence fixtures with merged P2/P3 schema
After cherry-picking P2 (v72: slices.traces_vision_fragment) and P3
(v70: tasks.purpose_trace) onto main, the schema migration ladder
now adds those columns automatically on every openDatabase. The P4
test fixtures, which were authored when those migrations were still
in their own worktree branches, manually ALTER'd the columns —
which throws "duplicate column name" post-merge.

Two changes, both purely about exercising the same gate paths under
the new ground truth:

- makeForwardDb no longer manually ALTERs — the migration ladder
  already provides the columns. The "trace value NULL" branch is
  exercised by inserting rows with explicit NULL instead of relying
  on the column being absent.
- The "legacy DB" test no longer expects the warning to mention the
  column name (the column always exists post-migration). The
  underlying SqliteError catch in evaluatePurposeCoherence remains
  for the genuinely-legacy DB case where someone is running against
  a fixture that predates the migration; the test now exercises the
  NULL-value warn path which is the real-world signal operators see.

All 17 uok-purpose-coherence tests pass; full 5-pillar sweep
(P1+P2+P3+P4+P5 + migration) 53/53 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:54:59 +02:00
Mikael Hugo
008f0c685d fix(schema): reorder ADR-0000 migrations so v70/v71/v72 run in order
After cherry-picking the swarm commits the migration file had v72
declared before v70/v71 — when applied to a v69 DB the loop ran v72
first, set appliedVersion=72, and the v70/v71 guards `if
(appliedVersion < 70)` then `< 71` short-circuited so neither
ALTER ran on legacy DBs. Reordered so the file flows v70 → v71 → v72
matching version numbers; idempotent column probes on fresh DBs
still pass.

Verified: full sf-db-migration suite 13/13 green, including the
v52-and-v27 legacy-fixture paths that exercise the migration ladder
end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:53:04 +02:00
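The short-circuit described above is easy to see in a toy migration runner. Each step is guarded by `appliedVersion < N`, so a file that declares v72 before v70/v71 lets v72 raise appliedVersion to 72 on a v69 database and the earlier guards never fire. This is an illustrative sketch, not the real schema file.

```javascript
// Run migration steps in file order; each step is guarded by its version.
function runMigrations(steps, appliedVersion) {
  const applied = [];
  for (const step of steps) {
    if (appliedVersion < step.version) {
      applied.push(step.version);
      appliedVersion = step.version; // e.g. v72 first jumps 69 -> 72
    }
  }
  return { applied, appliedVersion };
}
```

Against a v69 DB, steps declared as [72, 70, 71] apply only v72; declared in order [70, 71, 72], all three apply. Fresh DBs are unaffected because idempotent column probes already see the columns.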
Mikael Hugo
725affd126 feat(self-feedback): purpose_anchor on entries (ADR-0000 restoration, v71)
SF is a purpose-to-software compiler — every self_feedback row must name
the milestone vision or slice goal it's filed against, so triage can
prioritize against purpose rather than treating each row as floating.

  - Schema v71 ALTERs self_feedback ADD COLUMN purpose_anchor TEXT.
    NULL allowed for legacy rows; fresh-DB CREATE includes the column.
  - sf-db-self-feedback.js: insertSelfFeedbackEntry accepts purposeAnchor
    (camelCase), stored as :purpose_anchor; listSelfFeedbackEntries({purpose})
    pushes a LIKE %fragment% filter into the DB layer so triage doesn't
    have to pull the full table.
  - rowToSelfFeedback exposes purposeAnchor, falling back to the JSON
    projection for legacy rows where the column is NULL.
  - headless-feedback CLI: `feedback add --purpose <fragment>` persists
    the anchor; `feedback list --purpose <fragment>` filters by it.
    Omission stays valid — restoration is additive, not breaking.
  - help-text + migration test updated; new vitest covers add/list
    round-trip, NULL-on-omit legacy compat, substring match, and the
    help-text documentation contract.

Restores the doctrine in docs/adr/0000-purpose-to-software-compiler.md:
"non-trivial artifacts must name their purpose and consumer."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:51:52 +02:00
Mikael Hugo
3c416162b2 feat(sf-db): record per-task purpose_trace at complete_task (ADR-0000)
Restore the purpose-to-software doctrine at the slice gate: every task
the executor closes must name the slice-goal sentence or clause it
served. complete-slice now refuses to flip a slice to complete while
any of its tasks has a NULL purpose_trace, making "did all tasks
actually serve the slice goal" a mechanical check instead of a vibe.

Schema migration v70 adds a nullable purpose_trace TEXT to tasks
(legacy rows stay valid). complete_task refuses without it and quotes
slice.goal in the error so the agent can anchor. insertTask /
updateTaskStatus accept the new field, rowToTask exposes it, and a
new updateTaskPurposeTrace helper covers later corrections.

Restoration of doctrine — see docs/adr/0000-purpose-to-software-compiler.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:50:27 +02:00
Mikael Hugo
fa657f2523 feat(plan-milestone): per-slice vision trace (schema v69, ADR-0000 P2)
Restoration of doctrine: plan-milestone now emits a literal milestone.vision
clause per slice (traces_vision_fragment) so validate-milestone has structured
grounds for assessment instead of re-reading the vision through the LLM every
time. Schema v69 adds the column (NULL allowed for legacy rows); the prompt and
plan_milestone tool start requiring it for new slices, rejecting fragments that
do not appear verbatim in milestone.vision. See docs/adr/0000-purpose-to-software-compiler.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:48:50 +02:00
Mikael Hugo
3b83f0898b merge(P4): purpose-coherence-gate before every dispatch (ADR-0000) 2026-05-15 18:45:17 +02:00
Mikael Hugo
5e2c7a7166 merge(P1): vision quality gate on sf new-milestone (ADR-0000) 2026-05-15 18:45:08 +02:00
Mikael Hugo
59fbaf4b0f test(doctor): refactor purpose-gate test to use insertMilestone/insertSlice helpers
Cleans up doctor-purpose-gate.test.mjs:
- Uses insertMilestone/insertSlice helpers instead of raw SQL
- Removes redundant test from doctor-plan-dir-normalization.test.mjs
- Adds module-level JSDoc purpose comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:44:09 +02:00
Mikael Hugo
a303b5db29 feat(uok): add purpose-coherence-gate pre-dispatch gate (ADR-0000)
Restores the eight-PDD purpose gate at the autonomous-loop boundary
required by ADR-0000 (SF is a purpose-to-software compiler). The gate
walks milestone vision -> slice.traces_vision_fragment ->
task.purpose_trace before every dispatch and refuses to proceed when
the purpose chain is broken at the vision root (degraded-vision).

- New uok/purpose-coherence.js with a pure verdict function and a
  DB-backed adapter. Reads vision/trace columns directly via SQL so
  pre-P2/P3 schema migrations are tolerated.
- Wired into auto/phases-pre-dispatch.js alongside resource-version-
  guard, pre-dispatch-health-gate, and planning-flow-gate. Fires on
  every pre-dispatch turn and emits to the existing trace JSONL.
- Outcome ladder: fail (vision missing -> pause loop), warn (trace
  columns missing or NULL -> surface but allow dispatch so legacy DBs
  don't hard-break on day one), pass (full chain).
- Tests in tests/uok-purpose-coherence.test.mjs cover the four
  contracted states plus the column-missing downgrade path on a
  pre-migration schema.

Refs: ADR-0000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:42:55 +02:00
Mikael Hugo
fb68b12902 feat(doctor): enforce ADR-0000 purpose gate — milestones need vision, slices need goal
Adds two new doctor checks to checkEngineHealth():

- db_milestone_missing_vision: error when a milestone has no vision
  (the WHY/purpose field per ADR-0000)
- db_slice_missing_goal: error when a slice has no goal
  (the WHAT/purpose field per ADR-0000)

Both checks are non-fixable (the operator must define purpose).
This aligns with ADR-0000 §Enforcement: "Non-trivial milestones,
slices, tasks, ADRs, specs, tests, and exported symbols must name
their purpose and consumer."

Tests: 2 cases — milestone without vision flagged, slice without
goal flagged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:42:14 +02:00
Mikael Hugo
aa0d57371e feat(headless): enforce ADR-0000 PDD-fields gate at new-milestone
Restoration of forgotten doctrine: ADR-0000 declares the eight PDD
fields (Purpose, Consumer, Contract, Failure boundary, Evidence,
Non-goals, Invariants, Assumptions) as the purpose gate, but
`sf headless new-milestone --context <file>` was accepting any
context including empty or trivially-thin seed docs. This wires a
pre-create check that refuses the run when fields are missing or
too thin, naming exactly which ones so the operator can fix the
seed doc and retry.

- new src/resources/extensions/sf/headless-pdd-check.js: scans
  context for the eight fields (heading and inline-label forms) and
  reports missing/sparse, plus a minimum-spine check (Purpose +
  Consumer + Contract + Evidence-or-Falsifier).
- src/headless.ts calls the check after loadContext, before
  bootstrapping .sf/. Refusal exits 1 with formatPddRefusal text.
- --skip-pdd-check is the migration escape hatch (warning printed,
  PDD gate bypassed) for milestones that pre-date the gate.
- SF-internal auto-bootstrap (autonomous→new-milestone fallback)
  is exempted because the seed is SF-generated, not operator-PDD.
- vitest test covers missing-Purpose, missing-Consumer, all-8,
  sparse, inline-label form, Falsifier-as-Evidence spine, and the
  doctrine field order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:37:06 +02:00
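The field scan this commit describes can be sketched like so. The eight field names are from ADR-0000 as quoted above; the heading/inline-label regex and the function shape are assumptions about headless-pdd-check.js, not its real API.

```javascript
// The eight PDD fields the gate requires, in doctrine order.
const PDD_FIELDS = [
  "Purpose", "Consumer", "Contract", "Failure boundary",
  "Evidence", "Non-goals", "Invariants", "Assumptions",
];

// Scan a context document for each field in heading ("## Purpose")
// or inline-label ("Purpose:") form; report what is missing.
function checkPddFields(context) {
  const missing = [];
  for (const field of PDD_FIELDS) {
    const pattern = new RegExp(`(^|\\n)\\s*(#+\\s*)?${field}\\s*:?`, "i");
    if (!pattern.test(context)) missing.push(field);
  }
  return { ok: missing.length === 0, missing };
}
```

A real implementation would also measure section thinness and the minimum spine (Purpose + Consumer + Contract + Evidence-or-Falsifier); this sketch only shows the presence check.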
Mikael Hugo
af1401e4ea fix(solver): enforce PDD purpose gate 2026-05-15 18:32:59 +02:00
Mikael Hugo
bb0c87fdac feat(remediation-dispatcher): M003 S04 — autonomous recovery from validation findings
Implements RemediationDispatcher that classifies verification failures
and maps them to recovery strategies:

- transient    → retry (timeout, flaky test, network)
- structural   → replan (broken import, syntax error)
- knowledge    → research (not implemented, missing context)
- infra        → escalate via self-feedback (tooling broken)

Confidence scoring:
- Single failing check + known pattern = high confidence
- Multiple failures or high retry count = lower confidence
- Configurable autoFixThreshold (default 0.6)

15 unit tests covering all 4 failure classes + confidence scoring +
threshold behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:29:45 +02:00
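The class-to-strategy mapping above can be sketched as a small dispatcher. The four classes, their strategies, and the 0.6 autoFixThreshold are from the commit; the keyword patterns and the function shape are guesses, not the real RemediationDispatcher.

```javascript
// Recovery strategy per failure class, per the commit's table.
const STRATEGY_BY_CLASS = {
  transient: "retry",     // timeout, flaky test, network
  structural: "replan",   // broken import, syntax error
  knowledge: "research",  // not implemented, missing context
  infra: "escalate",      // tooling broken -> self-feedback
};

function classifyFailure(message) {
  const text = String(message).toLowerCase();
  if (/timeout|flaky|network/.test(text)) return "transient";
  if (/broken import|syntax error/.test(text)) return "structural";
  if (/not implemented|missing context/.test(text)) return "knowledge";
  return "infra";
}

// Below the confidence threshold, fall back to escalation rather
// than attempting an automatic fix.
function dispatchRemediation(failure, { autoFixThreshold = 0.6, confidence = 1.0 } = {}) {
  const cls = classifyFailure(failure);
  const strategy = confidence >= autoFixThreshold ? STRATEGY_BY_CLASS[cls] : "escalate";
  return { class: cls, strategy };
}
```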
Mikael Hugo
a863672463 fix(state): honor completed owning requirements 2026-05-15 18:24:16 +02:00
Mikael Hugo
b08cb13c20 fix(state): requirements-complete short-circuits the planning ladder
Symptom: dr-repo M003 had all 8 owning requirements (UNI-01..05,
PIL-01..03) marked Status: complete in .sf/REQUIREMENTS.md, but
the milestone row was still active because its only slice was a
post-migration skipped placeholder. After the previous fix routed
all-skipped milestones to pre-planning, SF ran roadmap-meeting +
plan-milestone and wrote 3 new slices on a milestone whose
contract-level work was already done — burned ~4 LLM turns on
plausibly-adjacent but unwanted re-decomposition.

Root cause: deriveStateFromDb's milestone-completion gate consults
only slice statuses (and indirectly the milestone row's own status
field). It never reads REQUIREMENTS.md to check whether the
contract is already satisfied. The slice-based view collapsed the
real signal.

Fix:

- New parseRequirementsByMilestone(content) helper in files.js:
  parses REQUIREMENTS.md, groups entries by their `Primary owning
  milestone` field, returns Map<id, {complete, incomplete}>.

- handleAllSlicesDone now reads REQUIREMENTS.md before its
  slice-based real-work check. If a milestone has at least one
  owning requirement and zero of them are incomplete, route to
  completing-milestone with nextAction naming the requirement count
  (so the operator can see *why* the milestone is being closed
  without manually opening REQUIREMENTS.md).

- Best-effort: REQUIREMENTS.md parse failure falls through to the
  existing slice-based rule. Missing file likewise — no regression
  for projects that don't keep a requirements file.

Resolves sf-mp74hftw-zud6ba filed via the headless feedback CLI.
End-to-end verified by re-running sf headless query on dr-repo
M003: now reports phase=completing-milestone with the right
requirement-count message.

Tests: 5 new cases — all complete + slice skipped → completing,
some active → pre-planning, zero owning requirements falls through,
missing file falls through, all complete + real slice work still
completes. Existing 4 all-skipped-replan cases still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:22:28 +02:00
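The parseRequirementsByMilestone helper described above might look roughly like this. The return shape (a Map of milestone id to complete/incomplete counts) and the `Primary owning milestone` field are from the commit; the assumed REQUIREMENTS.md layout (one `### <ID>` block per requirement with `Status:` lines) is a guess.

```javascript
// Group requirement entries by their "Primary owning milestone" field.
// Returns Map<milestoneId, { complete, incomplete }>.
function parseRequirementsByMilestone(content) {
  const byMilestone = new Map();
  for (const block of content.split(/^###\s+/m).slice(1)) {
    const status = /Status:\s*(\S+)/.exec(block)?.[1] ?? "";
    const owner = /Primary owning milestone:\s*(\S+)/.exec(block)?.[1];
    if (!owner) continue; // entries without an owner fall through
    const entry = byMilestone.get(owner) ?? { complete: 0, incomplete: 0 };
    if (status.toLowerCase() === "complete") entry.complete += 1;
    else entry.incomplete += 1;
    byMilestone.set(owner, entry);
  }
  return byMilestone;
}
```

The gate then closes a milestone when it has at least one owning requirement and its incomplete count is zero; parse failure or a missing file falls through to the existing slice-based rule.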
Mikael Hugo
6a2c61d5ee feat(halt-self-feedback): M003 S03 — HaltWatchdog self-feedback integration
T01: Added integration test auto-halt-self-feedback.test.mjs that proves:
  - HaltWatchdog.check() creates a self-feedback DB entry with
    kind=runaway-loop:idle-halt, severity=high, blocking=true
  - Markdown projection (.sf/SELF-FEEDBACK.md) is regenerated
  - Deduplication works (one entry per idle period)
  - New heartbeat resets and creates a new entry for the next idle period

T02: Enhanced evidence string to include elapsedMs, iteration, and
thresholdMs explicitly (R003 actionable context requirement).

Tests: 36/36 pass across auto-halt-self-feedback,
auto-halt-watchdog-notify, and self-feedback-db suites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 18:19:39 +02:00
Mikael Hugo
5cd5e14160 feat(headless): surface memory auth pause 2026-05-15 18:16:08 +02:00
Mikael Hugo
90b8e7edf8 feat(headless): expose memory extraction diagnostics 2026-05-15 18:13:35 +02:00
Mikael Hugo
f00762ffdb fix(feedback): allow restamping suspect resolutions 2026-05-15 18:10:14 +02:00
Mikael Hugo
820f9aaf8e fix(memory): classify extraction failures 2026-05-15 18:06:33 +02:00
Mikael Hugo
7dfb5099c9 fix(state): all-skipped milestone routes to pre-planning, not validating
handleAllSlicesDone treated isStatusDone uniformly — "complete",
"done", AND "skipped" all counted as "milestone work is finished",
so a milestone whose only slice was skipped would advance to
phase=validating-milestone. That's wrong: a placeholder slice that
was skipped doesn't validate the milestone's success criteria, it
just clears the wedge.

Surfaced concretely in dr-repo M003 (Unified Dashboard + Pilot
Validation): I skipped the migration placeholder via the new
`sf headless skip-slice` CLI, and the next-dispatch reported
`validate-milestone M003` even though no real work had happened on
the milestone. The autonomous loop would then burn an LLM turn
running validate-milestone just to discover the obvious gap.

Fix: differentiate {complete, done} from {skipped} at the gate.
When zero slices carry real-work outcomes, route into the
pre-planning phase so the dispatcher's existing
discuss → research → plan ladder takes over. The PDD/vision is
already in the milestone row, so the planner has the purpose it
needs without operator hand-holding.

Verified end-to-end against dr-repo: `sf headless query` for M003
now reports phase=pre-planning and next dispatch
`roadmap-meeting M003` (the deep-planning entry rule fires first;
discuss/research/plan come after as artifacts land).

Tests: 4 cases — all-skipped → pre-planning, complete+skipped mix
→ validating, legacy "done" alias → validating, multiple skipped
→ pre-planning.

Resolves sf-mp73sk0m-63w88y (filed via headless feedback CLI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 18:00:21 +02:00
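The gate distinction above reduces to a small routing rule: only {complete, done} count as real work, and an all-skipped milestone routes back into pre-planning. Function and phase names here follow the commit's wording but the exact shape is an assumption.

```javascript
// Statuses that represent real completed work, per the fix.
const REAL_WORK = new Set(["complete", "done"]);

// Called once every slice is in a terminal state.
function routeAllSlicesDone(sliceStatuses) {
  const hasRealWork = sliceStatuses.some((s) => REAL_WORK.has(s));
  return hasRealWork ? "validating-milestone" : "pre-planning";
}
```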
Mikael Hugo
881fd5e304 feat(memory,state): runtime counters for memory injection + milestone work validation
Memory injection telemetry:
- Move counter writes from auto-prompts.js to memory-store.js (where
  getRelevantMemoriesRanked/getActiveMemoriesRanked actually fire).
- Track memory_inject_count and memory_inject_chars_total via
  runtime_counters table for headless-query reporting.

State-db validation:
- handleAllSlicesDone now checks if any slice carries real work
  (status=complete/done) before routing to validation.
- Milestones with all-skipped slices route to "reassess-roadmap"
  instead of asking the operator to validate non-existent work.

SM client defense:
- Filter foreign-tenant memories from SM query responses even when
  the server returns them (defense-in-depth).

Tests updated: memory-extraction-lifecycle, sf-db-migration,
headless-query-memory-injection, sm-client, memory-tenant-gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 17:57:45 +02:00
Mikael Hugo
9abbfaada2 feat(memory-tenant-gate): add project-scoped isolation for SM cross-project recall
Closes sf-mp723nju-2cpeoc. When SM_ENABLED is on, memory retrieval from
Singularity Memory is now scoped to the current project's repoIdentity
tenant. Foreign-tenant memories are filtered client-side and the tenant
filter is sent server-side for SM servers that support it.

Key changes:
- schema v68: ADD COLUMN tenant TEXT on memories table (NULL = legacy)
- insertMemoryRow: persists tenant field on every new record
- backfillMemoryTenants / backfillMemoryTenantRows: idempotent migration
  called on session_start when SM_ENABLED is set
- querySmMemories: resolves effectiveTenantId (opts.tenant > opts.tenantId
  > SM_TENANT_ID); returns [] when no tenant resolved and crossTenant off
- SM_CROSS_TENANT_ENABLED=1 opt-in bypass with audit warning in console
- register-hooks session_start: calls backfillMemoryTenants when SM active
- 12 new tests in memory-tenant-gate.test.mjs; updated sm-client.test.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 17:55:26 +02:00
Mikael Hugo
ff333ae067 feat(memory): surface injection token cost in headless query
The Project Memories section is rendered into every execute-task,
plan-slice, and research-slice prompt. At 10 memories × ~200 chars
each that's ~2K chars/turn injected into the context — real cost,
no operator-visible meter.

Adds two runtime_counters (already-existing key/value store):

  memory_inject_chars_total  — cumulative section size
  memory_inject_count        — number of injections

Written by buildProjectMemoriesSection() on every render. Both
writes sit inside a try/catch so a legacy DB without
runtime_counters silently skips rather than blocking prompt build.

`sf headless query` surfaces the cumulative + derived metrics as a
new top-level `memoryInjection` block:

  {
    total_chars: 12480,
    count: 8,
    avg_chars: 1560,
    estimated_total_tokens: 3120
  }

The block is omitted entirely when count is 0 (fresh project / no
prompts rendered yet) so it doesn't clutter the snapshot.

Operators can now correlate prompt size growth against autonomous
run cost without instrumenting the LLM call sites directly. The
estimated_total_tokens is chars/4 — a rough approximation since SF
doesn't tokenise the section, intentionally documented as such.

Resolves sf-mp723yl9-rcxoeh filed via the headless feedback CLI.

Tests: 5 source-level invariants — type carries the section, query
reads counters by name, snapshot omits section on zero, write side
calls both counter functions, write is wrapped in try/catch with
documented failure-mode comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:55:14 +02:00
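The derived memoryInjection block above can be sketched from the two raw counters. The counter names, the chars/4 token heuristic, and the omit-on-zero rule are from the commit; this helper shape is assumed, not the real headless-query code.

```javascript
// Build the memoryInjection snapshot block from runtime_counters.
// Returns undefined when nothing has been injected yet, so the
// snapshot omits the block for fresh projects.
function buildMemoryInjectionBlock(counters) {
  const total = counters.memory_inject_chars_total ?? 0;
  const count = counters.memory_inject_count ?? 0;
  if (count === 0) return undefined;
  return {
    total_chars: total,
    count,
    avg_chars: Math.round(total / count),
    estimated_total_tokens: Math.round(total / 4), // rough chars/4 heuristic
  };
}
```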
Mikael Hugo
671b2c8628 feat(sm-client): defense-in-depth tenant filter on SM responses
Even though querySmMemories pins tenantId in the request body sent
to the Singularity Memory server, SF used to accept whatever came
back without verifying. A misconfigured or compromised SM server
could echo memories from other tenants and SF would inject them
into the next execute-task prompt — cross-customer leak.

filterSmMemoriesToTenant() now re-checks every returned memory:

  - same-tenant memories pass through
  - foreign-tenant memories (memory.tenantId OR memory.tenant !=
    expectedTenantId) are dropped, with a one-line warning so the
    misconfigured-SM symptom is visible rather than silent
  - memories with no tenant claim at all default to allow — matches
    the local DB's "NULL tenant = legacy row" rule from schema v68
  - SM_REQUIRE_TENANT_CLAIM=true flips the legacy rule to drop
    (hard fail-closed mode for operators who want it)

Defensive guards against non-array inputs, missing expectedTenantId
(returns input unchanged so caller-side fail-open semantics are
preserved), and the dual tenantId/tenant field naming.

Tests: 8 cases — same-tenant pass-through, foreign drop, legacy
allow, strict mode drop, tenantId/tenant alias, empty/non-array
defensiveness, missing-expected pass-through, warning emission.

Resolves the cross-project tenant-leak feedback row filed via the
new headless feedback CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:49:24 +02:00
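The re-check described above follows a short filter. The dual tenantId/tenant field naming, the legacy-allow default, and the strict fail-closed mode are from the commit; the option name and exact API are assumptions about sm-client.

```javascript
// Drop any memory whose tenant claim does not match the expected tenant.
// requireClaim=true models SM_REQUIRE_TENANT_CLAIM (fail-closed mode).
function filterSmMemoriesToTenant(memories, expectedTenantId, { requireClaim = false } = {}) {
  if (!Array.isArray(memories)) return []; // defensive: non-array input
  if (!expectedTenantId) return memories;  // preserve caller-side fail-open
  return memories.filter((m) => {
    const claim = m.tenantId ?? m.tenant;  // dual field naming
    if (claim == null) return !requireClaim; // NULL tenant = legacy row (v68 rule)
    if (claim !== expectedTenantId) {
      console.warn(`sm-client: dropping foreign-tenant memory (${claim})`);
      return false;
    }
    return true;
  });
}
```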
Mikael Hugo
7c3d9bd3bf fix(memory): gate SM recall by tenant scope 2026-05-15 17:46:51 +02:00
Mikael Hugo
0c7aaafa00 feat(memory): enrich execute-task memory retrieval query
Previously buildProjectMemoriesSection(`${sTitle} ${tTitle}`) sent
two short strings to the cosine ranker — too sparse for re-ranking
to do meaningful work against the static pool.

buildMemoryRetrievalQuery() (new, exported for tests) enriches the
query with:

  - slice.title + task.title       (original signal)
  - slice.goal text, front 600 chars
                                   (the WHY of the slice — usually
                                   names the memory-relevant
                                   context the title can't fit)
  - top 20 changed files from
    git diff/status                (the WHAT — what code is in
                                   play right now; lets cosine
                                   ranking promote memories whose
                                   content references those paths)

Fail-open at each source: DB closed → no goal; not a git repo →
no files; nullish title args don't poison the string. The call
site never has to handle errors.

Bounded so embedding token cost stays predictable: 600-char goal
cap, 20-file cap. Empty inputs collapse to "" so the consumer's
`if (!query.trim())` branch still picks the static fallback.

Tests: 5 cases — titles always present, non-git directory safe,
empty-input collapse, nullish-arg defensiveness, real git repo
surfaces changed file paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 17:36:15 +02:00
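The enrichment above can be sketched as a bounded, fail-open string builder. The 600-char goal cap, 20-file cap, and empty-collapse behaviour are from the commit; the parameter names and shape are assumptions about the exported buildMemoryRetrievalQuery.

```javascript
// Build the cosine-ranking query from titles, slice goal, and changed files.
// Every source is optional; missing inputs never throw, they just add nothing.
function buildMemoryRetrievalQuery({ sliceTitle, taskTitle, sliceGoal, changedFiles } = {}) {
  const parts = [];
  if (sliceTitle) parts.push(String(sliceTitle));
  if (taskTitle) parts.push(String(taskTitle));
  if (sliceGoal) parts.push(String(sliceGoal).slice(0, 600)); // front 600 chars
  if (Array.isArray(changedFiles)) parts.push(...changedFiles.slice(0, 20)); // 20-file cap
  return parts.join(" ").trim(); // empty inputs collapse to ""
}
```

The caller's existing `if (!query.trim())` branch still selects the static fallback when everything is empty.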
Mikael Hugo
362af3d6a4 fix(headless): bypass rpc for status
2026-05-15 17:32:21 +02:00
Mikael Hugo
cf32e79578 feat(memory-embeddings): read SF_LLM_GATEWAY_KEY from env as auth.json fallback
Enables CI and containerised deployments without writing secrets to disk.
Auth.json still takes precedence when present.

- readGatewayFromAuthJson now falls back to SF_LLM_GATEWAY_KEY env var
- SF_LLM_GATEWAY_URL env var also supported for endpoint override
- Added tests for env fallback, auth.json preference, and default URL

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 17:13:40 +02:00
Mikael Hugo
6214f7c86d feat(memory): add extraction diagnostics 2026-05-15 16:53:01 +02:00
Mikael Hugo
fdc4650016 feat(self-feedback-drain): filter free opencode models from triage routing
Self-feedback triage routing was including paid opencode models even
when the operator policy prefers the free tier. Add
isOpenCodeProvider() + isFreeOpenCodeModelId() and filter the
candidate list before the router scores them.

Also: cosmetic — quote style normalised by the formatter on
buildInlineFixPrompt strings and spawn options object.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:24 +02:00
Mikael Hugo
3a14fe86a7 test(list-models): isolate from developer's discovery-cache
Tests were picking up the developer's real
~/.sf/agent/discovery-cache.json and seeing unexpected models in
output. Pin tests to a guaranteed-missing path via the new
_discoveryCacheFilePath option so the env they observe is solely
what the test constructs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:11 +02:00
Mikael Hugo
d8f56e6704 feat(cli): add sf key subcommand for auth.json management
Surgical read/write access to ~/.sf/agent/auth.json without touching
the file directly. All mutations go through AuthStorage so file-lock
and chmod-600 invariants are always respected.

  sf key set    <provider> <api-key>   add/rotate stored key
  sf key get    <provider>             show masked key (last 4 chars)
  sf key remove <provider> [--yes]     remove credential
  sf key list                          list all providers + status

Rationale: SF's source of truth for credentials is auth.json at
runtime — env vars are only used during initial one-time provider
setup. Rotation needs an explicit, audit-friendly path, not implicit
env-driven re-reads. Keys are never echoed in full (last 4 chars
only); remove always prompts unless --yes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:04 +02:00
Mikael Hugo
351bfad41d fix(memory): extractTranscriptFromActivity now reads custom_message entries
Activity JSONL logs use `type: "custom_message"` with `customType: "sf-auto"`
for assistant reasoning content. The old code only checked `role === "assistant"`,
so every transcript was empty → extraction silently skipped every unit.

Fix: recognise both legacy (`role === "assistant"`) and modern
(`custom_message` with `sf-*` prefix) entry shapes. Also reads the
standalone `text` field used by custom messages.

This is why memory_processed_units had 0 rows despite 34 activity logs.

Tests: 186 files / 1994 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:13:26 +02:00
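The entry-shape fix above amounts to accepting two row formats when reading the activity JSONL. The `custom_message`/`customType: "sf-*"`/`text` field names are from the commit; the extractor shape itself is an assumption.

```javascript
// Extract assistant-side transcript text from activity JSONL lines,
// accepting both legacy assistant rows and modern custom_message rows.
function extractTranscriptFromActivity(lines) {
  const out = [];
  for (const line of lines) {
    let entry;
    try { entry = JSON.parse(line); } catch { continue; } // skip bad lines
    const legacy = entry.role === "assistant";
    const modern = entry.type === "custom_message" &&
      typeof entry.customType === "string" && entry.customType.startsWith("sf-");
    if (!legacy && !modern) continue;
    const text = entry.text ?? entry.content ?? ""; // standalone `text` field first
    if (text) out.push(text);
  }
  return out.join("\n");
}
```

Under the old assistant-only check, the first (modern) entry shape below would be dropped and the transcript would come back empty, which is the silent-skip symptom the commit describes.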
Mikael Hugo
7ba469cff1 feat(memory): add debug logging to memory extraction pipeline
The memory extraction system has infrastructure (DB tables, LLM prompts,
unit closeout wiring, embedding backfill) but zero processed units and
only self-feedback-resolution memories. This suggests extraction is
failing silently.

Add debugLog() calls throughout extractMemoriesFromUnit() so we can
observe:
- Skip reasons (mutex busy, rate limited, already processed, file too small)
- Start/done lifecycle per unit
- LLM call and parse outcomes
- Error messages on failure and retry

This makes the extraction pipeline observable via --debug or the
journal/debug log without changing behavior.
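The instrumentation pattern can be sketched as follows — `debugLog` and the guard helpers here are simplified stand-ins for the real pipeline internals, not the actual code:

```javascript
// Hypothetical sketch: gate debug output behind an env flag.
function debugLog(msg) {
  if (process.env.SF_DEBUG) console.error(`[memory] ${msg}`);
}

// Every early return logs its skip reason so silent failures are visible.
async function extractMemoriesFromUnit(unit, deps) {
  if (deps.mutexBusy) return debugLog(`skip ${unit.id}: mutex busy`);
  if (deps.rateLimited) return debugLog(`skip ${unit.id}: rate limited`);
  if (deps.alreadyProcessed?.(unit.id)) return debugLog(`skip ${unit.id}: already processed`);
  if (unit.fileSize < (deps.minFileSize ?? 0)) return debugLog(`skip ${unit.id}: file too small`);

  debugLog(`start ${unit.id}`);
  try {
    const raw = await deps.callLlm(unit); // LLM call outcome
    debugLog(`llm ok ${unit.id}`);
    const memories = deps.parse(raw); // parse outcome
    debugLog(`done ${unit.id}: ${memories.length} memories`);
    return memories;
  } catch (err) {
    // Error is logged before any retry, so failures are never swallowed silently.
    debugLog(`error ${unit.id}: ${err.message}`);
    throw err;
  }
}
```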

Tests: 185 files / 1993 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:09:36 +02:00
Mikael Hugo
ba4b2d46d9 sf snapshot: uncommitted changes after 43m inactivity 2026-05-15 15:53:19 +02:00
Mikael Hugo
0b19afebf6 test(providers): expand discovery test matrix to 46 cases
Adds full coverage for the discovery-gating root cause that was
fixed in commit d70d8d3b1 (xiaomi x-api-key auth) and by the
subsequent refreshSfManagedProviders + writeSdkDiscoveryCacheEntry
work in model-catalog-cache.js.

Diagnosis recap: kimi-coding, opencode, opencode-go were silent
in ~/.sf/agent/discovery-cache.json because the SDK's
model-discovery.js adapter registry marked them with
StaticDiscoveryAdapter (supportsDiscovery=false), so the SDK's
discoverModels() never attempted them. SF's own
scheduleModelCatalogRefresh DID fetch them but wrote only to the
per-repo runtime cache (basePath/.sf/model-catalog/) and only fired
on session_start — not during --discover. The fix is to mirror the
write to the SDK's discovery cache on both fetch-path AND cache-hit
path, and await it in cli.ts before listModels when --discover is set.
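The mirrored-write shape can be sketched as below — the cache objects and function signature are simplified stand-ins for the real helpers in model-catalog-cache.js, under the assumption that both caches expose a simple get/write interface:

```javascript
// Hypothetical sketch: refresh one provider's model list, mirroring the
// result into the SDK discovery cache on BOTH paths.
async function refreshProvider(provider, caches) {
  const cached = caches.runtime.get(provider.id);
  if (cached && !cached.stale) {
    // Cache-hit path: previously returned here without touching the SDK
    // cache, which is what left providers silent in discovery-cache.json.
    await caches.sdk.write(provider.id, cached.models);
    return cached.models;
  }
  // Fetch path: populate the per-repo runtime cache as before...
  const models = await provider.fetchModels();
  caches.runtime.set(provider.id, { models, stale: false });
  // ...and mirror the write so the SDK's discoverModels() sees it too.
  await caches.sdk.write(provider.id, models);
  return models;
}
```

Awaiting this before listModels under --discover is what makes the mirrored entries visible in the same invocation.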

New test sections:
- parseDiscoveredModels: OpenAI {data}/{models} formats, Google
  {models[].name} prefix stripping, name-as-id fallback, null on
  bad input, OpenRouter pricing extraction
- refreshSfManagedProviders: xiaomi uses x-api-key (not Bearer),
  opencode uses Bearer, no-key providers skipped, SDK discovery cache
  written on BOTH network-fetch and cache-hit paths, kimi-coding +
  opencode-go iterated when keys present

46 tests pass. No regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:09:38 +02:00
Mikael Hugo
67c088410c chore(discovery): silence debug stderr from refresh path
Trailing instrumentation from the discovery investigation. The error
catch still swallows non-fatal failures during --discover, just no
longer prints to stderr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 15:03:56 +02:00
Mikael Hugo
fe28a48d81 fix(sift): revert to bm25,phrase for repo-root — hang was corrupted cache
The earlier commit (44fcfb643) incorrectly disabled phrase on repo-root
because I thought the phrase retriever hung on full-workspace scope. After
clearing the corrupted cache (left by killing a mid-build vector process),
testing confirms:

- bm25 alone on repo root: works, 1m 50s cold, instant warm
- phrase alone on repo root: works after cache clear
- bm25+phrase on repo root: works after cache clear
- vector on scoped paths: works after cache build

The "hang" was from a corrupted/stale cache, not a sift bug.
.siftignore is properly excluding files (146K→2,660 indexed).

Revert chooseSiftRetrievers back to bm25,phrase for repo-root.
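The reverted selection logic amounts to something like this — the scope values and return shape are assumptions beyond the function name quoted above:

```javascript
// Hypothetical sketch of the retriever selection after the revert.
function chooseSiftRetrievers(scope) {
  // Repo-root scope gets both lexical retrievers again; the phrase
  // removal is reverted since the "hang" was a corrupted cache.
  if (scope === "repo-root") return ["bm25", "phrase"];
  // Scoped paths can also afford the vector index once its cache is built.
  return ["bm25", "phrase", "vector"];
}
```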

Tests: 184 files / 1974 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 14:59:45 +02:00
Mikael Hugo
b88b66c651 feat(auto): fan out swarm research units 2026-05-15 14:54:27 +02:00