Commit graph

4377 commits

Author SHA1 Message Date
Mikael Hugo
32cfb6224b test: migrate node:test imports to vitest and stabilize timing thresholds
- Three .test.mjs files now import describe/it from vitest, matching the
  harness CLAUDE.md mandates for the SF extension suite.
- schedule-e2e local readEntries threshold raised 50ms → 100ms with a
  comment noting full-suite parallelism adds scheduler/filesystem jitter
  on dev machines (CI threshold unchanged at 200ms).
- e2e-smoke "headless new-milestone without --context" timeout raised
  10s → 30s so the exit-1 assertion isn't flaky under load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 21:30:21 +02:00
Mikael Hugo
3f2babb5d1 fix(auto): block refusing executor model temporarily to force escalation on retry
When classifyExecutorRefusal detects an executor refusal, the model is
now temporarily blocked (1-hour TTL) via the existing blocked-models
mechanism. This ensures that on retry — whether automatic or manual —
the router skips the refusing model and the tier-escalation path in
selectAndApplyModel picks a higher-tier alternative.

This satisfies AC1 of self-feedback entry sf-mp3bm6u0-2fskt8.
AC2 (refusal pattern detection) was already satisfied by the existing
apology-no-tools pattern in classifyExecutorRefusal.

Refs: sf-mp3bm6u0-2fskt8
2026-05-13 02:40:41 +02:00
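The temporary block described in 3f2babb5d1 can be pictured with a small TTL map. This is only a sketch under stated assumptions: `createBlockedModels` and `pickModel` are hypothetical names, not the repository's blocked-models API, and the candidate list is assumed to be ordered from lowest to highest tier.

```javascript
// Hypothetical in-memory stand-in for the blocked-models mechanism.
const REFUSAL_BLOCK_TTL_MS = 60 * 60 * 1000; // 1-hour TTL, as stated in the commit

function createBlockedModels() {
  const expiries = new Map(); // modelId -> expiry timestamp in ms
  return {
    block(modelId, now, ttlMs = REFUSAL_BLOCK_TTL_MS) {
      expiries.set(modelId, now + ttlMs);
    },
    isBlocked(modelId, now) {
      const expiry = expiries.get(modelId);
      if (expiry === undefined) return false;
      if (now >= expiry) {
        expiries.delete(modelId); // lazily drop expired entries
        return false;
      }
      return true;
    },
  };
}

// Assumption: candidates are ordered lowest tier first, so skipping a
// blocked model naturally lands on the next (higher-tier) alternative.
function pickModel(candidates, blocked, now) {
  return candidates.find((id) => !blocked.isBlocked(id, now)) ?? null;
}
```

Passing `now` explicitly keeps the sketch deterministic; a real router would more likely read the clock itself.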
Mikael Hugo
2cad6d54f4 fix(doctor): enrich flow-audit repeated-failure rollup with full diagnostic context
The flow-audit repeated-milestone-failure rollup now includes:
- Active milestone/unit and session pointer (AC1)
- Stale dispatched units (AC2)
- Runaway history (AC3)
- Over-budget child processes (AC3)

This satisfies the acceptance criteria of self-feedback entry
sf-mp3ati7u-qqxcyi so operators can use the rollup evidence to
repair stale dispatch, missing summary, runaway, or child-process
handling without needing to re-run the flow audit manually.

Refs: sf-mp3ati7u-qqxcyi
2026-05-13 02:25:29 +02:00
Mikael Hugo
65e195a9fd feat: Created draft mapping of SF patterns to ACE reference draft
SF-Task: S05/T01
2026-05-13 02:01:41 +02:00
Mikael Hugo
1ed505669b fix(sf-db,autonomous-solver): resolve schema-drift and checkpoint runaway loop
- sf-db-schema.js: per-migration transaction boundaries (runMigrationStep)
  so a late migration failure does not roll back earlier successful ones.
  Post-migration assertion recreates routing_history if missing.
- routing-history.js: catch missing routing_history table at init and latch
  _dbTableAvailable=false so auto-start does not crash.
- autonomous-solver.js: sticky identity guard in appendAutonomousSolverCheckpoint
  pins to orchestrator's unitType/unitId instead of trusting agent's claim.
  Emit journal event on identity mismatch. Record mismatchedIdentity diagnostic.
  Hard cap MAX_CHECKPOINTS_PER_ITERATION=5 in assessAutonomousSolverTurn.
- Tests: add v52 DB smoke test with auto-start path; add sticky identity
  tests (4 cases); add excessive-checkpoint pause test.

Fixes: sf-mp36kfqm-rjrzju, sf-mp37kjmo-1mfuru
2026-05-13 01:47:19 +02:00
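The sticky identity guard and the checkpoint cap from 1ed505669b might look roughly like the sketch below. The function names and object shapes are hypothetical; only `MAX_CHECKPOINTS_PER_ITERATION` and the `mismatchedIdentity` field name come from the commit message, and treating the cap as "pause once the count exceeds 5" is an assumption.

```javascript
const MAX_CHECKPOINTS_PER_ITERATION = 5; // value named in the commit

// Hypothetical helper: the orchestrator's identity always wins over the
// agent's claim; a mismatch is recorded as a diagnostic, not trusted.
function pinCheckpointIdentity(orchestrator, claimed) {
  const mismatched =
    claimed.unitType !== orchestrator.unitType ||
    claimed.unitId !== orchestrator.unitId;
  return {
    unitType: orchestrator.unitType,
    unitId: orchestrator.unitId,
    mismatchedIdentity: mismatched
      ? { unitType: claimed.unitType, unitId: claimed.unitId }
      : null,
  };
}

// Assumption: the run pauses once the count goes above the cap.
function shouldPauseForCheckpoints(checkpointCount) {
  return checkpointCount > MAX_CHECKPOINTS_PER_ITERATION;
}
```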
Mikael Hugo
a49ea1da87 feat(sf/prompts): Phase 4 — cache_control breakpoints at static/dynamic boundary
Split reorderForCaching into a structured reorderAndSplitForCaching that
returns {before, after} at the semi-static→dynamic section boundary.

- prompt-ordering.js: export reorderAndSplitForCaching — returns null if no
  dynamic sections, otherwise {before: static+semi-static, after: dynamic}
- auto.js: import and wire reorderAndSplitForCaching into deps
- phases-unit.js: use split function; pass promptParts to runUnit when split
  succeeds; fall back to flat reorderForCaching when null
- run-unit.js: when promptParts is present, send a two-block content array
  [{type:text, text:before, cache_control:{type:ephemeral}}, {type:text, text:after}]
  so Anthropic-compatible providers cache the stable prefix
- openai-completions.ts: preserve cache_control on text parts in convertMessages;
  skip maybeAddOpenRouterAnthropicCacheControl if any part already has cache_control

Tests: 5 new contract tests for reorderAndSplitForCaching; all 4502 unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 01:36:22 +02:00
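To illustrate the split described in a49ea1da87, here is a minimal sketch. `splitForCaching` and `toContentBlocks` are hypothetical stand-ins, not the actual `reorderAndSplitForCaching` code, and the section shape `{ text, kind }` with `kind` limited to `static`, `semi-static`, or `dynamic` is assumed for the example.

```javascript
// Assumption: each section is { text, kind } with one of these three kinds.
const SECTION_RANK = { static: 0, "semi-static": 1, dynamic: 2 };

function splitForCaching(sections) {
  // Stable sort keeps the original order inside each kind.
  const ordered = [...sections].sort(
    (a, b) => SECTION_RANK[a.kind] - SECTION_RANK[b.kind]
  );
  const firstDynamic = ordered.findIndex((s) => s.kind === "dynamic");
  if (firstDynamic === -1) return null; // no dynamic sections: caller falls back to a flat prompt
  const join = (parts) => parts.map((s) => s.text).join("\n\n");
  return {
    before: join(ordered.slice(0, firstDynamic)),
    after: join(ordered.slice(firstDynamic)),
  };
}

// Two-block content array in the shape quoted in the commit message:
// only the stable prefix carries cache_control.
function toContentBlocks(parts) {
  return [
    { type: "text", text: parts.before, cache_control: { type: "ephemeral" } },
    { type: "text", text: parts.after },
  ];
}
```

The sketch does not handle a prompt made up only of dynamic sections, where `before` would be an empty string; a real implementation would probably need to decide what to send in that case.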
Mikael Hugo
3b83d09692 feat(sf/prompts): Phase 3 v2 — migrate milestone+slice builders to composeUnitContext
Migrate buildPlanMilestonePrompt, buildValidateMilestonePrompt,
buildCompleteMilestonePrompt, buildReplanSlicePrompt,
buildResearchSlicePrompt, and renderSlicePrompt (plan-slice +
refine-slice) from imperative inlined[] push loops to the v2
composeUnitContext API (manifest-driven, prepend/computed support).

Changes:
- unit-context-manifest.js: add 7 new ARTIFACT_KEYS (slice-summaries,
  blocker-summaries, queue, verification-classes, outstanding-items,
  previous-validation, prior-milestone-summary); update 7 manifests
  with correct prepend/inline/computed declarations
- auto-prompts.js: import composeUnitContext; migrate all 6 builders;
  remove orphaned old buildValidateMilestonePrompt tail left by
  partial prior edit
- tests: add auto-prompts-phase3.test.mjs with 7 contract tests
  covering plan-milestone, replan-slice, validate-milestone, and
  research-slice prompt generation

Pre-computation pattern: complex async logic (blocker scan, slice
aggregation, verification classes, prior validation) is computed
imperatively before composeUnitContext, then returned from
resolveArtifact. This preserves parallel execution of other artifacts.

buildPlanMilestonePrompt keeps framingBlock imperative: the framing
check wraps the composed inlinedContext rather than going inside the
composer boundary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 01:02:48 +02:00
Mikael Hugo
ca5d869e34 feat(prompts): fragment infrastructure + RFC #4782 stub manifests
Phase 1 — Fragment infrastructure:
- Add {{include:fragment-name}} support to prompt-loader.js
  - fragmentsDir registered alongside promptsDir/templatesDir
  - warmCache() now reads prompts/fragments/*.md with 'frg:' prefix
  - Pre-resolution pass in loadPrompt() resolves {{include:}} before
    the {{var}} validator (colon is outside validator regex [a-zA-Z0-9_],
    so unresolved includes are caught as parse errors)
  - Lazy-load fallback for fragments mirrors existing prompt lazy-load
- Create prompts/fragments/working-directory.md (Variant A: full
  contract including 'Do NOT cd to any other directory')
- Create prompts/fragments/working-directory-ops.md (Variant B:
  ops prompts, no cd restriction)
- Replace duplicated 3-line Working Directory boilerplate in 17 prompts
  with {{include:working-directory}} (12 files) or
  {{include:working-directory-ops}} (5 ops files)
- One fix to Working Directory wording now propagates to all 17 prompts

Phase 2 — RFC #4782 stub manifests:
- Add deploy, smoke-production, release, rollback, challenge to
  KNOWN_UNIT_TYPES and UNIT_MANIFESTS in unit-context-manifest.js
- All 5 builders already called composeInlinedContext() but returned ""
  because resolveManifest() found no entry; now they return live content
- All 26 unit types now have manifests (resolveManifest returns non-null
  for every type in KNOWN_UNIT_TYPES)

Tests:
- 5 new tests in prompt-loader-fragments.test.mjs (include resolution,
  lazy-load fallback, unknown fragment error, nested var inheritance,
  variant-B fragment)
- Full unit suite: 427 files passed, 4476 tests passed, 0 regressions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 00:30:19 +02:00
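The pre-resolution pass from ca5d869e34 can be approximated as a single regex replace that runs before any `{{var}}` handling. The sketch below is an assumption-laden illustration: `resolveIncludes` and the `Map` of fragments are hypothetical, and the character class allowed in fragment names is a guess based on the two fragment names in the commit.

```javascript
// Assumption: fragment names use letters, digits, underscores, and hyphens.
const INCLUDE_RE = /\{\{include:([a-zA-Z0-9_-]+)\}\}/g;

// fragments: Map of fragment name -> fragment text (hypothetical cache shape).
function resolveIncludes(template, fragments) {
  return template.replace(INCLUDE_RE, (_match, name) => {
    if (!fragments.has(name)) {
      throw new Error(`Unknown fragment: ${name}`);
    }
    // {{var}} placeholders inside the fragment are left untouched so the
    // later variable pass can fill them from the including prompt's vars.
    return fragments.get(name);
  });
}
```

Because the replace is a single pass, an include nested inside a fragment would stay unresolved here; the commit message does not say whether the real loader supports that.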
Mikael Hugo
55229f6604 fix(auto): split autonomous solver from executor per ADR-0079
- Lock solver model to kimi-k2.6 independent of unit-type router
- Executor prompt no longer requires checkpoint tool call
- Add dedicated solver pass that reads executor transcript and emits canonical checkpoint
- Classify executor refusals as blocker outcomes (already partially implemented)
- Classify no-op iterations (continue with zero work) as missing-checkpoint-retry
- Add tests for executor prompt block, solver pass prompt, no-op detection, and no-op assessment

Fixes sf-mp34nxb6-27zdx7
2026-05-12 23:55:02 +02:00
Mikael Hugo
e2f2cb7e2e feat: Create Command Behavior Verification Matrix across CLI, TUI, and…
SF-Task: S04/T01
2026-05-12 23:01:31 +02:00
Mikael Hugo
f789bf0f40 sf snapshot: uncommitted changes after 53m inactivity
2026-05-12 22:51:31 +02:00

Mikael Hugo
9a678f1449 sf snapshot: uncommitted changes after 270m inactivity
2026-05-12 21:58:31 +02:00
Mikael Hugo
93d547c65e fix(headless): skip Ask→Build mode gate in SF_HEADLESS mode
In headless mode the showConfirm dialog blocks forever since there is
no TUI to answer it. The user already consented by calling /next or
/autonomous explicitly — the gate adds no value and hangs the run.

Add process.env.SF_HEADLESS !== '1' to the gate condition so headless
runs bypass it and proceed directly to autonomous execution.

Verified: `sf headless --command next` now completes slice S03
(719 526 tokens, 10 tool calls, $0.027) without hanging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 17:28:09 +02:00
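As a rough picture of the gate change in 93d547c65e: only the `SF_HEADLESS !== '1'` check is taken from the commit. The function name and the `currentMode === "ask"` half of the condition are assumptions made for the example.

```javascript
// Hypothetical gate predicate; env is passed in so it can be exercised
// without touching process.env.
function shouldShowModeGate(currentMode, env) {
  const wouldGate = currentMode === "ask"; // assumed pre-existing condition
  return wouldGate && env.SF_HEADLESS !== "1"; // headless runs bypass the dialog
}
```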
Mikael Hugo
d22df007a7 fix(headless): correct log message to show actual command format
The log message said '/sf ${command}' but the actual command sent is
'/${command}' (without the sf namespace). Fix to match actual dispatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 17:04:11 +02:00
Mikael Hugo
16db710468 sf snapshot: uncommitted changes after 49m inactivity
2026-05-12 16:45:04 +02:00
Mikael Hugo
0426aafad2 fix(headless): drop /sf prefix so typed commands route through extension dispatch
headless.ts was sending `/sf {subcommand} {args}` to the RPC session, but
commands are registered without the sf namespace (e.g. 'todo', 'autonomous').
_tryExecuteExtensionCommand parsed commandName='sf', found no match, and the
LLM handled the request instead of the typed backend.

Fix: send `/{subcommand} {args}` directly — matches what registerSFCommands
registers and what the TUI already uses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 15:55:46 +02:00
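The routing failure fixed in 0426aafad2 is easy to reproduce in miniature. In this sketch `buildHeadlessCommand`, `parseCommandName`, and `routesToTypedBackend` are hypothetical helpers standing in for headless.ts and the extension dispatch; the registered names `todo` and `autonomous` are the examples given in the commit.

```javascript
// Build the line sent to the RPC session: "/{subcommand} {args}".
function buildHeadlessCommand(subcommand, args = []) {
  return ["/" + subcommand, ...args].join(" ");
}

// Extract the command name: the first token after the leading slash.
function parseCommandName(line) {
  if (!line.startsWith("/")) return null;
  return line.slice(1).split(/\s+/)[0] || null;
}

// A line reaches the typed backend only if its name is registered;
// otherwise (in the scenario the commit describes) the LLM handles it.
function routesToTypedBackend(line, registered) {
  const name = parseCommandName(line);
  return name !== null && registered.has(name);
}
```

With `registered = new Set(["todo", "autonomous"])`, the old form `/sf todo triage` parses to the name `sf` and misses, while `buildHeadlessCommand("todo", ["triage"])` parses to `todo` and matches.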
Mikael Hugo
2bb9cdbeef feat(scaffold): ADR-022 scaffold profiles (all phases)
Add profile-aware scaffold system so SF does not lay down irrelevant
templates in infra/ops/docs repos.

## What ships

Phase 1 — data model
- scaffold-versioning.js: add 'disabled' to VALID_STATES; readScaffoldManifest
  returns profile field; recordScaffoldApply preserves manifest.profile (fixes
  roundtrip bug where profile was stripped on every write).
- scaffold-constants.js: PROFILES (app/library/infra/docs/minimal as Set<string>)
  and PROFILE_NAMES exports.

Phase 2 — profile-aware drift detection
- scaffold-drift.js: disabled bucket in emptyCounts, resolveActiveProfileSet
  integration, profile param on detectScaffoldDrift/migrateLegacyScaffold.
- doc-checker.js: filter to active profile, skip disabled-state files.

Phase 3 — auto-detection on first run
- scaffold-profiles.js: detectRepoProfile() heuristics (nix→infra,
  terraform→infra, react→app, node-no-ui→library, docs-only→docs, else→app).
- agentic-docs-scaffold.js: reads profile from manifest, auto-detects on first
  run, persists to manifest, filters SCAFFOLD_FILES to active profile.

Phase 4 — migrate command
- commands-scaffold-migrate.js: sf scaffold migrate --profile <name>
  Re-enables pending files entering the new profile; stamps state=disabled
  (or prunes with --prune) files leaving it; warns on editing/completed files.
- commands/handlers/ops.js, commands/catalog.js: registered and tab-completed.

Phase 5 — custom profiles + PREFERENCES.md frontmatter
- scaffold-profiles.js: readPreferencesProfile(), loadCustomProfileSet()
  (~/.sf/profiles/<name>.yaml with extends/add/remove), resolveActiveProfileSet()
  implementing full ADR-022 §6 precedence.
- All callers updated to use resolveActiveProfileSet as the single source of truth.

Tests: 28 new tests in adr-022-scaffold-profiles.test.mjs — all passing.
Pre-existing node:test stubs (3 files) unaffected.

ADR: docs/dev/ADR-022-scaffold-profiles.md

Misc: triage TODO.md dump into BACKLOG.md (phases-helpers export error T1,
/todo triage typed-handler gap T1, structured triage tiers T2, sha-track
markdown files T2, cross-repo triage T3). Reset TODO.md to empty template.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 15:28:03 +02:00
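The Phase 3 heuristics and the Phase 5 `extends/add/remove` overlay from 2bb9cdbeef could be sketched as below. Everything here is illustrative: the boolean `signals` object, both function names, and the assumption that the heuristics are checked in the order the commit lists them are not confirmed by the source.

```javascript
// Hypothetical signals object, e.g. { hasNix, hasTerraform, hasReact,
// hasNodePackage, docsOnly }. Order of checks follows the commit's list.
function detectProfile(signals) {
  if (signals.hasNix || signals.hasTerraform) return "infra";
  if (signals.hasReact) return "app";
  if (signals.hasNodePackage) return "library"; // node project without a UI
  if (signals.docsOnly) return "docs";
  return "app"; // fallback named in the commit
}

// Overlay a custom profile on the file set of the profile it extends.
// baseFiles: iterable of file names; custom: { add?: string[], remove?: string[] }.
function applyCustomProfile(baseFiles, custom) {
  const result = new Set(baseFiles);
  for (const file of custom.add ?? []) result.add(file);
  for (const file of custom.remove ?? []) result.delete(file);
  return result;
}
```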
Mikael Hugo
ad53b792fb docs(.agents): add AGENTS.md — directory map and override pattern
Documents every folder under .agents/, what it contains, and the
override-by-same-name pattern. Explains YOLO as a flag not a mode.

is globally ignored but the spec file under .agents/ must be tracked.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 23:48:36 +02:00
Mikael Hugo
4f04fb4c34 chore(.agents): keep lean — remove default mode files, no modes list
.agents/ is an override layer. Default modes (ask/build/autonomous)
and default skills come from SF's built-in config. Project files only
exist when overriding or adding something project-specific.

- Remove modes/ask.md, modes/build.md, modes/autonomous.md (defaults)
- Remove enabled.modes from manifest (nothing project-defined)
- Policies and skills stay: they are project-specific overrides

To override a mode or skill, add a file with the same name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 23:47:29 +02:00
Mikael Hugo
82d629c3ee feat(.agents): add autonomous mode; clarify yolo is a flag not a mode
- Add modes/autonomous.md — third SF mode (ask/build/autonomous).
  Describes UOK dispatch loop, bash 120s timeout, fresh-context-per-unit,
  recovery/runaway-guard, and when to use vs Build.
- Add autonomous to enabled.modes in manifest.yaml.
- Update policies/yolo.yaml description: YOLO is a flag on Build or
  Autonomous, not a mode, not a Shift+Tab stop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 23:45:24 +02:00
Mikael Hugo
8ea4b0745d fix(.agents): list all 5 skills in manifest.yaml enabled.skills
sf-wiki, forge-autonomous-runtime, forge-command-surface, nix-build,
and smoke-test are all present in .agents/skills/ and must be declared
in enabled.skills per the AGENTS-1 spec.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 20:12:37 +02:00
Mikael Hugo
a9ebfb4442 fix(skills): move sf-wiki project override to .agents/skills/ (standard location)
.agents/skills/ is the documented standard for project-level skill overrides
(docs/user-docs/skills.md). .sf/skills/ is also searched but .agents/skills/
is the ecosystem-standard path used across all compatible agents.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 20:10:21 +02:00
Mikael Hugo
f3d84cd116 .agents: adopt agentsfolder/spec v0.1 as canonical agent configuration
Some checks failed
CI / detect-changes (push) Has been cancelled
CI / docs-check (push) Has been cancelled
CI / lint (push) Has been cancelled
CI / build (push) Has been cancelled
CI / integration-tests (push) Has been cancelled
CI / windows-portability (push) Has been cancelled
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Has been cancelled
CI / rtk-portability (macos, macos-15) (push) Has been cancelled
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Has been cancelled
Replaces the fragmented (AGENTS.md + CLAUDE.md + .github/copilot-instructions.md
+ .sf/STYLE.md + .sf/PRINCIPLES.md + .sf/NON-GOALS.md) surface with a
single canonical .agents/ tree per https://github.com/agentsfolder/spec.

Structure:
  .agents/manifest.yaml         spec metadata + defaults + project info
  .agents/prompts/
    base.md                     project-agnostic base prompt
    project.md                  SF-specific: purpose-first, DB-first,
                                build pipeline, Ask/Build/YOLO model
    snippets/{style,principles,non-goals}.md
                                short pointers into .sf/{STYLE,PRINCIPLES,
                                NON-GOALS}.md for composition
  .agents/modes/{ask,build}.md  YAML front matter + human-readable body
  .agents/policies/{default-safe,yolo}.yaml
                                conservative default + YOLO override
  .agents/skills/.gitkeep       empty per spec — SF's own skills not yet
                                migrated to agentskills.io format
  .agents/scopes/.gitkeep       single-tree, no scopes yet
  .agents/profiles/.gitkeep     no overlays yet
  .agents/schemas/.gitkeep      generated by validators
  .agents/state/.gitignore      excludes state.yaml from VCS per spec

Status: spec is pre-1.0 (specVersion 0.1.0 pinned). No agent runtime
currently reads .agents/ — this is structural adoption ahead of
ecosystem support. Legacy files (AGENTS.md, CLAUDE.md, etc.) kept
during the transition; .agents/ is now the canonical surface and they
will eventually point here.

This is the reference template; centralcloud/infra, operations-memory,
oncall-mobile-android to follow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 20:04:35 +02:00
Mikael Hugo
edd0eb22ac feat(skills): add project-level sf-wiki skill override with UPPERCASE convention
.sf/skills/ is the project-local skill override directory. This override
inherits all sf-wiki defaults and adds one project-specific rule: wiki
pages use UPPERCASE filenames (INDEX.md, ARCHITECTURE.md, etc.) to match
the .sf/ operational file convention (DECISIONS.md, KNOWLEDGE.md, etc.).

The built-in src/resources/skills/sf-wiki/SKILL.md stays generic (lowercase).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:54:18 +02:00
Mikael Hugo
385cc8a18b revert(skills): restore lowercase defaults in sf-wiki SKILL.md
sf-wiki is a built-in read-only skill — its page name defaults must
stay generic (lowercase). The uppercase convention is this repo's
project-level choice, documented in system.md and the wiki itself.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:52:15 +02:00
Mikael Hugo
0d187e53d7 chore(wiki): rename wiki pages to UPPERCASE to match .sf/ convention
All .sf/ operational files use UPPERCASE (DECISIONS.md, KNOWLEDGE.md, etc.).
Wiki pages now follow the same convention: INDEX.md, ARCHITECTURE.md,
WORKFLOWS.md, SUBSYSTEMS.md, GLOSSARY.md.

Also updates sf-wiki SKILL.md and system.md prompt references.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:50:06 +02:00
Mikael Hugo
eacbbaac82 TODO: simplify md-tracking — drop snapshot blob, accept mid-edit corner
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Final settled design: sha + git ref only, no DB content snapshots at
all. The mid-edit case (file observed dirty) loses the ability to
reconstruct the intermediate working-tree state, but the change-
detection signal is preserved and the operator can commit first if
intermediate fidelity matters.

Trades a corner-case fidelity loss for a much simpler schema and
no DB-vs-disk content duplication. Git remains the only version
store; the DB row is a pure "where I left off" pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:49:25 +02:00
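The settled "sha + git ref only" design in eacbbaac82 can be sketched as a pure change check. The row shape `{ path, sha, gitRef }` and the function name are assumptions; the diff arguments use the standard `git diff <commit> -- <path>` form, which compares the working tree against the stored commit.

```javascript
// row: what the DB remembered, { path, sha, gitRef } (gitRef is null when the
// file was observed with uncommitted changes). currentSha: hash of the file now.
function detectChange(row, currentSha) {
  if (currentSha === row.sha) {
    return { changed: false, diffArgs: null };
  }
  // Without a git ref the intermediate state cannot be reconstructed;
  // only the change signal survives, as the commit message accepts.
  const diffArgs = row.gitRef ? ["diff", row.gitRef, "--", row.path] : null;
  return { changed: true, diffArgs };
}
```

Returning an argument array rather than a command string avoids shell-quoting questions for paths with spaces; the caller would hand it to whatever process-spawning helper it already uses.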
Mikael Hugo
76923afb91 TODO: md-tracking needs a version reference, not just a content sha
Without storing snapshots we lose the ability to diff against
"what SF last saw". The fix is hybrid: store the git commit SHA1
that contained the observed content (cheap, no DB blob), and only
fall back to a gzipped snapshot when the file was observed with
uncommitted changes (no git ref exists for that exact content).

For ".sf/-generated, untracked, in .gitignore" the right answer is
to not track them in this table at all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:46:38 +02:00
Mikael Hugo
296054b1d4 TODO: drop snapshot blob from md-tracking; use git for diff source
Per follow-up: SF generates many of these .md files itself (.sf/wiki/*,
.sf/milestones/**/*.md, docs/plans/**), so storing gzipped snapshots in
the DB would duplicate disk + git for no benefit.

Simpler design: store only the sha + meta in sf.db; compute diffs
on demand against `git show HEAD:<path>`. Naturally handles both
"working-tree edit not yet committed" and "another agent committed
while SF wasn't running".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:46:06 +02:00
Mikael Hugo
faecdc828c TODO: generalise sha-tracking from milestones to all source-of-truth .md
Per follow-up: not just .sf/milestones/**/*.md but the broader set of
markdown files that SF (or humans) treat as authoritative — AGENTS.md,
.github/copilot-instructions.md, .sf/wiki/**, docs/adr/**,
docs/plans/**, and root-level meta files.

Explicit out-of-scope list: TODO.md (reset every cycle by triage),
CHANGELOG.md / BUILD_PLAN.md (append-only by design), vendored or
generated content. Tracking those would just be noise.

Spec includes a tracked_md_files schema, the walk/diff/surface flow,
and an honest accounting of storage cost (~40 bytes per file + optional
gzipped snapshot).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:45:39 +02:00
Mikael Hugo
902be6d1de TODO: SF should sha-track milestone files and diff on change
Captures a real bug class observed during today's session: nothing
notices when a milestone file (CONTEXT.md, ROADMAP.md, slice PLAN.md,
etc.) is edited out of band — by a human, another agent, or a git pull.
SF keeps using the cached state and drifts.

Wanted: per-file sha tracking in sf.db, diff surface on change, +
hooks for accept/reject/import/archive. Storage cost negligible.

Useful in concert with the cross-repo triage and slash-command routing
gaps already in this TODO.md — together they close most of the
"unattended SF actually works" surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:45:05 +02:00
Mikael Hugo
41b7842fd8 TODO: cross-repo triage + slash-command routing + structured tiers (redo)
Previous commit (1fb4b9882) captured only the reset and lost my intended
additions due to a Read/Write race. Re-applying the four feature
requests from today's dogfooding session:

- Cross-repo `triage-all-repos` (real fix for the "many TODO.md files"
  surface area — single tool, per-repo SF dbs, unified read-only
  aggregation view).

- Slash-command routing fix (`/todo triage` is currently re-implemented
  by the agent's LLM, bypassing the typed backend; patches to
  commands-todo.js were silently inert).

- Structured tier/priority per triage item (today tiers exist only in
  LLM-prose appended to BUILD_PLAN.md; no parser-friendly field for
  "promote Tier 1 items").

- Phases-helpers stale-export error that fires on every SF run; needs
  either the missing export restored or a test that catches it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:34:49 +02:00
Mikael Hugo
1fb4b98820 TODO: cross-repo triage + slash-command routing + structured tiers
Four feature requests captured from today's dogfooding session:

- Cross-repo `triage-all-repos` (real fix for the "many TODO.md files"
  surface area — single tool, per-repo SF dbs, unified read-only
  aggregation view).

- Slash-command routing fix (`/todo triage` is currently re-implemented
  by the agent's LLM, bypassing the typed backend; patches to
  commands-todo.js were silently inert).

- Structured tier/priority per triage item (today tiers exist only in
  LLM-prose appended to BUILD_PLAN.md; no parser-friendly field for
  "promote Tier 1 items").

- Phases-helpers stale-export error that fires on every SF run; needs
  either the missing export restored or a test that catches it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:34:07 +02:00
Mikael Hugo
b818ae2c5a docs(wiki): add subsystems.md and glossary.md wiki pages
Complete the standard wiki page set from sf-wiki SKILL.md:
- subsystems.md: table of all subsystems with path, purpose, tests
- glossary.md: project-specific terms (ADR, UOK, PDD, YOLO, wiki, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:27:01 +02:00
Mikael Hugo
e679478d1b feat(wiki): wire .sf/wiki/ as tracked context source
- auto-bootstrap-context.js: scan .sf/wiki/*.md in collectAutoBootstrapFiles
  so wiki pages load as priority context in headless autonomous bootstrap
- headless-context.ts: same fix for the TS bootstrap path
- system-context.js: loadWikiBlock already existed and was wired into
  fullSystem; add .sf/wiki/ to Tier 1 escalation policy lookup sources
- system.md: add wiki/ to .sf/ directory structure; add Conventions entry
  explaining wiki is tracked in git (hand edits persist) and injected
  automatically when present
- git-runtime-patterns.js: do NOT gitignore .sf/wiki/ — wiki pages are
  tracked like DECISIONS.md so hand edits survive commits and clones
- .sf/wiki/: seed index.md, architecture.md, workflows.md for this repo

Wiki filenames follow sf-wiki SKILL.md convention: lowercase (index.md,
architecture.md, workflows.md, subsystems.md, glossary.md).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:24:23 +02:00
Mikael Hugo
3e652a9fd6 TODO: triage should escalate Tier 1 items to real milestones
Today's triage run confirmed the manual `/todo triage` workflow works,
but it stops at tier-listing items in BUILD_PLAN.md — doesn't scaffold
.sf/milestones/MNNN/ dirs for the Tier 1 ones. That's the gap that
needs closing for the autonomous flow to actually create milestones
from raw TODO dumps.

Also captures the non-fatal phases-helpers.js extension load error
that appeared at the top of the triage run output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:15:33 +02:00
Mikael Hugo
ca7368e5f1 fix(bash): add 120s default timeout to prevent autonomous mode hangs
- Add BUILT_IN_DEFAULT_TIMEOUT_SECS = 120 constant to bash tool
- Compute effectiveTimeout = timeout ?? resolvedDefaultTimeout so LLM
  calls without a timeout get the 120s guard automatically
- Add defaultTimeoutSeconds? to BashToolOptions for override at creation
- Dynamic bashSchemaWithDefault describes the actual default in the LLM
  tool description, improving model awareness
- Add BashSettings interface + getBashDefaultTimeoutSeconds() to
  SettingsManager so users can override or disable via settings.json
- Wire defaultTimeoutSeconds into agent-session.ts _buildRuntime()

Root cause: npx sf --help triggered npm package download, hanging for
4+ minutes without timeout, consuming entire autonomous run budget.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:12:33 +02:00
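The timeout resolution in ca7368e5f1 comes down to one nullish-coalescing step. In the sketch below the constant name and the `timeout ?? resolvedDefaultTimeout` expression come from the commit; the helper functions and the idea that an options object supplies `defaultTimeoutSeconds` directly are simplifications for illustration.

```javascript
const BUILT_IN_DEFAULT_TIMEOUT_SECS = 120; // constant named in the commit

// Resolve the default once at tool creation (hypothetical helper).
function resolveDefaultTimeout(options = {}) {
  return options.defaultTimeoutSeconds ?? BUILT_IN_DEFAULT_TIMEOUT_SECS;
}

// Per-call resolution: an explicit timeout from the LLM wins, otherwise
// the resolved default applies.
function effectiveTimeout(timeout, resolvedDefaultTimeout) {
  return timeout ?? resolvedDefaultTimeout;
}
```

Using `??` rather than `||` matters here: an explicit `0` is kept instead of being replaced by the default. Whether `0` is how the real settings express "disabled" is not stated in the commit, so that reading is an assumption.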
Mikael Hugo
7ef58422b1 TODO: feature requests for batch backlog ingestion + probe-based resolution
Real dogfood for the auto-triage feature: this is the unstructured dump
that the autonomous cycle should pick up and process into proper backlog
items the next time it runs. Until auto-triage is wired up, the contents
serve as a written spec for what's needed.

Two flagship features:

- Auto-triage TODO.md on each autonomous cycle. `commands-todo.js`
  already implements `/todo triage` (manual). Wire it to the autonomous
  orchestrator and skip when TODO.md == _EMPTY_TODO.

- When the LLM would ask a clarifying question, replace it with parallel
  combatant + partner probes (adversarial-challenge + collaborative-
  research) and only fall back to asking a human if the probes diverge AND
  interactive mode is available. This unblocks unattended
  `headless new-milestone` (the gap that blocked batch backlog
  ingestion today).

Plus five smaller items (headless milestone stall fix, bulk
import-roadmap, TTY-free plan list, hand-authorable milestone scaffold,
discoverable --answers schema) carried over from the
centralcloud-ops SF-IMPROVEMENTS.md observations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:09:26 +02:00
Mikael Hugo
4e5fc12e81 feat(sf): fix gate health — import, DB fallback, and enrich status uok
Three follow-up fixes from S03/T04:

1. gate-runner.js: add missing getDistinctGateIds import from sf-db.js.
   UokGateRunner.getHealthSummary() called it when registry was empty but
   it was never imported — runtime ReferenceError in headless contexts.

2. sf-db-gates.js: getDistinctGateIds + getGateRunStats fall back to the
   quality_gates DB table when no trace events are found (e.g. after trace
   file rotation). Ensures gate health survives trace cleanup.

3. headless-uok-status.ts: replace generic Type column with real Scope
   (task/slice/milestone) from quality_gates DB, and show actual Last
   Evaluated timestamp from DB even when outside the 24h stats window.
   Tests updated to match (21 pass).

Closes backlog items: bl-gate-runner-import-bug, bl-gate-stats-trace-vs-db,
bl-uok-status-enrich.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 18:47:42 +02:00
Mikael Hugo
797db16ae8 feat(sf): S03/T04 — add UOK gate health to sf headless status uok
Adds a new `sf headless status uok` subcommand that queries
gate-run stats and circuit-breaker state from sf.db and formats
them as a markdown table or JSON (--json flag).

- src/headless-uok-status.ts: handler that loads sf-db-gates
  directly (avoids the unimported getDistinctGateIds in gate-runner)
- src/headless.ts: bypass RPC, route 'status uok' to handler
- src/help-text.ts: document the new subcommand
- tests/headless-uok-status.test.mjs: 19 node:test tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 18:31:03 +02:00
Mikael Hugo
4132ecc1db feat(sf): S03/T03 — wire OutcomeLearningGate into adaptive verification policy
Adds adaptive-verification-policy.js which reads OutcomeLearningGate
trace events from the last 24h and adjusts verification_max_retries /
verification_auto_fix in project preferences:
- >60% verification/artifact/execution failures → reduce retries to 1, disable auto-fix
- 0% failures across ≥5 samples → bump retries (capped at 3)
- all other cases → no change (returns null)

Wires into auto-verification.js after OutcomeLearningGate runs when
outcomeLearning flag is enabled. Includes 12 node:test tests.
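
The three policy rules listed in this commit could look something like the
sketch below. It is a hedged illustration only: `adjustVerificationPolicy`,
the `VerificationPrefs` shape, and the boolean-per-sample input are
assumptions, as is the choice to bump retries by exactly one. Only the
preference keys and the 60% / 5-sample / cap-of-3 thresholds come from the
commit message.

```typescript
// Hypothetical shape; only the two key names come from the commit message.
interface VerificationPrefs {
  verification_max_retries: number;
  verification_auto_fix: boolean;
}

const HIGH_FAILURE_RATE = 0.6; // ">60% ... failures"
const MIN_CLEAN_SAMPLES = 5;   // "0% failures across >=5 samples"
const MAX_RETRIES_CAP = 3;     // "capped at 3"

// `failed` holds one entry per outcome sample in the window (true = failure).
// Returns the adjusted preferences, or null when nothing should change.
function adjustVerificationPolicy(
  failed: boolean[],
  current: VerificationPrefs,
): VerificationPrefs | null {
  // Assumption: with no samples there is nothing to learn from.
  if (failed.length === 0) return null;

  const failures = failed.filter((f) => f).length;
  const failureRate = failures / failed.length;

  // Mostly failing: stop burning retries and disable auto-fix.
  if (failureRate > HIGH_FAILURE_RATE) {
    return { verification_max_retries: 1, verification_auto_fix: false };
  }

  // Clean run over enough samples: allow one more retry, up to the cap.
  if (failures === 0 && failed.length >= MIN_CLEAN_SAMPLES) {
    return {
      ...current,
      verification_max_retries: Math.min(
        current.verification_max_retries + 1,
        MAX_RETRIES_CAP,
      ),
    };
  }

  // All other cases: no change.
  return null;
}
```

For example, three failures out of four samples would drop retries to 1 and
disable auto-fix, while one failure out of five leaves the preferences
untouched.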

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 17:40:22 +02:00
Mikael Hugo
7b225696cc feat(sf): add cross-slice and milestone integrity checks to post-execution checks
- Add checkCrossSliceConsistency() to detect key_file conflicts across slices
- Add checkMilestoneIntegrity() to verify completed slices have summaries
  and no active requirements are orphaned
- Extend runPostExecutionChecks() signature with optional milestoneId
  and allSliceTasks parameters
- Wire cross-slice task gathering into auto-verification.js call site
- Add comprehensive node:test suite for both new checks
2026-05-11 17:22:11 +02:00
Mikael Hugo
338c75fc6f refactor: complete rf-01/rf-02/rf-11 blocked todos
rf-01: add ECONNREFUSED to isTransientNetworkError in anthropic-shared.ts,
  aligning with the NETWORK_RE pattern in error-classifier.js

rf-02: add scripts/validate-model-cost-table.mjs to report coverage gaps
  and price divergence between model-cost-table.js and models.generated.ts;
  add 'validate-cost-table' script to package.json

rf-11: extract 10 pure resource-display utility functions from
  interactive-mode.ts into packages/coding-agent/src/modes/interactive/
  resource-display.ts, reducing interactive-mode.ts by ~282 lines

All 4375 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 16:45:39 +02:00
Mikael Hugo
0aaf8f2c0e refactor: split state.js into state-shared/db/legacy modules
state.js was a 2012-line monolith combining shared helpers, DB-backed
derivation, and legacy filesystem derivation. Split into four files:

- state-shared.js (114 lines): helpers used by both DB and legacy paths
  isGhostMilestone, isSliceComplete, isMilestoneComplete, isValidationTerminal,
  readMilestoneValidationVerdict, loadTerminalSummary, stripMilestonePrefix,
  canonicalMilestonePrefix, extractContextTitle

- state-db.js (841 lines): deriveStateFromDb() and its exclusive helpers
  reconcileDiskToDb, buildRegistryAndFindActive, handleNoActiveMilestone,
  handleAllSlicesDone, resolveSliceDependencies, reconcileSliceTasks,
  detectBlockers, checkReplanTrigger, checkInterruptedWork

- state-legacy.js (895 lines): _deriveStateImpl() — filesystem-only path

- state.js (228 lines): thin barrel — invalidateStateCache, getActiveMilestoneId,
  deriveState, re-exports from sub-modules

All 1195 tests pass. No behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 16:25:20 +02:00
Mikael Hugo
1adc7f119c refactor(rf-06): split auto/phases.js into per-phase modules
3538-line monolith → 6 focused modules + thin barrel:
- phases-helpers.js (223 lines): shared helpers (generateMilestoneReport,
  closeoutAndStop, emitCancelledUnitEnd, maybeFireProductAudit,
  _resolveReportBasePath, recordLearningOutcomeForUnit)
- phases-dispatch.js (486 lines): runDispatch + assessUokDiagnosticsDispatchGate
- phases-guards.js (497 lines): runGuards + guard helpers
- phases-pre-dispatch.js (760 lines): runPreDispatch
- phases-unit.js (1477 lines): runUnitPhase + session timeout state
- phases-finalize.js (542 lines): runFinalize
- phases.js (13 lines): barrel re-export preserving original import surface

Removed dead runPhaseReview export (zero callers confirmed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 15:14:49 +02:00
Mikael Hugo
aa6ecce384 refactor: fix all remaining inline error ternaries across 20 files
Used perl regex to replace all patterns of the form
  X instanceof Error ? X.message : String(X)
with getErrorMessage(X) for any variable name.

Added getErrorMessage imports to 6 files that lacked it.
Leaves only 2 intentional .stack || .message variants unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 14:50:01 +02:00
Mikael Hugo
dac14043cd refactor: consolidate remaining error ternaries (error variable)
Replace all remaining inline error ternaries using the 'error' variable name
with getErrorMessage(error). Added imports to 3 files that lacked it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 14:48:28 +02:00
Mikael Hugo
04322f110a refactor: replace all inline error message ternaries with getErrorMessage()
Eliminates ~120 repetitions of `err instanceof Error ? err.message : String(err)`
across the entire extension source tree. All callers now import and use
`getErrorMessage` from the canonical `./error-utils.js`.
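
For reference, a helper equivalent to the inline ternary quoted above would
be as small as the sketch below. This is an assumption about the shape of
`getErrorMessage`, written in TypeScript for illustration; the real
`./error-utils.js` may differ in details.

```typescript
// Sketch of a helper equivalent to:
//   err instanceof Error ? err.message : String(err)
function getErrorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Typical call site: the catch variable is `unknown`, so the helper
// avoids repeating the narrowing logic in every handler.
function describeFailure(run: () => void): string {
  try {
    run();
    return "ok";
  } catch (err) {
    return `failed: ${getErrorMessage(err)}`;
  }
}
```

Centralizing the narrowing in one function is what lets the later commits
replace the same ternary under other variable names.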

Files updated (56 files):
- auto.js, auto-worktree.js, auto-recovery.js, auto-dashboard.js, auto-timers.js
- auto-prompts.js, auto-start.js, auto-post-unit.js, auto-model-selection.js
- auto/phases.js, auto/loop.js, auto/infra-errors.js
- autonomous-solver-eval.js, bootstrap/agent-end-recovery.js, bootstrap/db-tools.js
- bootstrap/exec-tools.js, bootstrap/journal-tools.js, bootstrap/register-extension.js
- bootstrap/register-hooks.js, canonical-milestone-plan.js, changelog.js
- clean-root-preflight.js, code-intelligence.js, commands-add-tests.js
- commands-debug.js, commands-eval-review.js, commands-handlers.js
- commands-maintenance.js, commands-pr-branch.js, commands-scan.js, commands-ship.js
- commands-todo.js, commands-worktree.js, definition-io.js, doctor.js
- doctor-config-checks.js, doctor-engine-checks.js, ecosystem/loader.js
- eval-review-schema.js, exec-sandbox.js, execution-instruction-guard.js
- graph-context.js, hook-emitter.js, index.js, learning/runtime.js
- lifecycle-hooks.js, onboarding-state.js, orphan-worktree-sweep.js
- planning-depth.js, quick.js, scaffold-keeper.js, sf-db/sf-db-core.js
- slice-cadence.js, sm-client.js, spec-projections.js, subagent/background-jobs.js
- subagent/isolation.js, sync-scheduler.js, tools/exec-tool.js
- tools/sift-search-tool.js, tools/workflow-tool-executors.js, ui/index.js
- uok/a2a-agent-server.js, uok/auto-dispatch.js, uok/auto-unit-closeout.js
- uok/auto-verification.js, uok/chaos-monkey.js, uok/gate-runner.js
- vault-resolver.js, workflow-install.js, workflow-plugins.js, worktree-manager.js
- worktree-resolver.js

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 14:46:30 +02:00
Mikael Hugo
8a7f6de782 refactor: centralize skills directory constants in skill-discovery.js
Export SKILLS_DIR, CLAUDE_SKILLS_DIR, PI_SKILLS_DIR from skill-discovery.js
instead of repeating join(homedir(), ...) inline across 5 files.

Consumers updated:
- preferences-skills.js: replace 2 inline join(homedir()...) with SKILLS_DIR/CLAUDE_SKILLS_DIR
- skill-health.js: replace 2 inline join(homedir()...) with constants; remove homedir import
- skill-catalog.js: replace 2 inline join(homedir()...) with constants; remove homedir import
- skill-telemetry.js: replace 4 inline join(homedir()...) with constants; remove homedir import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 14:39:10 +02:00
Mikael Hugo
ec224f96ac refactor: replace all process.env.HOME/.sf patterns with sfHome()
- guided-flow.js: SF-WORKFLOW.md path now uses sfHome()
- commands-config.js: both auth.json path sites use sfHome()

Eliminates the last 3 inline ~/.sf path patterns; all .sf paths
now route through sfHome() which respects SF_HOME env override
and uses the platform-safe homedir() fallback.
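
The lookup order described above (SF_HOME override first, home directory
fallback second) can be sketched as a pure function. This is illustrative
TypeScript only: `resolveSfHome` is a hypothetical name, the environment and
home directory are passed in so the logic stays testable, and treating an
empty SF_HOME as unset is an assumption. The real `sfHome()` presumably reads
`process.env` and `homedir()` directly and joins paths with `node:path`.

```typescript
// Hypothetical pure variant of the sfHome() lookup.
function resolveSfHome(
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  const override = env["SF_HOME"];
  // Assumption: an empty SF_HOME is treated the same as an unset one.
  if (override !== undefined && override !== "") {
    return override;
  }
  // Simplified POSIX-style join, for illustration only.
  return `${homeDir}/.sf`;
}
```

A caller would pass `process.env` and the result of `homedir()`; routing
every `.sf` path through one function is what makes the SF_HOME override
apply consistently.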

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 14:34:08 +02:00