The flow-audit repeated-milestone-failure rollup now includes:
- Active milestone/unit and session pointer (AC1)
- Stale dispatched units (AC2)
- Runaway history (AC3)
- Over-budget child processes (AC3)
This satisfies the acceptance criteria of self-feedback entry
sf-mp3ati7u-qqxcyi so operators can use the rollup evidence to
repair stale dispatch, missing summary, runaway, or child-process
handling without needing to re-run the flow audit manually.
Refs: sf-mp3ati7u-qqxcyi
- sf-db-schema.js: per-migration transaction boundaries (runMigrationStep)
so a late migration failure does not roll back earlier successful ones.
Post-migration assertion recreates routing_history if missing.
- routing-history.js: catch missing routing_history table at init and latch
_dbTableAvailable=false so auto-start does not crash.
- autonomous-solver.js: sticky identity guard in appendAutonomousSolverCheckpoint
pins to orchestrator's unitType/unitId instead of trusting agent's claim.
Emit journal event on identity mismatch. Record mismatchedIdentity diagnostic.
Hard cap MAX_CHECKPOINTS_PER_ITERATION=5 in assessAutonomousSolverTurn.
- Tests: add v52 DB smoke test with auto-start path; add sticky identity
tests (4 cases); add excessive-checkpoint pause test.
Fixes: sf-mp36kfqm-rjrzju, sf-mp37kjmo-1mfuru
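The sticky identity guard described above can be sketched roughly as follows. This is an illustration only, not the code in autonomous-solver.js: the function name and the checkpoint object shape are assumptions; only unitType, unitId, and mismatchedIdentity come from this message.

```javascript
// Hypothetical sketch of a sticky identity guard.
// The orchestrator's identity always wins; the agent's claim is kept
// only as a diagnostic when it disagrees.
function pinCheckpointIdentitySketch(orchestratorUnit, agentCheckpoint) {
  const mismatched =
    agentCheckpoint.unitType !== orchestratorUnit.unitType ||
    agentCheckpoint.unitId !== orchestratorUnit.unitId;

  return {
    ...agentCheckpoint,
    // Pin to the orchestrator's values instead of trusting the agent.
    unitType: orchestratorUnit.unitType,
    unitId: orchestratorUnit.unitId,
    // Record what the agent claimed so the mismatch can be journaled.
    mismatchedIdentity: mismatched
      ? { unitType: agentCheckpoint.unitType, unitId: agentCheckpoint.unitId }
      : null,
  };
}
```

A caller would emit the journal event whenever mismatchedIdentity is non-null.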
Split reorderForCaching into a structured reorderAndSplitForCaching that
returns {before, after} at the semi-static→dynamic section boundary.
- prompt-ordering.js: export reorderAndSplitForCaching — returns null if no
dynamic sections, otherwise {before: static+semi-static, after: dynamic}
- auto.js: import and wire reorderAndSplitForCaching into deps
- phases-unit.js: use split function; pass promptParts to runUnit when split
succeeds; fall back to flat reorderForCaching when null
- run-unit.js: when promptParts is present, send a two-block content array
[{type:text, text:before, cache_control:{type:ephemeral}}, {type:text, text:after}]
so Anthropic-compatible providers cache the stable prefix
- openai-completions.ts: preserve cache_control on text parts in convertMessages;
skip maybeAddOpenRouterAnthropicCacheControl if any part already has cache_control
Tests: 5 new contract tests for reorderAndSplitForCaching; all 4502 unit tests pass.
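A minimal sketch of the split and the two-block content array, for illustration. The `kind` field on each section and both function names are assumptions made here; the real reorderAndSplitForCaching may classify sections differently.

```javascript
// Hypothetical sketch: sections carry a `kind` of 'static', 'semi-static', or 'dynamic'.
function splitForCachingSketch(sections) {
  const order = { static: 0, 'semi-static': 1, dynamic: 2 };
  // Array.prototype.sort is stable, so sections of the same kind keep their order.
  const sorted = [...sections].sort((a, b) => order[a.kind] - order[b.kind]);
  const dynamic = sorted.filter((s) => s.kind === 'dynamic');
  if (dynamic.length === 0) return null; // caller falls back to the flat prompt
  const stable = sorted.filter((s) => s.kind !== 'dynamic');
  const join = (list) => list.map((s) => s.text).join('\n\n');
  return { before: join(stable), after: join(dynamic) };
}

// Build the two-block content array: only the stable prefix is marked cacheable.
function toContentBlocksSketch(promptParts) {
  return [
    { type: 'text', text: promptParts.before, cache_control: { type: 'ephemeral' } },
    { type: 'text', text: promptParts.after },
  ];
}
```

Putting cache_control on the first block only keeps the dynamic tail out of the cached prefix, so a changing tail does not invalidate it.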
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Migrate buildPlanMilestonePrompt, buildValidateMilestonePrompt,
buildCompleteMilestonePrompt, buildReplanSlicePrompt,
buildResearchSlicePrompt, and renderSlicePrompt (plan-slice +
refine-slice) from imperative inlined[] push loops to the v2
composeUnitContext API (manifest-driven, prepend/computed support).
Changes:
- unit-context-manifest.js: add 7 new ARTIFACT_KEYS (slice-summaries,
blocker-summaries, queue, verification-classes, outstanding-items,
previous-validation, prior-milestone-summary); update 7 manifests
with correct prepend/inline/computed declarations
- auto-prompts.js: import composeUnitContext; migrate all 6 builders;
remove orphaned old buildValidateMilestonePrompt tail left by
partial prior edit
- tests: add auto-prompts-phase3.test.mjs with 7 contract tests
covering plan-milestone, replan-slice, validate-milestone, and
research-slice prompt generation
Pre-computation pattern: complex async logic (blocker scan, slice
aggregation, verification classes, prior validation) is computed
imperatively before composeUnitContext, then returned from
resolveArtifact. This preserves parallel execution of other artifacts.
buildPlanMilestonePrompt keeps framingBlock imperative: the framing
check wraps the composed inlinedContext rather than going inside the
composer boundary.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 1 — Fragment infrastructure:
- Add {{include:fragment-name}} support to prompt-loader.js
- fragmentsDir registered alongside promptsDir/templatesDir
- warmCache() now reads prompts/fragments/*.md with 'frg:' prefix
- Pre-resolution pass in loadPrompt() resolves {{include:}} before
the {{var}} validator (colon is outside validator regex [a-zA-Z0-9_],
so unresolved includes are caught as parse errors)
- Lazy-load fallback for fragments mirrors existing prompt lazy-load
- Create prompts/fragments/working-directory.md (Variant A: full
contract including 'Do NOT cd to any other directory')
- Create prompts/fragments/working-directory-ops.md (Variant B:
ops prompts, no cd restriction)
- Replace duplicated 3-line Working Directory boilerplate in 17 prompts
with {{include:working-directory}} (12 files) or
{{include:working-directory-ops}} (5 ops files)
- One fix to Working Directory wording now propagates to all 17 prompts
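For illustration, the two-pass resolution order (includes first, then variables) might look like the sketch below. Function names, the Map-based fragment store, and the error messages are assumptions, not the prompt-loader.js implementation.

```javascript
// Hypothetical sketch: pass 1 resolves {{include:name}} from a Map of fragments.
function resolveIncludesSketch(template, fragments) {
  return template.replace(/\{\{include:([a-zA-Z0-9_-]+)\}\}/g, (match, name) => {
    if (!fragments.has(name)) throw new Error(`Unknown fragment: ${name}`);
    return fragments.get(name);
  });
}

// Pass 2 substitutes {{var}} placeholders. The character class excludes ':',
// so an include left unresolved by pass 1 would never match here.
function substituteVarsSketch(template, vars) {
  return template.replace(/\{\{([a-zA-Z0-9_]+)\}\}/g, (match, key) => {
    if (!Object.hasOwn(vars, key)) throw new Error(`Missing variable: ${key}`);
    return String(vars[key]);
  });
}
```

Because includes are expanded before variable substitution, a fragment may itself contain {{var}} placeholders that inherit the including prompt's variables.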
Phase 2 — RFC #4782 stub manifests:
- Add deploy, smoke-production, release, rollback, challenge to
KNOWN_UNIT_TYPES and UNIT_MANIFESTS in unit-context-manifest.js
- All 5 builders already called composeInlinedContext() but returned ""
because resolveManifest() found no entry; now they return live content
- All 26 unit types now have manifests (resolveManifest returns non-null
for every type in KNOWN_UNIT_TYPES)
Tests:
- 5 new tests in prompt-loader-fragments.test.mjs (include resolution,
lazy-load fallback, unknown fragment error, nested var inheritance,
variant-B fragment)
- Full unit suite: 427 files passed, 4476 tests passed, 0 regressions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In headless mode the showConfirm dialog blocks forever since there is
no TUI to answer it. The user already consented by calling /next or
/autonomous explicitly — the gate adds no value and hangs the run.
Add process.env.SF_HEADLESS !== '1' to the gate condition so headless
runs bypass it and proceed directly to autonomous execution.
Verified: `sf headless --command next` now completes slice S03
(719 526 tokens, 10 tool calls, $0.027) without hanging.
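A rough sketch of the gate condition, assuming a hypothetical `needsConfirmation` flag stands in for the existing checks:

```javascript
// Hypothetical sketch: show the confirm dialog only when a TUI can answer it.
function shouldShowConfirmSketch(needsConfirmation, env) {
  return needsConfirmation && env.SF_HEADLESS !== '1';
}
```

In the real gate, `env` would be process.env.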
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The log message said '/sf ${command}' but the actual command sent is
'/${command}' (without the sf namespace). Fix to match actual dispatch.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
headless.ts was sending `/sf {subcommand} {args}` to the RPC session, but
commands are registered without the sf namespace (e.g. 'todo', 'autonomous').
_tryExecuteExtensionCommand parsed commandName='sf', found no match, and the
LLM handled the request instead of the typed backend.
Fix: send `/{subcommand} {args}` directly — matches what registerSFCommands
registers and what the TUI already uses.
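To illustrate why the namespaced form missed the typed backend, here is a sketch with hypothetical helper names. The parsing rule (first token after the slash is the command name) is an assumption about _tryExecuteExtensionCommand.

```javascript
// Hypothetical sketch: build the line sent to the RPC session.
function buildHeadlessCommandSketch(subcommand, args = []) {
  return ['/' + subcommand, ...args].join(' ');
}

// Assumed parsing rule: the command name is the first token after '/'.
function parseCommandNameSketch(line) {
  return line.startsWith('/') ? line.slice(1).split(/\s+/)[0] : null;
}
```

With the old `/sf todo ...` form the parsed name is 'sf', which is not registered; with `/todo ...` it is 'todo'.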
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add profile-aware scaffold system so SF does not lay down irrelevant
templates in infra/ops/docs repos.
## What ships
Phase 1 — data model
- scaffold-versioning.js: add 'disabled' to VALID_STATES; readScaffoldManifest
returns profile field; recordScaffoldApply preserves manifest.profile (fixes
roundtrip bug where profile was stripped on every write).
- scaffold-constants.js: PROFILES (app/library/infra/docs/minimal as Set<string>)
and PROFILE_NAMES exports.
Phase 2 — profile-aware drift detection
- scaffold-drift.js: disabled bucket in emptyCounts, resolveActiveProfileSet
integration, profile param on detectScaffoldDrift/migrateLegacyScaffold.
- doc-checker.js: filter to active profile, skip disabled-state files.
Phase 3 — auto-detection on first run
- scaffold-profiles.js: detectRepoProfile() heuristics (nix→infra,
terraform→infra, react→app, node-no-ui→library, docs-only→docs, else→app).
- agentic-docs-scaffold.js: reads profile from manifest, auto-detects on first
run, persists to manifest, filters SCAFFOLD_FILES to active profile.
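The heuristic order above could be sketched as a simple first-match chain. The `signals` object and its field names are assumptions for illustration; how detectRepoProfile() actually probes the repo is not shown in this message.

```javascript
// Hypothetical sketch: first matching signal wins, in the order listed above.
function detectRepoProfileSketch(signals) {
  if (signals.hasNix) return 'infra';
  if (signals.hasTerraform) return 'infra';
  if (signals.hasReact) return 'app';
  if (signals.hasPackageJson) return 'library'; // Node project without a UI framework
  if (signals.docsOnly) return 'docs';
  return 'app'; // default when nothing matches
}
```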
Phase 4 — migrate command
- commands-scaffold-migrate.js: sf scaffold migrate --profile <name>
Re-enables pending files entering the new profile; stamps state=disabled
(or prunes with --prune) files leaving it; warns on editing/completed files.
- commands/handlers/ops.js, commands/catalog.js: registered and tab-completed.
Phase 5 — custom profiles + PREFERENCES.md frontmatter
- scaffold-profiles.js: readPreferencesProfile(), loadCustomProfileSet()
(~/.sf/profiles/<name>.yaml with extends/add/remove), resolveActiveProfileSet()
implementing full ADR-022 §6 precedence.
- All callers updated to use resolveActiveProfileSet as the single source of truth.
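As an illustration of the extends/add/remove composition for custom profiles, a minimal sketch follows. It assumes the YAML has already been parsed into a plain object and that built-in profiles are available as a Map of Sets. The file names in the usage are made up, and the ADR-022 §6 precedence rules are not modeled.

```javascript
// Hypothetical sketch: compose a custom profile from a base profile.
function resolveCustomProfileSketch(builtins, custom) {
  const base = builtins.get(custom.extends);
  if (!base) throw new Error(`Unknown base profile: ${custom.extends}`);
  const result = new Set(base); // copy so the built-in set is not mutated
  for (const file of custom.add ?? []) result.add(file);
  for (const file of custom.remove ?? []) result.delete(file);
  return result;
}

// Usage with invented file names:
const exampleBuiltins = new Map([['docs', new Set(['README.md', 'GLOSSARY.md'])]]);
const exampleProfile = resolveCustomProfileSketch(exampleBuiltins, {
  extends: 'docs',
  add: ['RUNBOOK.md'],
  remove: ['GLOSSARY.md'],
});
```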
Tests: 28 new tests in adr-022-scaffold-profiles.test.mjs — all passing.
Pre-existing node:test stubs (3 files) unaffected.
ADR: docs/dev/ADR-022-scaffold-profiles.md
Misc: triage TODO.md dump into BACKLOG.md (phases-helpers export error T1,
/todo triage typed-handler gap T1, structured triage tiers T2, sha-track
markdown files T2, cross-repo triage T3). Reset TODO.md to empty template.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sf-wiki is a built-in read-only skill — its page name defaults must
stay generic (lowercase). The uppercase convention is this repo's
project-level choice, documented in system.md and the wiki itself.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All .sf/ operational files use UPPERCASE (DECISIONS.md, KNOWLEDGE.md, etc.).
Wiki pages now follow the same convention: INDEX.md, ARCHITECTURE.md,
WORKFLOWS.md, SUBSYSTEMS.md, GLOSSARY.md.
Also updates sf-wiki SKILL.md and system.md prompt references.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- auto-bootstrap-context.js: scan .sf/wiki/*.md in collectAutoBootstrapFiles
so wiki pages load as priority context in headless autonomous bootstrap
- headless-context.ts: same fix for the TS bootstrap path
- system-context.js: loadWikiBlock already existed and was wired into
fullSystem; add .sf/wiki/ to Tier 1 escalation policy lookup sources
- system.md: add wiki/ to .sf/ directory structure; add Conventions entry
explaining wiki is tracked in git (hand edits persist) and injected
automatically when present
- git-runtime-patterns.js: do NOT gitignore .sf/wiki/ — wiki pages are
tracked like DECISIONS.md so hand edits survive commits and clones
- .sf/wiki/: seed index.md, architecture.md, workflows.md for this repo
Wiki filenames follow sf-wiki SKILL.md convention: lowercase (index.md,
architecture.md, workflows.md, subsystems.md, glossary.md).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three follow-up fixes from S03/T04:
1. gate-runner.js: add missing getDistinctGateIds import from sf-db.js.
UokGateRunner.getHealthSummary() called it when registry was empty but
it was never imported — runtime ReferenceError in headless contexts.
2. sf-db-gates.js: getDistinctGateIds + getGateRunStats fall back to the
quality_gates DB table when no trace events are found (e.g. after trace
file rotation). Ensures gate health survives trace cleanup.
3. headless-uok-status.ts: replace generic Type column with real Scope
(task/slice/milestone) from quality_gates DB, and show actual Last
Evaluated timestamp from DB even when outside the 24h stats window.
Tests updated to match (21 pass).
Closes backlog items: bl-gate-runner-import-bug, bl-gate-stats-trace-vs-db,
bl-uok-status-enrich.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a new `sf headless status uok` subcommand that queries
gate-run stats and circuit-breaker state from sf.db and formats
them as a markdown table or JSON (--json flag).
- src/headless-uok-status.ts: handler that loads sf-db-gates
directly (avoids the unimported getDistinctGateIds in gate-runner)
- src/headless.ts: bypass RPC, route 'status uok' to handler
- src/help-text.ts: document the new subcommand
- tests/headless-uok-status.test.mjs: 19 node:test tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds adaptive-verification-policy.js which reads OutcomeLearningGate
trace events from the last 24h and adjusts verification_max_retries /
verification_auto_fix in project preferences:
- >60% verification/artifact/execution failures → reduce retries to 1, disable auto-fix
- 0% failures across ≥5 samples → bump retries (capped at 3)
- all other cases → no change (returns null)
Wires into auto-verification.js after OutcomeLearningGate runs when
outcomeLearning flag is enabled. Includes 12 node:test tests.
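The three rules above can be sketched as a pure function. The outcome record shape (`failed` boolean), the function name, and the handling of an empty window are assumptions; adaptive-verification-policy.js may differ.

```javascript
// Hypothetical sketch of the adjustment rules.
// `outcomes` is the list of gate outcomes from the last 24h;
// `current` holds the current preference values.
function adaptVerificationPolicySketch(outcomes, current) {
  if (outcomes.length === 0) return null; // assumption: no data means no change
  const failures = outcomes.filter((o) => o.failed).length;
  const failureRate = failures / outcomes.length;

  if (failureRate > 0.6) {
    return { verification_max_retries: 1, verification_auto_fix: false };
  }
  if (failures === 0 && outcomes.length >= 5) {
    const bumped = Math.min(current.verification_max_retries + 1, 3);
    return { verification_max_retries: bumped };
  }
  return null; // all other cases: leave preferences untouched
}
```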
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add checkCrossSliceConsistency() to detect key_file conflicts across slices
- Add checkMilestoneIntegrity() to verify completed slices have summaries
and no active requirements are orphaned
- Extend runPostExecutionChecks() signature with optional milestoneId
and allSliceTasks parameters
- Wire cross-slice task gathering into auto-verification.js call site
- Add comprehensive node:test suite for both new checks
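One way a key_file conflict check might work, as a sketch. The task shape (`sliceId`, `keyFiles`) and the definition of a conflict (the same file claimed by more than one slice) are assumptions, not necessarily what checkCrossSliceConsistency() does.

```javascript
// Hypothetical sketch: report files claimed as key files by more than one slice.
function findKeyFileConflictsSketch(allSliceTasks) {
  const owners = new Map(); // file path -> Set of slice ids that claim it
  for (const task of allSliceTasks) {
    for (const file of task.keyFiles ?? []) {
      if (!owners.has(file)) owners.set(file, new Set());
      owners.get(file).add(task.sliceId);
    }
  }
  const conflicts = [];
  for (const [file, slices] of owners) {
    if (slices.size > 1) conflicts.push({ file, slices: [...slices].sort() });
  }
  return conflicts;
}
```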
Used a Perl regex to replace all patterns of the form
X instanceof Error ? X.message : String(X)
with getErrorMessage(X) for any variable name.
Added getErrorMessage imports to 6 files that lacked it.
Leaves only 2 intentional .stack || .message variants unchanged.
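For illustration, the same rewrite expressed as a JavaScript regex with a backreference, plus a helper of the shape being substituted in. The regex here is a reconstruction, not the Perl one-liner that was actually run, and getErrorMessageSketch only mirrors the ternary it replaces.

```javascript
// Helper equivalent to the inline ternary it replaces.
function getErrorMessageSketch(error) {
  return error instanceof Error ? error.message : String(error);
}

// \1 forces all three occurrences to be the same identifier, so
// mixed-variable ternaries are left alone.
const INLINE_TERNARY =
  /\b([A-Za-z_$][\w$]*) instanceof Error \? \1\.message : String\(\1\)/g;

function rewriteInlineTernariesSketch(source) {
  return source.replace(INLINE_TERNARY, 'getErrorMessage($1)');
}
```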
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace all remaining inline error ternaries using the 'error' variable name
with getErrorMessage(error). Added imports to 3 files that lacked it.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- guided-flow.js: SF-WORKFLOW.md path now uses sfHome()
- commands-config.js: both auth.json path sites use sfHome()
Eliminates the last 3 inline ~/.sf path patterns; all .sf paths
now route through sfHome() which respects SF_HOME env override
and uses the platform-safe homedir() fallback.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- commands-handlers.js: replace process.env.HOME/.sf/agent/SF-WORKFLOW.md with sfHome() at both call sites (lines 62 and 412)
- skills/directory.js: replace process.env.HOME/.sf/skills with sfHome()
- tools/tool-helpers.js: remove duplicate errorMessage implementation; re-export getErrorMessage from error-utils.js under the errorMessage alias
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of deleting these planned-extraction modules, implement them
properly:
worktree-session-state.js:
- Upgraded to canonical module with JSDoc, node:path imports
- Fixed getActiveWorktreeName() to use normalize/join/basename (was
using fragile string.replaceAll + split('/') approach)
- Fixed ensureWorktreeOriginalCwdFromPath() to use sep instead of regex
- worktree-command.js now imports/re-exports all state functions from
this module and removes its local 'let originalCwd = null'
- registerWorktreeCommand() recovery logic replaced with
ensureWorktreeOriginalCwdFromPath() call
auto-runtime-state.js:
- Fixed to use getAutoSession() singleton instead of 'new AutoSession()'
(was creating an isolated instance disconnected from auto.js state)
- auto.js now re-exports isAutoActive, isAutoPaused, markToolStart,
markToolEnd from this module, removing duplicate implementations
- All state reads in auto-runtime-state.js delegate to the same
singleton that auto.js manages
Test: updated worktree-fixes.test.mjs guard to match clearWorktreeOriginalCwd()
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- worktree-session-state.js: planned extraction for worktree originalCwd
state; worktree-command.js kept its own module-level var and never
imported this file. Dead since creation in 47c806d73.
- auto-runtime-state.js: planned extraction of isAutoActive/isAutoPaused
and AutoSession wrapper; auto.js already exports all the same functions.
No file in the codebase imported auto-runtime-state.js.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
preferences.js had its own copy of sfHome() (without resolve() canonicalization).
Replace with import from sf-home.js — single source of truth.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- rf2-01: replace 23 inline `process.env.SF_HOME || join(homedir(), '.sf')` patterns
across 19 files with canonical `sfHome()` from sf-home.js; removes 5 private
sfHome/getSfHome function definitions and unused os/homedir imports
- rf2-05: extract `ensureWritableParent` and `errorMessage` from complete-task.js
and complete-slice.js into new tools/tool-helpers.js
- rf2-06: add `runPostMutationHook` to tool-helpers.js; replace 8 identical
try/catch blocks (plan-task, plan-slice, plan-milestone, replan-slice,
reassess-roadmap, reopen-slice, reopen-task, reopen-milestone) with single call
- rf2-09: add `makeDiskCounter` factory in auto-dispatch.js; consolidate 4 counter
functions (rewrite/uat get/set/increment) from duplicated if/else DB-vs-disk
logic into thin factory wrappers (~35 lines removed)
- rf2-10: export `getSfAgentSettingsPath()` from preferences.js; update
notifications/notify.js and permissions/permission-core.js to use it
All 4375 unit tests pass.
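A sketch of the counter-factory idea from rf2-09, under stated assumptions: the option names and the Map-like store interface are invented here, and the real makeDiskCounter signature in auto-dispatch.js is not shown in this message.

```javascript
// Hypothetical sketch: one factory owns the DB-vs-disk branch so each
// counter (rewrite, uat, ...) becomes a thin wrapper around it.
function makeCounterSketch({ isDbAvailable, dbStore, diskStore }) {
  // Pick the backing store at call time, not at construction time.
  const store = () => (isDbAvailable() ? dbStore : diskStore);
  const get = (key) => store().get(key) ?? 0;
  const set = (key, value) => {
    store().set(key, value);
  };
  const increment = (key) => {
    const next = get(key) + 1;
    set(key, next);
    return next;
  };
  return { get, set, increment };
}
```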
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- rf-09: Remove isTransientNetworkError from preferences-models.js/preferences.js/preferences-models.d.ts (canonical is error-classifier.js)
- rf-08: Extract Gemini token counting to google-gemini-token-counter.js; update register-hooks.js import
- rf-12: Remove 3 dead _allRequirements/_allDecisions fetch blocks from db-writer.js
- rf-05: Extract resolveSfBin() and monitorNdjsonStdout() to spawn-worker.js; both orchestrators now import from there
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Delete ghost package packages/pi-agent-core (no dist, no consumers,
TS build errors; JS source sf-db.js had 3 commits not mirrored in TS)
- Remove build:pi-agent-core from root package.json build:pi pipeline
- Merge all models from MODEL_COST_PER_1K_INPUT into BUNDLED_COST_TABLE
(model-cost-table.js is now the single canonical cost source)
- Remove duplicate MODEL_COST_PER_1K_INPUT object and getModelCost()
from model-router.js; use lookupModelCost() from model-cost-table.js
- Replace hand-rolled isTransientNetworkError in preferences-models.js
with delegation to classifyError() in error-classifier.js
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 'read SUMMARY → check if readable AND terminal' pattern appeared five
times in state.js after the Cluster F polarity fix. Extract it to a
private loadTerminalSummary(summaryFile, loadFn) helper so the fail-closed
semantics live in one place and can't drift between call sites.
- loadTerminalSummary returns the content if readable AND terminal, null otherwise
- All 5 call sites replaced: 2 in getActiveMilestoneId(), 3 in _deriveStateImpl()
- Phase 2 'no roadmap' case reuses returned content for parseSummary().title
- isTerminalMilestoneSummaryContent now only referenced inside the helper
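The fail-closed contract of the helper can be sketched as below. This is not the state.js code: it is written synchronously, takes the terminal check as a parameter, and treats a throwing loader as unreadable, all of which are assumptions.

```javascript
// Hypothetical sketch: return content only if it is readable AND terminal.
// Every other outcome (no file, unreadable, non-terminal) yields null.
function loadTerminalSummarySketch(summaryFile, loadFn, isTerminal) {
  if (!summaryFile) return null;
  let content;
  try {
    content = loadFn(summaryFile);
  } catch {
    return null; // assumption: a read error counts as unreadable
  }
  if (content == null) return null; // fail closed on null/undefined content
  return isTerminal(content) ? content : null;
}
```

Callers then branch on a single null check, so the polarity cannot drift between call sites.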
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
No interface exists for the class, so the Impl suffix is vestigial
Java-style naming. Rename throughout: git-service.js, auto-start.js,
auto.js, worktree.js, worktree-detect.js, worktree-resolver.js,
quick.js, and the two test files that imported it directly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three fail-open bugs allowed unreadable (null) SUMMARY files to be treated as
terminal, incorrectly marking milestones as complete when the content could not
be read.
Gap 1 — dispatch-guard.js line 50:
Any SUMMARY file existence = milestone complete (fail-open).
Fix: DB-first check via getMilestone()+isClosedStatus(); filesystem fallback
reads SUMMARY content and calls classifyMilestoneSummaryContent() so only
non-failure summaries skip the milestone.
Gap 2 — state.js getActiveMilestoneId():
'if (summaryFile) continue' skipped any milestone with ANY SUMMARY.
'if (!summaryFile) return mid' fell through incorrectly for failure SUMMARYs.
Fix: read content; only skip/continue if sc != null && isTerminal(sc).
Gap 3 — state.js _deriveStateImpl() Phase 1 + Phase 2:
'!sc || isTerminalMilestoneSummaryContent(sc)' — null content = fail-open.
Fix: 'sc && isTerminalMilestoneSummaryContent(sc)' — null content = fail-closed.
Applied to all 6 occurrences (lines 1233, 1247, 1257, 1284, 1356, 1391).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fold sf-usage-bar, sf-notify, sf-inturn-guard, sf-permissions,
slash-commands into sf extension (ui/, notifications/, guards/,
permissions/, commands/legacy/)
- Delete vectordrive extension
- Migrate uok/kernel.js to TypeScript (kernel.ts) with full interfaces
- Add allowJs/checkJs:false to tsconfig.resources.json for incremental TS migration
- Add symlink dedup to extension-discovery.ts (seenRealPaths Set)
- Add before_provider_request delegate back to native-search.js so
session budget tests exercise the middleware end-to-end
- Fix parseSfNativeTools() to return all SF manifest tools (drop sf_ filter)
- Fix test assertions: plan_milestone/complete_task/validate_milestone
- Remove subagent from app-smoke.test.ts (folded into sf/subagent/)
- Remove sf-permissions/sf-inturn-guard/subagent from features-inventory test
- Fix resolveSearchProvider autonomous mode test to pass 'auto' explicitly
- Remove legacy /clear slash command (conflicts with built-in clear_terminal)
- Update web-command-parity-contract.test.ts for clear removal
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- preferences-models.js: replace 6-regex isHeavyModelId() with MODEL_CAPABILITY_TIER
lookup + regex fallback for unknown models; new models in model-router.js
are automatically reflected without touching preferences-models.js
- search-the-web/provider.js: replace ~200-line per-provider waterfall with
PROVIDER_REGISTRY array + firstAvailable()/resolveWithFallback() helpers;
preserves Tavily→Brave→Serper→Exa→Ollama→MiniMax auto-fallback order
- sf-db.js: bump SCHEMA_VERSION 58→60 (v59 now reachable); add
frontmatter_version column to tasks table via v60 migration and CREATE
TABLE definition; wire frontmatter_version into upsertTaskPlanning() SQL
and .run() params
- task-frontmatter.js: add frontmatterVersion:1 to DEFAULT_TASK_FRONTMATTER,
add validation block in validateTaskFrontmatter(), add frontmatterVersion
mapping in taskFrontmatterFromRecord()
- sf-db-migration.test.mjs: update hardcoded version assertion 58→60
- docs/specs/sf-operating-model.md: add Planning Schema section documenting
the 3-table model (milestones/slices/tasks, their PKs, spec tables, and
ID naming conventions)
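For the search-provider change above, a registry-driven fallback could look like this sketch. The entry shape, the `isAvailable` predicate, and the helper semantics are assumptions; only the provider order and the helper names firstAvailable/resolveWithFallback come from this message.

```javascript
// Hypothetical sketch: array order is the auto-fallback order.
const PROVIDER_REGISTRY_SKETCH = [
  { name: 'tavily' },
  { name: 'brave' },
  { name: 'serper' },
  { name: 'exa' },
  { name: 'ollama' },
  { name: 'minimax' },
];

// Return the first provider the predicate accepts, or null if none is usable.
function firstAvailableSketch(registry, isAvailable) {
  return registry.find((entry) => isAvailable(entry.name)) ?? null;
}

// Prefer the requested provider when usable; otherwise walk the registry.
function resolveWithFallbackSketch(preferred, registry, isAvailable) {
  const wanted = registry.find((entry) => entry.name === preferred);
  if (wanted && isAvailable(wanted.name)) return wanted;
  return firstAvailableSketch(registry, isAvailable);
}
```

Adding a provider then means appending one registry entry rather than extending a per-provider waterfall.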
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- FallbackResolver.setUnitContext() stores {unitType,unitId} from autonomous dispatch
- run-unit.js calls pi.setFallbackUnitContext() before/after each unit
- _findAnyAvailableFallback uses real unitType/unitId from context, not sentinel
- Schema v59: failure_mode column in llm_task_outcomes
- insertLlmTaskOutcome accepts failure_mode (rate_limit, quota_exhausted, auth_error)
- register-hooks.js passes event.classification.reason as failure_mode
- register-hooks.js uses real event.unitId when available
- ExtensionRuntimeActions.setFallbackUnitContext added to pi API surface
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a model fails and FallbackResolver picks a replacement, it now:
1. Fires the before_model_select hook with reason='fallback' and the
failing model's ID — the learning system records the failure outcome
and returns the best Bayesian-blended replacement from llm_task_outcomes
2. Falls back to the existing heuristic sort (reasoning + context window)
if the hook is unavailable or returns no override
Changes:
- BeforeModelSelectEvent: add optional currentModelId and reason fields
- FallbackResolver: accept emitBeforeModelSelect in constructor; make
_findAnyAvailableFallback async; fire hook before heuristic fallback
- agent-session.ts: inject lazy emitBeforeModelSelect closure into resolver
- register-hooks.js: record failure outcome when reason='fallback' before
returning selectLearnedModel result
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>