Per follow-up: SF generates many of these .md files itself (.sf/wiki/*,
.sf/milestones/**/*.md, docs/plans/**), so storing gzipped snapshots in
the DB would duplicate disk + git for no benefit.
Simpler design: store only the sha + meta in sf.db; compute diffs
on demand against `git show HEAD:<path>`. Naturally handles both
"working-tree edit not yet committed" and "another agent committed
while SF wasn't running".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
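The sha-plus-on-demand-diff design above could be sketched roughly as follows. This is a minimal illustration, not SF's implementation: the function names (`sha256`, `readHead`, `detectDrift`), the choice of SHA-256, and the injectable readers are all assumptions; only the use of `git show HEAD:<path>` comes from the commit message.

```javascript
const { createHash } = require("node:crypto");
const { execFileSync } = require("node:child_process");
const { readFileSync } = require("node:fs");

// Hex SHA-256 of a string or Buffer (hash algorithm is an assumption).
function sha256(content) {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical reader for the committed version of a file.
// Returns null when git fails, e.g. the path does not exist in HEAD.
function readHead(path, cwd = process.cwd()) {
  try {
    return execFileSync("git", ["show", `HEAD:${path}`], {
      cwd,
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
  } catch {
    return null;
  }
}

// Compare the working-tree file with the stored sha, then with HEAD.
function detectDrift(path, storedSha, readers = {}) {
  const readWorking = readers.readWorking ?? ((p) => readFileSync(p, "utf8"));
  const readCommitted = readers.readCommitted ?? readHead;

  const workingSha = sha256(readWorking(path));
  if (workingSha === storedSha) return { changed: false, sha: workingSha };

  const committed = readCommitted(path);
  return {
    changed: true,
    sha: workingSha,
    // true: on-disk content differs from HEAD (edit not yet committed)
    // false: on-disk content equals HEAD (someone committed out of band)
    uncommitted: committed === null || sha256(committed) !== workingSha,
  };
}
```

Because only the sha is stored, the diff itself would be produced later from the working-tree content and the `git show` output, so nothing is duplicated in the database.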
Per follow-up: not just .sf/milestones/**/*.md but the broader set of
markdown files that SF (or humans) treat as authoritative — AGENTS.md,
.github/copilot-instructions.md, .sf/wiki/**, docs/adr/**,
docs/plans/**, and root-level meta files.
Explicit out-of-scope list: TODO.md (reset every cycle by triage),
CHANGELOG.md / BUILD_PLAN.md (append-only by design), vendored or
generated content. Tracking those would just be noise.
Spec includes a tracked_md_files schema, the walk/diff/surface flow,
and an honest accounting of storage cost (~40 bytes per file + optional
gzipped snapshot).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
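The include/exclude scope described above might be expressed as a small path filter. This is a sketch under assumptions: the constant and function names are hypothetical, prefix matching stands in for real glob handling, and the unnamed "root-level meta files" and vendored/generated content are left out because the commit message does not list them.

```javascript
// Directories whose markdown is treated as authoritative (from the commit message).
const TRACKED_PREFIXES = [".sf/wiki/", ".sf/milestones/", "docs/adr/", "docs/plans/"];
// Individually named tracked files.
const TRACKED_FILES = new Set(["AGENTS.md", ".github/copilot-instructions.md"]);
// Explicitly out of scope: reset every cycle or append-only by design.
const EXCLUDED_FILES = new Set(["TODO.md", "CHANGELOG.md", "BUILD_PLAN.md"]);

// Returns true when a repo-relative path falls inside the tracked scope.
function isTrackedMarkdown(relPath) {
  const p = relPath.replace(/\\/g, "/"); // normalize Windows separators
  if (!p.endsWith(".md")) return false;
  if (EXCLUDED_FILES.has(p)) return false;
  if (TRACKED_FILES.has(p)) return true;
  return TRACKED_PREFIXES.some((prefix) => p.startsWith(prefix));
}
```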
Captures a real bug class observed during today's session: nothing
notices when a milestone file (CONTEXT.md, ROADMAP.md, slice PLAN.md,
etc.) is edited out of band — by a human, another agent, or a git pull.
SF keeps using the cached state and drifts.
Wanted: per-file sha tracking in sf.db, a diff surfaced on change, and
hooks for accept/reject/import/archive. Storage cost is negligible.
Useful in concert with the cross-repo triage and slash-command routing
gaps already in this TODO.md — together they close most of the
"unattended SF actually works" surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit (1fb4b9882) captured only the reset and lost my intended
additions due to a Read/Write race. Re-applying the four feature
requests from today's dogfooding session:
- Cross-repo `triage-all-repos` (real fix for the "many TODO.md files"
surface area — single tool, per-repo SF dbs, unified read-only
aggregation view).
- Slash-command routing fix (`/todo triage` is currently re-implemented
by the agent's LLM, bypassing the typed backend; patches to
commands-todo.js were silently inert).
- Structured tier/priority per triage item (today tiers exist only in
LLM-prose appended to BUILD_PLAN.md; no parser-friendly field for
"promote Tier 1 items").
- Phases-helpers stale-export error that fires on every SF run; needs
either the missing export restored or a test that catches it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four feature requests captured from today's dogfooding session:
- Cross-repo `triage-all-repos` (real fix for the "many TODO.md files"
surface area — single tool, per-repo SF dbs, unified read-only
aggregation view).
- Slash-command routing fix (`/todo triage` is currently re-implemented
by the agent's LLM, bypassing the typed backend; patches to
commands-todo.js were silently inert).
- Structured tier/priority per triage item (today tiers exist only in
LLM-prose appended to BUILD_PLAN.md; no parser-friendly field for
"promote Tier 1 items").
- Phases-helpers stale-export error that fires on every SF run; needs
either the missing export restored or a test that catches it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Complete the standard wiki page set from sf-wiki SKILL.md:
- subsystems.md: table of all subsystems with path, purpose, tests
- glossary.md: project-specific terms (ADR, UOK, PDD, YOLO, wiki, etc.)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- auto-bootstrap-context.js: scan .sf/wiki/*.md in collectAutoBootstrapFiles
so wiki pages load as priority context in headless autonomous bootstrap
- headless-context.ts: same fix for the TS bootstrap path
- system-context.js: loadWikiBlock already existed and was wired into
fullSystem; add .sf/wiki/ to Tier 1 escalation policy lookup sources
- system.md: add wiki/ to .sf/ directory structure; add Conventions entry
explaining wiki is tracked in git (hand edits persist) and injected
automatically when present
- git-runtime-patterns.js: do NOT gitignore .sf/wiki/ — wiki pages are
tracked like DECISIONS.md so hand edits survive commits and clones
- .sf/wiki/: seed index.md, architecture.md, workflows.md for this repo
Wiki filenames follow sf-wiki SKILL.md convention: lowercase (index.md,
architecture.md, workflows.md, subsystems.md, glossary.md).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Today's triage run confirmed the manual `/todo triage` workflow works,
but it stops at tier-listing items in BUILD_PLAN.md and does not scaffold
.sf/milestones/MNNN/ dirs for the Tier 1 ones. That's the gap that
needs closing for the autonomous flow to actually create milestones
from raw TODO dumps.
Also captures the non-fatal phases-helpers.js extension load error
that appeared at the top of the triage run output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add BUILT_IN_DEFAULT_TIMEOUT_SECS = 120 constant to bash tool
- Compute effectiveTimeout = timeout ?? resolvedDefaultTimeout so LLM
calls without a timeout get the 120s guard automatically
- Add defaultTimeoutSeconds? to BashToolOptions for override at creation
- Dynamic bashSchemaWithDefault describes the actual default in the LLM
tool description, improving model awareness
- Add BashSettings interface + getBashDefaultTimeoutSeconds() to
SettingsManager so users can override or disable via settings.json
- Wire defaultTimeoutSeconds into agent-session.ts _buildRuntime()
Root cause: npx sf --help triggered an npm package download that hung for
4+ minutes with no timeout, consuming the entire autonomous run budget.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
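The timeout resolution described above reduces to two nullish-coalescing steps. The sketch below reuses the names from the commit message (`BUILT_IN_DEFAULT_TIMEOUT_SECS`, `defaultTimeoutSeconds`, `resolvedDefaultTimeout`, `effectiveTimeout`); the wrapper function `resolveEffectiveTimeout` is hypothetical, and how settings.json encodes "disabled" is not stated in the message, so it is not modeled here.

```javascript
const BUILT_IN_DEFAULT_TIMEOUT_SECS = 120;

// timeout: value supplied by the LLM tool call (may be undefined).
// options.defaultTimeoutSeconds: optional override given at tool creation.
function resolveEffectiveTimeout(timeout, options = {}) {
  const resolvedDefaultTimeout =
    options.defaultTimeoutSeconds ?? BUILT_IN_DEFAULT_TIMEOUT_SECS;
  // `??` only falls back on null/undefined, so an explicit 0 is preserved.
  return timeout ?? resolvedDefaultTimeout;
}
```

Using `??` rather than `||` matters here: a caller that passes `0` deliberately is not silently switched to the default.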
Real dogfood for the auto-triage feature: this is the unstructured dump
that the autonomous cycle should pick up and process into proper backlog
items the next time it runs. Until auto-triage is wired up, the contents
serve as a written spec for what's needed.
Two flagship features:
- Auto-triage TODO.md on each autonomous cycle. `commands-todo.js`
already implements `/todo triage` (manual). Wire it to the autonomous
orchestrator and skip when TODO.md == _EMPTY_TODO.
- When the LLM would ask a clarifying question, replace with parallel
combatant + partner probes (adversarial-challenge + collaborative-
research) and only fall back to asking a human if probes diverge AND
interactive mode is available. This unblocks unattended
`headless new-milestone` (the gap that blocked batch backlog
ingestion today).
Plus five smaller items (headless milestone stall fix, bulk
import-roadmap, TTY-free plan list, hand-authorable milestone scaffold,
discoverable --answers schema) carried over from the
centralcloud-ops SF-IMPROVEMENTS.md observations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-up fixes from S03/T04:
1. gate-runner.js: add missing getDistinctGateIds import from sf-db.js.
UokGateRunner.getHealthSummary() called it when registry was empty but
it was never imported — runtime ReferenceError in headless contexts.
2. sf-db-gates.js: getDistinctGateIds + getGateRunStats fall back to the
quality_gates DB table when no trace events are found (e.g. after trace
file rotation). Ensures gate health survives trace cleanup.
3. headless-uok-status.ts: replace generic Type column with real Scope
(task/slice/milestone) from quality_gates DB, and show actual Last
Evaluated timestamp from DB even when outside the 24h stats window.
Tests updated to match (21 pass).
Closes backlog items: bl-gate-runner-import-bug, bl-gate-stats-trace-vs-db,
bl-uok-status-enrich.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a new `sf headless status uok` subcommand that queries
gate-run stats and circuit-breaker state from sf.db and formats
them as a markdown table or JSON (--json flag).
- src/headless-uok-status.ts: handler that loads sf-db-gates
directly (avoids the unimported getDistinctGateIds in gate-runner)
- src/headless.ts: bypass RPC, route 'status uok' to handler
- src/help-text.ts: document the new subcommand
- tests/headless-uok-status.test.mjs: 19 node:test tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds adaptive-verification-policy.js which reads OutcomeLearningGate
trace events from the last 24h and adjusts verification_max_retries /
verification_auto_fix in project preferences:
- >60% verification/artifact/execution failures → reduce retries to 1, disable auto-fix
- 0% failures across ≥5 samples → bump retries (capped at 3)
- all other cases → no change (returns null)
Wires into auto-verification.js after OutcomeLearningGate runs when
outcomeLearning flag is enabled. Includes 12 node:test tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
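The three-way rule above can be written as a small pure function. This is an illustrative sketch, not the contents of adaptive-verification-policy.js: the function name, the `{ failed }` event shape, the one-step bump, and the empty-window behaviour are assumptions; the 60% threshold, the 5-sample minimum, and the cap of 3 come from the commit message.

```javascript
const FAILURE_THRESHOLD = 0.6; // ">60%" in the commit message
const MIN_CLEAN_SAMPLES = 5;
const MAX_RETRIES_CAP = 3;

// outcomes: array of { failed: boolean } covering the last 24h (hypothetical shape).
// Returns a preferences patch, or null when nothing should change.
function decideVerificationPolicy(outcomes, currentRetries) {
  const total = outcomes.length;
  if (total === 0) return null; // assumption: no data means no change

  const failures = outcomes.filter((o) => o.failed).length;
  if (failures / total > FAILURE_THRESHOLD) {
    return { verification_max_retries: 1, verification_auto_fix: false };
  }
  if (failures === 0 && total >= MIN_CLEAN_SAMPLES) {
    // assumption: "bump" means +1, never exceeding the cap
    return { verification_max_retries: Math.min(currentRetries + 1, MAX_RETRIES_CAP) };
  }
  return null;
}
```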
- Add checkCrossSliceConsistency() to detect key_file conflicts across slices
- Add checkMilestoneIntegrity() to verify completed slices have summaries
and no active requirements are orphaned
- Extend runPostExecutionChecks() signature with optional milestoneId
and allSliceTasks parameters
- Wire cross-slice task gathering into auto-verification.js call site
- Add comprehensive node:test suite for both new checks
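One way a cross-slice key_file check could work is to index every key file by the slices that claim it and report files with more than one owner. The sketch below is only that: the function name `findKeyFileConflicts`, the input shape, and the definition of "conflict" are assumptions, since the commit message does not spell them out.

```javascript
// slices: [{ id, tasks: [{ keyFiles: string[] }] }] (hypothetical shape).
// Returns one entry per file that is listed by more than one slice.
function findKeyFileConflicts(slices) {
  const owners = new Map(); // file path -> Set of slice ids
  for (const slice of slices) {
    for (const task of slice.tasks) {
      for (const file of task.keyFiles ?? []) {
        if (!owners.has(file)) owners.set(file, new Set());
        owners.get(file).add(slice.id);
      }
    }
  }
  return [...owners]
    .filter(([, ids]) => ids.size > 1)
    .map(([file, ids]) => ({ file, slices: [...ids] }));
}
```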
rf-01: add ECONNREFUSED to isTransientNetworkError in anthropic-shared.ts,
aligning with the NETWORK_RE pattern in error-classifier.js
rf-02: add scripts/validate-model-cost-table.mjs to report coverage gaps
and price divergence between model-cost-table.js and models.generated.ts;
add 'validate-cost-table' script to package.json
rf-11: extract 10 pure resource-display utility functions from
interactive-mode.ts into packages/coding-agent/src/modes/interactive/
resource-display.ts, reducing interactive-mode.ts by ~282 lines
All 4375 tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Used perl regex to replace all patterns of the form
X instanceof Error ? X.message : String(X)
with getErrorMessage(X) for any variable name.
Added getErrorMessage imports to 6 files that lacked it.
Leaves only 2 intentional .stack || .message variants unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
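For reference, a helper equivalent to the ternary being replaced would look like the sketch below. The body is inferred from the pattern quoted above; the real `getErrorMessage` in the codebase may do more.

```javascript
// Equivalent of: X instanceof Error ? X.message : String(X)
function getErrorMessage(err) {
  return err instanceof Error ? err.message : String(err);
}
```

Centralizing the ternary means non-Error throwables (strings, numbers, `undefined`) are stringified the same way at every call site.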
Replace all remaining inline error ternaries using the 'error' variable name
with getErrorMessage(error). Added imports to 3 files that lacked it.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- guided-flow.js: SF-WORKFLOW.md path now uses sfHome()
- commands-config.js: both auth.json path sites use sfHome()
Eliminates the last 3 inline ~/.sf path patterns; all .sf paths
now route through sfHome() which respects SF_HOME env override
and uses the platform-safe homedir() fallback.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
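A helper with the behaviour described above (SF_HOME override, `homedir()` fallback, and the `resolve()` canonicalization mentioned elsewhere in this log) might look like this. It is a sketch, not the contents of sf-home.js; the injectable `env` parameter is an assumption added to make the function easy to exercise.

```javascript
const { homedir } = require("node:os");
const { join, resolve } = require("node:path");

// Returns the SF home directory as an absolute, normalized path.
// env is injectable for testing (assumption); defaults to process.env.
function sfHome(env = process.env) {
  return resolve(env.SF_HOME || join(homedir(), ".sf"));
}
```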
- commands-handlers.js: replace process.env.HOME/.sf/agent/SF-WORKFLOW.md with sfHome() at both call sites (lines 62 and 412)
- skills/directory.js: replace process.env.HOME/.sf/skills with sfHome()
- tools/tool-helpers.js: remove duplicate errorMessage implementation; re-export getErrorMessage from error-utils.js under the errorMessage alias
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of deleting these planned-extraction modules, implement them
properly:
worktree-session-state.js:
- Upgraded to canonical module with JSDoc, node:path imports
- Fixed getActiveWorktreeName() to use normalize/join/basename (was
using fragile string.replaceAll + split('/') approach)
- Fixed ensureWorktreeOriginalCwdFromPath() to use sep instead of regex
- worktree-command.js now imports/re-exports all state functions from
this module and removes its local 'let originalCwd = null'
- registerWorktreeCommand() recovery logic replaced with
ensureWorktreeOriginalCwdFromPath() call
auto-runtime-state.js:
- Fixed to use getAutoSession() singleton instead of 'new AutoSession()'
(was creating an isolated instance disconnected from auto.js state)
- auto.js now re-exports isAutoActive, isAutoPaused, markToolStart,
markToolEnd from this module, removing duplicate implementations
- All state reads in auto-runtime-state.js delegate to the same
singleton that auto.js manages
Test: updated worktree-fixes.test.mjs guard to match clearWorktreeOriginalCwd()
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- worktree-session-state.js: planned extraction for worktree originalCwd
state; worktree-command.js kept its own module-level var and never
imported this file. Dead since creation in 47c806d73.
- auto-runtime-state.js: planned extraction of isAutoActive/isAutoPaused
and AutoSession wrapper; auto.js already exports all the same functions.
No file in the codebase imported auto-runtime-state.js.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
preferences.js had its own copy of sfHome() (without resolve() canonicalization).
Replace with import from sf-home.js — single source of truth.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- rf2-01: replace 23 inline `process.env.SF_HOME || join(homedir(), '.sf')` patterns
across 19 files with canonical `sfHome()` from sf-home.js; removes 5 private
sfHome/getSfHome function definitions and unused os/homedir imports
- rf2-05: extract `ensureWritableParent` and `errorMessage` from complete-task.js
and complete-slice.js into new tools/tool-helpers.js
- rf2-06: add `runPostMutationHook` to tool-helpers.js; replace 8 identical
try/catch blocks (plan-task, plan-slice, plan-milestone, replan-slice,
reassess-roadmap, reopen-slice, reopen-task, reopen-milestone) with single call
- rf2-09: add `makeDiskCounter` factory in auto-dispatch.js; consolidate 4 counter
functions (rewrite/uat get/set/increment) from duplicated if/else DB-vs-disk
logic into thin factory wrappers (~35 lines removed)
- rf2-10: export `getSfAgentSettingsPath()` from preferences.js; update
notifications/notify.js and permissions/permission-core.js to use it
All 4375 unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
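The rf2-06 consolidation can be pictured as wrapping a side-effecting hook in one shared try/catch. The sketch below is hypothetical throughout: the commit message names `runPostMutationHook` but not its signature, and the log-and-continue behaviour in the catch branch is an assumption about what the eight original blocks did.

```javascript
// Hypothetical signature: runs `hook` and reports failure without throwing,
// so a failing side effect does not discard the tool's own result.
async function runPostMutationHook(label, hook, log = console.warn) {
  try {
    await hook();
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    log(`${label}: post-mutation hook failed: ${message}`);
    return false;
  }
}
```

Each of the planning and reopen tools would then make a single call to the helper instead of repeating its own try/catch.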
- rf-09: Remove isTransientNetworkError from preferences-models.js/preferences.js/preferences-models.d.ts (canonical is error-classifier.js)
- rf-08: Extract Gemini token counting to google-gemini-token-counter.js; update register-hooks.js import
- rf-12: Remove 3 dead _allRequirements/_allDecisions fetch blocks from db-writer.js
- rf-05: Extract resolveSfBin() and monitorNdjsonStdout() to spawn-worker.js; both orchestrators now import from there
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Delete ghost package packages/pi-agent-core (no dist, no consumers,
TS build errors; JS source sf-db.js had 3 commits not mirrored in TS)
- Remove build:pi-agent-core from root package.json build:pi pipeline
- Merge all models from MODEL_COST_PER_1K_INPUT into BUNDLED_COST_TABLE
(model-cost-table.js is now the single canonical cost source)
- Remove duplicate MODEL_COST_PER_1K_INPUT object and getModelCost()
from model-router.js; use lookupModelCost() from model-cost-table.js
- Replace hand-rolled isTransientNetworkError in preferences-models.js
with delegation to classifyError() in error-classifier.js
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 'read SUMMARY → check if readable AND terminal' pattern appeared five
times in state.js after the Cluster F polarity fix. Extract it to a
private loadTerminalSummary(summaryFile, loadFn) helper so the fail-closed
semantics live in one place and can't drift between call sites.
- loadTerminalSummary returns the content if readable AND terminal, null otherwise
- All 5 call sites replaced: 2 in getActiveMilestoneId(), 3 in _deriveStateImpl()
- Phase 2 'no roadmap' case reuses returned content for parseSummary().title
- isTerminalMilestoneSummaryContent now only referenced inside the helper
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
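The helper's contract (content if readable AND terminal, null otherwise) could be sketched as below. The `loadTerminalSummary(summaryFile, loadFn)` signature is from the commit message; the synchronous `loadFn`, the exception handling, and the stand-in `isTerminalMilestoneSummaryContent` are assumptions made so the example is self-contained.

```javascript
// Stand-in for the real check in state.js; the actual rules are not shown here.
function isTerminalMilestoneSummaryContent(content) {
  return /status:\s*complete/i.test(content);
}

// Returns the summary content only when it was readable AND terminal.
// Every other case (no file, read error, null content, non-terminal) is null.
function loadTerminalSummary(summaryFile, loadFn) {
  if (!summaryFile) return null;
  let content;
  try {
    content = loadFn(summaryFile);
  } catch {
    return null; // unreadable is treated as non-terminal (fail-closed)
  }
  if (content == null) return null;
  return isTerminalMilestoneSummaryContent(content) ? content : null;
}
```

Returning the content rather than a boolean lets a caller reuse it, as the 'no roadmap' case does for `parseSummary().title`.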
No interface exists for the class, so the Impl suffix is vestigial
Java-style naming. Rename throughout: git-service.js, auto-start.js,
auto.js, worktree.js, worktree-detect.js, worktree-resolver.js,
quick.js, and the two test files that imported it directly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three fail-open bugs allowed unreadable (null) SUMMARY files to be treated as
terminal, incorrectly marking milestones as complete when the content could not
be read.
Gap 1 — dispatch-guard.js line 50:
Any SUMMARY file existence = milestone complete (fail-open).
Fix: DB-first check via getMilestone()+isClosedStatus(); filesystem fallback
reads SUMMARY content and calls classifyMilestoneSummaryContent() so only
non-failure summaries skip the milestone.
Gap 2 — state.js getActiveMilestoneId():
'if (summaryFile) continue' skipped any milestone with ANY SUMMARY.
'if (!summaryFile) return mid' fell through incorrectly for failure SUMMARYs.
Fix: read content; only skip/continue if sc != null && isTerminal(sc).
Gap 3 — state.js _deriveStateImpl() Phase 1 + Phase 2:
'!sc || isTerminalMilestoneSummaryContent(sc)' — null content = fail-open.
Fix: 'sc && isTerminalMilestoneSummaryContent(sc)' — null content = fail-closed.
Applied to all 6 occurrences (lines 1233, 1247, 1257, 1284, 1356, 1391).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
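The polarity change in Gap 3 is easiest to see side by side. In this sketch `isTerminal` is a hypothetical stand-in for `isTerminalMilestoneSummaryContent`; only the two boolean expressions are taken from the commit message.

```javascript
// Hypothetical stand-in for the real terminal-content check.
const isTerminal = (sc) => sc.includes("complete");

// Before: null/unreadable content short-circuits to true (fail-open).
const failOpen = (sc) => !sc || isTerminal(sc);

// After: null/unreadable content short-circuits to false (fail-closed).
const failClosed = (sc) => Boolean(sc && isTerminal(sc));
```

For readable content the two predicates agree; they differ only when `sc` is null or empty, which is exactly the case that wrongly marked milestones complete.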
- Fold sf-usage-bar, sf-notify, sf-inturn-guard, sf-permissions,
slash-commands into sf extension (ui/, notifications/, guards/,
permissions/, commands/legacy/)
- Delete vectordrive extension
- Migrate uok/kernel.js to TypeScript (kernel.ts) with full interfaces
- Add allowJs/checkJs:false to tsconfig.resources.json for incremental TS migration
- Add symlink dedup to extension-discovery.ts (seenRealPaths Set)
- Add before_provider_request delegate back to native-search.js so
session budget tests exercise the middleware end-to-end
- Fix parseSfNativeTools() to return all SF manifest tools (drop sf_ filter)
- Fix test assertions: plan_milestone/complete_task/validate_milestone
- Remove subagent from app-smoke.test.ts (folded into sf/subagent/)
- Remove sf-permissions/sf-inturn-guard/subagent from features-inventory test
- Fix resolveSearchProvider autonomous mode test to pass 'auto' explicitly
- Remove legacy /clear slash command (conflicts with built-in clear_terminal)
- Update web-command-parity-contract.test.ts for clear removal
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
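The symlink dedup mentioned above typically resolves each candidate to its real path and keeps only the first hit. The sketch below borrows the `seenRealPaths` name from the commit message; the function name, the injectable `realpath`, and the fallback for unresolvable paths are assumptions.

```javascript
const { realpathSync } = require("node:fs");

// Keeps the first path for each distinct real (symlink-resolved) location.
// realpath is injectable for testing (assumption); defaults to fs.realpathSync.
function dedupeByRealPath(paths, realpath = realpathSync) {
  const seenRealPaths = new Set();
  const unique = [];
  for (const p of paths) {
    let real;
    try {
      real = realpath(p);
    } catch {
      real = p; // broken link or missing path: compare by the literal path
    }
    if (seenRealPaths.has(real)) continue;
    seenRealPaths.add(real);
    unique.push(p);
  }
  return unique;
}
```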
- preferences-models.js: replace 6-regex isHeavyModelId() with MODEL_CAPABILITY_TIER
lookup + regex fallback for unknown models; new models in model-router.js
are automatically reflected without touching preferences-models.js
- search-the-web/provider.js: replace ~200-line per-provider waterfall with
PROVIDER_REGISTRY array + firstAvailable()/resolveWithFallback() helpers;
preserves Tavily→Brave→Serper→Exa→Ollama→MiniMax auto-fallback order
- sf-db.js: bump SCHEMA_VERSION 58→60 (v59 now reachable); add
frontmatter_version column to tasks table via v60 migration and CREATE
TABLE definition; wire frontmatter_version into upsertTaskPlanning() SQL
and .run() params
- task-frontmatter.js: add frontmatterVersion:1 to DEFAULT_TASK_FRONTMATTER,
add validation block in validateTaskFrontmatter(), add frontmatterVersion
mapping in taskFrontmatterFromRecord()
- sf-db-migration.test.mjs: update hardcoded version assertion 58→60
- docs/specs/sf-operating-model.md: add Planning Schema section documenting
the 3-table model (milestones/slices/tasks, their PKs, spec tables, and
ID naming conventions)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
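The registry-plus-helpers shape that replaces the per-provider waterfall might look like the sketch below. The names `PROVIDER_REGISTRY`, `firstAvailable`, and `resolveWithFallback` and the provider order come from the commit message; the signatures, the lowercase ids, the caller-supplied `isAvailable` predicate, and the handling of `"auto"` are assumptions.

```javascript
// Order encodes the auto-fallback priority from the commit message.
const PROVIDER_REGISTRY = ["tavily", "brave", "serper", "exa", "ollama", "minimax"];

// Returns the first provider the predicate accepts, or null if none is usable.
function firstAvailable(isAvailable, registry = PROVIDER_REGISTRY) {
  return registry.find((id) => isAvailable(id)) ?? null;
}

// Honors an explicit preference when it is usable; otherwise walks the registry.
function resolveWithFallback(preferred, isAvailable, registry = PROVIDER_REGISTRY) {
  if (preferred && preferred !== "auto" && isAvailable(preferred)) return preferred;
  return firstAvailable(isAvailable, registry);
}
```

With the order held in one array, adding or reordering a provider is a one-line change rather than an edit to a chain of conditionals.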
- FallbackResolver.setUnitContext() stores {unitType,unitId} from autonomous dispatch
- run-unit.js calls pi.setFallbackUnitContext() before/after each unit
- _findAnyAvailableFallback uses real unitType/unitId from context, not sentinel
- Schema v59: failure_mode column in llm_task_outcomes
- insertLlmTaskOutcome accepts failure_mode (rate_limit, quota_exhausted, auth_error)
- register-hooks.js passes event.classification.reason as failure_mode
- register-hooks.js uses real event.unitId when available
- ExtensionRuntimeActions.setFallbackUnitContext added to pi API surface
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a model fails and FallbackResolver picks a replacement, it now:
1. Fires the before_model_select hook with reason='fallback' and the
failing model's ID — the learning system records the failure outcome
and returns the best Bayesian-blended replacement from llm_task_outcomes
2. Falls back to the existing heuristic sort (reasoning + context window)
if the hook is unavailable or returns no override
Changes:
- BeforeModelSelectEvent: add optional currentModelId and reason fields
- FallbackResolver: accept emitBeforeModelSelect in constructor; make
_findAnyAvailableFallback async; fire hook before heuristic fallback
- agent-session.ts: inject lazy emitBeforeModelSelect closure into resolver
- register-hooks.js: record failure outcome when reason='fallback' before
returning selectLearnedModel result
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
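The hook-first, heuristic-second flow above can be sketched as follows. This is an illustration under assumptions, not the FallbackResolver code: the model shape, the `modelId` field on the hook result, and the function names `heuristicSort` and `findFallback` are hypothetical; the `reason` and `currentModelId` event fields and the reasoning-then-context-window ordering come from the commit message.

```javascript
// models: [{ id, reasoning: boolean, contextWindow: number }] (hypothetical shape).
// Reasoning-capable models first, then larger context windows.
function heuristicSort(models) {
  return [...models].sort(
    (a, b) => Number(b.reasoning) - Number(a.reasoning) || b.contextWindow - a.contextWindow
  );
}

// Asks the hook for a learned replacement; falls back to the heuristic
// when the hook is absent, throws, or names a model that is not available.
async function findFallback(failedModelId, candidates, emitBeforeModelSelect) {
  const available = candidates.filter((m) => m.id !== failedModelId);
  if (emitBeforeModelSelect) {
    try {
      const override = await emitBeforeModelSelect({
        reason: "fallback",
        currentModelId: failedModelId,
      });
      const picked = override && available.find((m) => m.id === override.modelId);
      if (picked) return picked;
    } catch {
      // hook failure is non-fatal; continue to the heuristic
    }
  }
  return heuristicSort(available)[0] ?? null;
}
```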
- Add packages/coding-agent/src/utils/format.ts as the canonical source
for formatDuration, formatTokenCount, truncateWithEllipsis, sparkline,
formatDateShort, fileLink, stripAnsi, normalizeStringArray — all already
exported from @singularity-forge/coding-agent via index.ts.
- Convert shared/format-utils.js to a compatibility shim that re-exports
the 8 functions from @singularity-forge/coding-agent. All 13 importers
continue to work with no import changes required.
- Convert shared/path-display.js to a compatibility shim that re-exports
toPosixPath from @singularity-forge/coding-agent. Implementation in
packages/coding-agent/src/utils/path-display.ts was already canonical.
- shared/frontmatter.js is intentionally NOT shimmed: splitFrontmatter/
parseFrontmatterMap have a different API from the package's parseFrontmatter/
stripFrontmatter (flat-map vs {frontmatter, body} object).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Create packages/coding-agent/src/core/providers/web-search-middleware.ts with
WebSearchMiddleware class: injects web_search tool, enforces session budget (#1309),
strips thinking blocks from history, and respects PREFERENCES.md search_provider.
- Wire webSearchMiddleware.applyToPayload into sdk.ts onPayload callback (before
extension hook dispatch) so injection runs as compiled TypeScript with zero
jiti-dispatch overhead.
- Export WebSearchMiddleware, webSearchMiddleware singleton, setPreferBraveResolver,
CUSTOM_SEARCH_TOOL_NAMES, MAX_NATIVE_SEARCHES_PER_SESSION, and stripThinkingFromHistory
from @singularity-forge/coding-agent so the extension can delegate to the same instance.
- Refactor search-the-web/native-search.js: remove self-contained injection logic;
import and delegate before_provider_request to webSearchMiddleware singleton.
Use tri-state isAnthropicProvider (null/false/true) to synthesize a provider hint
when event.model is absent but model_select has already fired — prevents the
model-name heuristic from wrongly injecting into Copilot claude-* requests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>