84 files spanning provider capabilities, model routing, headless
runtime, sf auto subsystems, gitbook docs, and test coverage. Snapshotted
so headless auto can resume M004 (Production Readiness) S03
(Verification Gate Validation) on a clean tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds step 0a: when independent reads/greps are needed, batch them in a
single assistant turn instead of one-at-a-time. The existing step 0
already pushed for terse narration, but didn't address the bigger waste
— sequential tool calls when parallel would work. Common case: reading
handler + test + schema to triangulate a bug — three reads in one turn,
not three turns.
Also nudges away from "talking-then-doing": if the next action is
unambiguous, just take it. Describing intent before every call is the
dead weight that adds up to 30-50% extra round-trips.
Behavior fix only (prompt-level). Model can still narrate inside its
thinking channel since that's a model property; this targets the
chat/tool-use channel where the user pays per turn.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The single IDLE_TIMEOUT_MS constant was conflating two different jobs:
"are we done?" vs "is the agent stuck?". For multi-turn commands (auto,
next, discuss, plan), the first question is wrong — those signal
completion explicitly via "auto-mode stopped" terminal notifications,
and child-process exit catches crashes. The 120s I'd just bumped
multi-turn to was still in idle-detection mindset; that's not what we
need from this timer.
New semantics:
- IDLE_TIMEOUT_MS = 15s — quick commands (status, queue, …); idle
really does mean done.
- NEW_MILESTONE_IDLE_TIMEOUT_MS = 120s — bounded creative task with
pauses for thinking between bootstrap steps.
- MULTI_TURN_DEADLOCK_BACKSTOP_MS = 30 minutes — auto/next/discuss/plan.
Not a "done" detector; a deadlock recovery bound. Long enough to
never bother slow LLM reasoning or chained tool calls; short enough
to recover from a true hang within a reasonable window. Real
completion comes from terminal notifications + child-process exit,
both already wired.
Code reads cleaner too: effectiveIdleTimeout selection now mirrors the
three-way conceptual split.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 15s IDLE_TIMEOUT_MS was killing auto-mode prematurely. Symptom: sf
headless auto would dispatch a task, the LLM would make 1-2 tool calls,
pause to reason about the next step, exceed 15s of "no events", and
headless would declare "Status: complete" — exiting at ~35s with the task
barely started (123 events but only 2 tool calls).
The 120s NEW_MILESTONE_IDLE_TIMEOUT_MS already exists for the same reason
("LLM may pause between tool calls e.g. after mkdir, before writing
files"). The same applies to auto/next/discuss/plan — all multi-turn
commands where the LLM thinks longer between actions, especially on
non-trivial tasks. isMultiTurnCommand was already defined for related
logic; this just wires it into the idle-timeout decision.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Companion to the earlier schema-versioning framework. Where that handles
data-shape evolution via forward migrations, this handles file-template
evolution via silent self-rewrite. The user shouldn't have to know:
- ensurePreferences() now stamps `last_synced_with_sf: <semver>` in the
frontmatter when seeding a new project's PREFERENCES.md, recording the
sf version that wrote the template.
- New module preferences-template-upgrade.ts:
- detectTemplateDrift(prefs) — pure check, returns
{ fromVersion, toVersion, needsUpgrade }.
- upgradePreferencesFileIfDrifted(path, prefs) — silently re-renders
the file's frontmatter when fromVersion ≠ toVersion. Body (anything
after the closing `---`) is preserved verbatim, so user notes stay.
- Wired into loadPreferencesFile() — every read self-aligns. No human
warnings, no opt-in flow; sf keeps its own house in order.
- last_synced_with_sf added to SFPreferences + KNOWN_PREFERENCE_KEYS so
it round-trips through validatePreferences without "unknown key"
warnings.
Failure modes are non-fatal: missing file, malformed frontmatter, or
read-only filesystem all leave the file alone and return the in-memory
prefs unchanged. SF_VERSION env var (set by loader.ts) is the source of
truth for "current sf"; "0.0.0" sentinel skips upgrade so atypical entry
points don't stamp incorrect values.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ensureGitignore was re-adding `.sf`, `.sf-id`, `.bg-shell/` to the project's
.gitignore on every sf run, causing two issues:
1. Working-tree churn — every invocation dirtied .gitignore, forcing a
commit just to silence "uncommitted changes" warnings. Pattern flagged
by user: "is this the right way with its own every run".
2. False-positive duplicate-add — the literal-string check
(`existingLines.has(".sf")`) didn't recognize user-equivalent patterns
like `/.sf` (root-only) or `.sf/` (with trailing slash), so an explicit
user entry got duplicated by the auto-add on next run.
Fix: move sf-specific runtime patterns to `.git/info/exclude` via new
`ensureGitInfoExclude()`. That file is per-clone (not committed), so
re-writing is invisible to git status. The project's `.gitignore` stays
human-curated and sf doesn't opinionate on it.
`ensureGitignore()` now calls `ensureGitInfoExclude()` first so callers
don't need to update — backwards compatible. Generic OS/IDE/lang patterns
(.DS_Store, node_modules/, target/, etc.) stay in BASELINE_PATTERNS for
.gitignore since those genuinely belong in version control.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the framework for evolving the prefs schema without silently breaking
projects pinned to older versions. Each PREFERENCES.md declares `version: N`;
sf declares CURRENT_PREFERENCES_SCHEMA_VERSION in code. On load:
- prefs.version === current → no-op
- prefs.version < current → run registered migrations in chain (forward only,
pure functions). Missing migration in the chain throws — bumping the
schema version requires a matching Migration entry, by construction.
- prefs.version > current → warn "prefs from a newer sf, fields may be
ignored", preserve the value so a later upgrade reads correctly.
- prefs.version undefined → assume v1 (legacy file pre-versioning) and
warn so the user adds an explicit pin.
Migration registry is empty for now (current schema version stays at 1) —
the framework is in place so the first real schema bump is a one-line
addition, not a refactor. Drift detection (`checkPreferencesDrift`) is also
the natural surface for future deprecated-key / missing-required-field
checks when CLAUDE.md / template comparisons are added.
Wired into validatePreferences() so every load path gets the new behavior
automatically — no caller changes needed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Without this, every fresh project inherits sf's user-level dogfooding
defaults (npm run typecheck:extensions, test:sf-light) — which run sf's
own dev scripts against unrelated repos and produce universal false
negatives. Hit in dr-repo (Go): T01-VERIFY.json showed all_fail because
those npm scripts don't exist there, even though T01's actual work passed
verification per its SUMMARY.
- ensurePreferences() now calls detectProjectSignals() and embeds the
auto-detected commands in the YAML frontmatter on first init. Detection
failure is non-fatal — falls back to the bare template.
- detectVerificationCommands() Go branch now handles multi-module repos
(no root go.mod, only nested ones — common pattern for repos like
dr-repo/{dr-agent,portal,gateway,installer,cmd/installer}). Generates
a per-module loop instead of running go vet/test from the repo root,
which would fail since each subdir is its own Go module.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Milestone-end workflow that compares declared product intent (VISION.md,
RUNBOOKS.md, etc.) against actual code/test/deploy/docs evidence and
emits structured gaps with severity. Soft gates — adds follow-up slices
but doesn't hard-block merge.
Slim port (4 new files + 1 registration) — extracts only the audit
feature itself, not bunker's parallel rewrite of dispatch/prompts/
benchmark-selector that came with it in commit 2aa785475.
Created:
- prompts/product-audit.md — prompt verbatim, gsd_*→sf_* and .gsd→.sf
- tools/product-audit-tool.ts — slim file-write implementation,
atomicWriteAsync to .sf/active/{mid}/
PRODUCT-AUDIT.{json,md}; no DB deps
- bootstrap/product-audit-tool.ts — pi-coding-agent tool registration,
TypeBox schema for sf_product_audit
- workflow-templates/product-audit.md — workflow template
Modified:
- bootstrap/register-extension.ts — 2 lines: import + add to nonCriticalRegistrations
- workflow-templates/registry.json — registry entry
- package.json — version 2.75.0 → 2.75.1
Verdict logic (no-gaps | gaps-found | contract-underspecified) is the
load-bearing innovation: contract-underspecified forces the auditor to
flag unverifiable docs as a real gap rather than rubber-stamping
no-gaps when the product contract is silent.
Out of scope: phase enum changes, dispatch hookup. Wire-up to the phase
machine is a follow-up; the prompt + tool + template stand alone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three small UX fixes for headless / autopilot logs:
1. Add `zz-notifications` to TUI_FOOTER_STATUS_KEYS — these are sticky
notification dots from the interactive TUI footer; they have no
meaning in headless and were spamming the log.
2. Categorize notification messages by prefix so headless output is
scannable: [mcp] for MCP-client-ready, [search] for web search status,
[parallel] for slice-parallel/subagent dispatch. Falls through to
the existing important/non-important formatting for everything else.
3. Distinguish phase transitions from generic status updates: phase:/
milestone:/slice:/task: prefixed keys get [phase]; everything else
gets [status]. Previously both used [phase], which was misleading.
Patterns based on bunker commits 14ec4d97f / c15afb45f (which were the
research source) but written fresh against our existing
TUI_FOOTER_STATUS_KEYS structure rather than cherry-picked.
The assistant-text-preview commit (cf0274c63) is a separate, larger
refactor in headless.ts and is deferred to v3.1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
We have .serena/ configured (cache, memories, project.local.yml) but no
prompt mentioned Serena anywhere. Agents weren't using it for symbol
lookup or cross-file architecture mapping; they fell straight to rg/find.
Added a one-sentence Serena hint to the code-exploration step in:
- research-slice.md
- research-milestone.md
- plan-slice.md
- plan-milestone.md
- guided-research-slice.md
Phrased generically ("If a repo-intelligence MCP (e.g. Serena) is
configured...") so it degrades cleanly when Serena isn't set up.
Pattern based on bunker commit 4ba746888 but written fresh against our
post-rename prompt structure rather than cherry-picked.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit captures uncommitted modifications that accumulated in the
working tree across multiple in-progress workstreams. It is a snapshot
to clear the deck before sf v3 work begins; individual workstreams
should land separately on top of this.
Notable additions:
- trace-collector.ts, traces.ts, src/tests/trace-export.test.ts —
trace export plumbing
- biome.json — Biome linter configuration
- .gitignore — exclude native/npm/**/*.node compiled binaries
The bulk of the diff is across src/resources/extensions/sf/ (301 files)
and src/resources/extensions/sf/tests/ (277 files), reflecting the
ongoing sf extension work. Specific feature commits should follow this
snapshot rather than being archaeology'd out of it.
The 76MB native/npm/linux-x64-gnu/forge_engine.node compiled binary
was left out of the commit — it's now gitignored and built locally.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codex-rescue output (a299c461 / bnr88iy59) — the 'Git merge approved once'
followed seconds later by 'Git merge declined by user' bug we hit on
M002 complete-milestone. Same gate, same agent run, opposite verdicts.
Single source of truth for the merge-gate state in guardrails/index.ts.
Approval is now sticky — re-asks return the cached approval until consumed
or explicitly revoked, never auto-flip to decline. Timeout converts to
pause+log instead of decline. Adds tests/safe-git-merge-gate.test.ts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
Two codex-rescue tasks landed together:
1. Auto-coerce JSON-schema validator: when a tool field declares
{type:"array", items:{type:"string"}} and the model sends a single
string, wrap it in [string] before validation instead of hard-rejecting.
Fixes the recurring "keyDecisions: must be array" rejection on
sf_complete_task that wasted retries.
2. Provider_model_allow filter (proper implementation with helpers):
- resolveProviderModelAllowList / isProviderModelAllowed /
filterModelsByProviderModelAllow helpers in preferences-models
- Wired into model-registry and auto-model-selection
- New tests/provider-model-allow.test.ts
Tools coerced: sf_complete_task, sf_complete_milestone, sf_plan_milestone,
sf_plan_slice, sf_replan_slice, sf_reassess_roadmap (key list fields).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
Cherry-pick of gsd-build/gsd-2 65ca5aa2e — applies the security hardening
hunks that conflicted minimally:
- mcp-server/env-writer: validate writes against a strict allowlist
- web/api/files: enforce path containment via web/lib/secure-path
- vscode-extension: read binaryPath/autoStart only from trusted
global/default scopes (resolveTrustedSfStartupConfig), avoiding
workspace-controlled override (renamed Gsd → Sf for sf naming)
- New regression tests: mcp-client-security, vscode-startup-security,
web-files-symlink
Skipped hunks (drifted): mcp-server/server.ts, mcp-client/index.ts,
mcp-server/README.md.
Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cherry-pick of gsd-build/gsd-2 a09e01640 — withFileLockSync now actually
acquires a proper-lockfile (was previously a no-op when proper-lockfile
wasn't required) and throws on ELOCKED contention by default. Adds
onLocked: 'skip' option for best-effort callers that tolerate dropped
entries (audit, journal). Modernizes import style (createRequire/join
from imports rather than ad-hoc require). Path-renames preserved
(gsd-pi → sf-run).
Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cherry-pick of gsd-build/gsd-2 53babec29 — lock-wrapped append half.
Wraps appends to .sf/journal/, .sf/audit/events.jsonl, and the
workflow-logger error log in withFileLockSync (onLocked: skip),
preserving best-effort semantics while preventing torn writes
under contention.
Companion to the atomic-write half landed in 3df56cb94. Path-renames
(gsdRoot→sfRoot, gsd-db→sf-db) preserved during conflict resolution.
Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cherry-pick of gsd-build/gsd-2 9340f1e9b (#4423) — doctor self-heal
detection for symlinked staging directories that can cause silent
data loss. Skips native-git-bridge.ts and git-service test (drifted).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cherry-pick of gsd-build/gsd-2 a4f78731f — handles worktree context fallback
and sanitizes paths in paused session resumption. Skips uok-plan-v2-wiring
test hunk (drifted in sf).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cherry-pick of gsd-build/gsd-2 851507913 (#4056) — defensive parsing
so a corrupt or non-array tasks blob in a milestone row doesn't crash
sf-db reads. Test hunk skipped (sf-db.test.ts has drifted).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cherry-pick of gsd-build/gsd-2 53babec29 (Jeremy <jeremy@fluxlabs.net>)
— atomic-write half only. Eliminates torn-write risk on PROJECT.md
queue sync and reports.json/HTML index regeneration by switching
writeFileSync → atomicWriteSync (tmp+rename).
The companion lock-wrapped-append changes (journal.ts, uok/audit.ts,
workflow-logger.ts) are deferred — they need proper-lockfile +
withFileLockSync helper introduced first.
Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Generalize the code-intelligence hook to support multiple indexer
backends, with sift (rupurt/sift) as a new option next to the existing
project-rag MCP server. Backend is selected via CodebaseMapPreferences.
- code-intelligence.ts: new abstraction + sift backend (detect, resolve,
status, context-block contribution)
- preferences-types.ts: codebaseIndexer field (project-rag | sift | none)
- preferences-validation.ts: validate the new field
- bootstrap/system-context.ts, commands-codebase.ts: dispatch on backend
- tests/code-intelligence.test.ts: sift detection/resolution/status tests
(19 pass, 0 fail)
project-rag path unchanged and continues to work.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SubagentBackgroundJobManager tracks long-running subagent jobs with
status, abort support, and TTL-based eviction of completed results.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Human-oriented documentation of SF capabilities, with a script that
keeps it in sync with workflow-tools.ts and extension manifests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracting a class method as a bare reference loses its 'this' context,
causing 'Cannot read properties of undefined' when minimax (or any
provider) triggers the flat-rate auth-mode lookup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
buildExecuteTaskPrompt was not passing memoriesSection to loadPrompt,
causing headless auto to fail with a template variable error. Also
updated plan-slice-prompt.test.ts to supply the four template variables
(memoriesSection, runtimeContext, phaseAnchorSection, gatesToClose) that
were missing from the test fixture.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The resolver guarded on context.parentURL.includes('/src/') to identify
in-repo source files, but @google/gemini-cli-core installs to
node_modules/@google/gemini-cli-core/dist/src/ which also contains '/src/'.
Relative imports from that dist package (e.g. './config/config.js') were
incorrectly rewritten to './config/config.ts', causing ERR_MODULE_NOT_FOUND
on every test that transitively imports the google-gemini provider.
Fix: add !context.parentURL.includes('/node_modules/') guard.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>