84 files spanning provider capabilities, model routing, headless
runtime, sf auto subsystems, gitbook docs, and test coverage. Snapshotted
so headless auto can resume M004 (Production Readiness) S03
(Verification Gate Validation) on a clean tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caveman skill (output compression) installed at ~/.claude/skills/caveman/
and activated for dr-repo. Two follow-ups for INPUT-side compression
remain — sf's own prompts are verbose (execute-task alone has 10-step
instructions, runtime context, multiple inlined plans), and that's paid
on every dispatch:
- Tier 2 (1-2 days): Manually rewrite heaviest prompt sections in
caveman style. Preserve intent + nuance, drop fluff. Compare against
current to confirm no quality regression.
- Tier 3 (3-4 days): Runtime input preprocessor — pipe rendered prompt
through caveman-compress (sub-skill, ~46% reduction) before dispatch.
Behind a terse_prompts: true flag. Adds drift risk vs authored intent;
needs comparison harness.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds step 0a: when independent reads/greps are needed, batch them in a
single assistant turn instead of one-at-a-time. The existing step 0
already pushed for terse narration, but didn't address the bigger waste
— sequential tool calls when parallel would work. Common case: reading
handler + test + schema to triangulate a bug — three reads in one turn,
not three turns.
Also nudges away from "talking-then-doing": if the next action is
unambiguous, just take it. Describing intent before every call is the
dead weight that adds up to 30-50% extra round-trips.
Behavior fix only (prompt-level). Model can still narrate inside its
thinking channel since that's a model property; this targets the
chat/tool-use channel where the user pays per turn.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The single IDLE_TIMEOUT_MS constant was conflating two different jobs:
"are we done?" vs "is the agent stuck?". For multi-turn commands (auto,
next, discuss, plan), the first question is wrong — those signal
completion explicitly via "auto-mode stopped" terminal notifications,
and child-process exit catches crashes. The 120s I'd just bumped
multi-turn to was still in idle-detection mindset; that's not what we
need from this timer.
New semantics:
- IDLE_TIMEOUT_MS = 15s — quick commands (status, queue, …); idle
really does mean done.
- NEW_MILESTONE_IDLE_TIMEOUT_MS = 120s — bounded creative task with
pauses for thinking between bootstrap steps.
- MULTI_TURN_DEADLOCK_BACKSTOP_MS = 30 minutes — auto/next/discuss/plan.
Not a "done" detector; a deadlock recovery bound. Long enough to
never bother slow LLM reasoning or chained tool calls; short enough
to recover from a true hang within a reasonable window. Real
completion comes from terminal notifications + child-process exit,
both already wired.
Code reads cleaner too: effectiveIdleTimeout selection now mirrors the
three-way conceptual split.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 15s IDLE_TIMEOUT_MS was killing auto-mode prematurely. Symptom: sf
headless auto would dispatch a task, the LLM would make 1-2 tool calls,
pause to reason about the next step, exceed 15s of "no events", and
headless would declare "Status: complete" — exiting at ~35s with the task
barely started (123 events but only 2 tool calls).
The 120s NEW_MILESTONE_IDLE_TIMEOUT_MS already exists for the same reason
("LLM may pause between tool calls e.g. after mkdir, before writing
files"). The same applies to auto/next/discuss/plan — all multi-turn
commands where the LLM thinks longer between actions, especially on
non-trivial tasks. isMultiTurnCommand was already defined for related
logic; this just wires it into the idle-timeout decision.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bun was the wrong runtime for our environment, two ways:
1. bun doesn't ship node:sqlite. sf-db.ts falls back through node:sqlite
→ better-sqlite3 → null. Result: 'No SQLite provider available' and
degraded-mode filesystem-state derivation, even though sqlite is
actually available (node:sqlite under node, bun:sqlite under bun —
both valid, but our code only knows the node names).
2. bun's loader doesn't inherit the system library search path under
Nix. libz.so.1 isn't found for forge_engine.node, so the native
addon falls through to JS implementations (slower).
Both warnings ("Native addon not available", "DB unavailable —
degraded mode") were the symptom of "we're running under bun".
Fix: use node + the existing src/resources/extensions/sf/tests/
resolve-ts.mjs loader hook (which already handles .js → .ts
import-specifier remapping for runtime resolution) +
--experimental-strip-types (node 22+, native in 24).
Result: from-source via node loads cleanly. No native warning.
No sqlite warning. No degraded mode. Exec: `./bin/sf-from-source
--print "..."` returns the model output and nothing else.
Drops the LD_LIBRARY_PATH zlib-injection hack that was added in
4912f6ea8 — that was working around the bun native-loader issue
that doesn't exist under node.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Companion to the earlier schema-versioning framework. Where that handles
data-shape evolution via forward migrations, this handles file-template
evolution via silent self-rewrite. The user shouldn't have to know:
- ensurePreferences() now stamps `last_synced_with_sf: <semver>` in the
frontmatter when seeding a new project's PREFERENCES.md, recording the
sf version that wrote the template.
- New module preferences-template-upgrade.ts:
- detectTemplateDrift(prefs) — pure check, returns
{ fromVersion, toVersion, needsUpgrade }.
- upgradePreferencesFileIfDrifted(path, prefs) — silently re-renders
the file's frontmatter when fromVersion ≠ toVersion. Body (anything
after the closing `---`) is preserved verbatim, so user notes stay.
- Wired into loadPreferencesFile() — every read self-aligns. No human
warnings, no opt-in flow; sf keeps its own house in order.
- last_synced_with_sf added to SFPreferences + KNOWN_PREFERENCE_KEYS so
it round-trips through validatePreferences without "unknown key"
warnings.
Failure modes are non-fatal: missing file, malformed frontmatter, or
read-only filesystem all leave the file alone and return the in-memory
prefs unchanged. SF_VERSION env var (set by loader.ts) is the source of
truth for "current sf"; "0.0.0" sentinel skips upgrade so atypical entry
points don't stamp incorrect values.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bun's loader doesn't inherit the same library search path as node under
Nix, so require('forge_engine.linux-x64.node') fails with
'libz.so.1: cannot open shared object file' even when the native addon
exists at the expected path. Result: sf-from-source ran in
JS-fallback mode, and we'd been working around it by switching to
node dist/loader.js — which forces a manual `npm run copy-resources`
after every src/ change to keep dist in sync.
This wraps sf-from-source to find a Nix-store zlib at startup and
prepend it to LD_LIBRARY_PATH before exec'ing bun. The native addon
loads cleanly; from-source becomes the reliable default again; no
more dist drift to worry about.
Find pattern: /nix/store/*-zlib-*/lib/libz.so.1 at maxdepth 4
(maxdepth 2 was too shallow — the hash dir is depth 1, lib is depth 2,
the .so.1 file is depth 3, plus we want the parent dir for
LD_LIBRARY_PATH so '%h' on a depth-3 match gives the lib dir).
Outside Nix (no /nix/store), this is a no-op and falls through to
the existing exec.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ensureGitignore was re-adding `.sf`, `.sf-id`, `.bg-shell/` to the project's
.gitignore on every sf run, causing two issues:
1. Working-tree churn — every invocation dirtied .gitignore, forcing a
commit just to silence "uncommitted changes" warnings. Pattern flagged
by user: "is this the right way with its own every run".
2. False-positive duplicate-add — the literal-string check
(`existingLines.has(".sf")`) didn't recognize user-equivalent patterns
like `/.sf` (root-only) or `.sf/` (with trailing slash), so an explicit
user entry got duplicated by the auto-add on next run.
Fix: move sf-specific runtime patterns to `.git/info/exclude` via new
`ensureGitInfoExclude()`. That file is per-clone (not committed), so
re-writing is invisible to git status. The project's `.gitignore` stays
human-curated and sf doesn't opinionate on it.
`ensureGitignore()` now calls `ensureGitInfoExclude()` first so callers
don't need to update — backwards compatible. Generic OS/IDE/lang patterns
(.DS_Store, node_modules/, target/, etc.) stay in BASELINE_PATTERNS for
.gitignore since those genuinely belong in version control.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
We sync from two upstreams (pi-mono via cherry-pick, gsd-2 via manual
port) and the gsd-2 syncs hit naming/path translation every time.
This guide makes the translation rules explicit and persistent so
future ports (by humans or by sf) don't have to rediscover them.
Covers:
- The naming translations table: gsd_* → sf_*, .gsd/ → .sf/,
extensions/gsd/ → extensions/sf/, @sf-run/* → @singularity-forge/*,
GSD_HOME → SF_HOME, etc.
- Default rule: translate naming, keep substance. Includes the
cautionary tale of my own self-heal rejection (1bbd20bf7) where I
wrongly skipped a fix because of the path string.
- When a port REALLY doesn't apply (architectural divergence vs naming
divergence) — three categories with examples.
- Mechanics for pi-mono (cherry-pick) vs gsd-2 (manual) ports.
- Skip-list documentation: when you reject, document why in BUILD_PLAN
with the upstream SHA and reason.
- Prompt-edit handling: gsd_<verb> → sf_<verb>, register tools before
porting prompt edits that call them.
Future automation hint at the bottom for a port-translation script.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Earlier I (and sf parroting BUILD_PLAN.md) dismissed gsd-2's symlinked
.gsd self-heal fix (9340f1e9b / #4423) as 'doesn't apply because we use
.sf instead'. That was a superficial read.
The fix is about detecting and recovering from a broken/redirected
staging-dir symlink to prevent silent data loss. The .gsd/ vs .sf/ is
a one-line path translation, not a design difference. The
symlink-resilience logic is exactly what we need for our staging.
Path-translate .gsd/ → .sf/ in the port. The substance ports.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The wrapper imposed CPUQuota=200% / MemoryMax=4G via a transient scope
unit, which requires polkit interactive auth and silently failed on
non-TTY hosts (the script then exit-0'd without running tests). The
limits were a guard against the heavy test:coverage runner's worker
saturation, but test:sf-light already runs in-process with
--max-old-space-size=2048 and --test-timeout=30000 — the systemd
governor was overkill for this lighter target and incompatible with
headless / non-laptop environments.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the framework for evolving the prefs schema without silently breaking
projects pinned to older versions. Each PREFERENCES.md declares `version: N`;
sf declares CURRENT_PREFERENCES_SCHEMA_VERSION in code. On load:
- prefs.version === current → no-op
- prefs.version < current → run registered migrations in chain (forward only,
pure functions). Missing migration in the chain throws — bumping the
schema version requires a matching Migration entry, by construction.
- prefs.version > current → warn "prefs from a newer sf, fields may be
ignored", preserve the value so a later upgrade reads correctly.
- prefs.version undefined → assume v1 (legacy file pre-versioning) and
warn so the user adds an explicit pin.
Migration registry is empty for now (current schema version stays at 1) —
the framework is in place so the first real schema bump is a one-line
addition, not a refactor. Drift detection (`checkPreferencesDrift`) is also
the natural surface for future deprecated-key / missing-required-field
checks when CLAUDE.md / template comparisons are added.
Wired into validatePreferences() so every load path gets the new behavior
automatically — no caller changes needed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pi-mono Tier 0 #4 — manual port (sf went off-task; ported directly).
undici's default 300s bodyTimeout aborts long local-LLM SSE streams
(e.g. vLLM buffering a large tool call) with UND_ERR_BODY_TIMEOUT.
retry.provider.timeoutMs cannot lift this cap — it controls the
provider SDK's AbortController, not undici's per-socket idle timer.
Pass {bodyTimeout: 0, headersTimeout: 0} to EnvHttpProxyAgent. Provider
SDKs continue to enforce their own deadlines.
Type-check passes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pkg/dist/core/export-html/template.js is a tracked dist mirror that
needs the same HTML escape fix as packages/pi-coding-agent/src/core/
export-html/template.js (committed in 701ec8fb8).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without this, every fresh project inherits sf's user-level dogfooding
defaults (npm run typecheck:extensions, test:sf-light) — which run sf's
own dev scripts against unrelated repos and produce universal false
negatives. Hit in dr-repo (Go): T01-VERIFY.json showed all_fail because
those npm scripts don't exist there, even though T01's actual work passed
verification per its SUMMARY.
- ensurePreferences() now calls detectProjectSignals() and embeds the
auto-detected commands in the YAML frontmatter on first init. Detection
failure is non-fatal — falls back to the bare template.
- detectVerificationCommands() Go branch now handles multi-module repos
(no root go.mod, only nested ones — common pattern for repos like
dr-repo/{dr-agent,portal,gateway,installer,cmd/installer}). Generates
a per-module loop instead of running go vet/test from the repo root,
which would fail since each subdir is its own Go module.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pi-mono Tier 0 #2 — sf-driven port of PR #3650.
Some LLM providers reject API calls when `tools: []` is sent (an empty
array), but accept the call when the tools field is omitted entirely.
This guards each provider's request-body builder to omit `tools` when
the tool list is empty, instead of serialising the empty array.
Files (5 provider builders):
- packages/pi-ai/src/providers/openai-completions.ts
- packages/pi-ai/src/providers/openai-responses.ts
- packages/pi-ai/src/providers/openai-codex-responses.ts
- packages/pi-ai/src/providers/azure-openai-responses.ts
- packages/pi-ai/src/providers/anthropic-shared.ts (covers anthropic
and anthropic-vertex which both import buildParams from it)
Pattern: `if (context.tools)` → `if (context.tools && context.tools.length > 0)`.
Preserved: the `else if (hasToolHistory(context.messages))` branch in
openai-completions.ts that intentionally emits `tools: []` for
LiteLLM/Anthropic-proxy compatibility is unchanged.
Type-check passes.
Co-Authored-By: sf v2.75.1 (session 38ed0a48)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pi-mono Tier 0 #1 (security) — sf-driven port.
Two upstream security fixes (pi-mono PR #3819, #3883) that escape
user-controlled session content before embedding in HTML exports.
Crafted session content (image mime types, image data, model IDs,
tool names, entry IDs) could otherwise inject markup at the export
boundary.
What sf changed in
packages/pi-coding-agent/src/core/export-html/template.js:
- Image tags: escape `mimeType` and `data` attributes for both
tool-result and user-message image renders (PR #3819).
- Session metadata: escape `msg.toolName`, `msg.role`, `entry.modelId`,
`entry.thinkingLevel`, `entry.type`, `entry.id`, and
`globalStats.models` (PR #3883).
- DOM id construction: renamed `entryId` → `entryDomId` and escape
`entry.id` to prevent attribute-breakout from a crafted id.
The existing `escapeHtml()` helper was used at every site; no new
helper introduced. Type-check passes.
Co-Authored-By: sf v2.75.1 (session 150fe2c1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pi-mono Tier 0 #5 — first sf-driven port. sf-from-source dispatched the
task in print mode and produced this fix autonomously.
Adds getModelMatchCandidates(modelId, modelName?) helper that normalizes
both inputs to lowercase and dash-separated form
(s.replace(/[\s_.:]+/g, "-")). Inference profile ARNs don't embed the
model name; the helper lets capability checks match against either the
inference profile ARN or the underlying model name.
Updated:
- supportsAdaptiveThinking — uses the helper; consolidates the
opus-4.6/opus-4-6 dot-vs-dash variants.
- mapThinkingLevelToEffort — same pattern.
- supportsPromptCaching — same pattern (also from pi-mono PR #3527).
- streamSimpleBedrock and buildAdditionalModelRequestFields — pass
model.name through to capability checks.
Type-check passes (cd packages/pi-ai && npx tsc --noEmit).
Co-Authored-By: sf v2.75.1 (session 911dd2de)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenRouter is already neutered via the provider_model_allow allowlist
(see d38e5ea09 fix(schema): auto-coerce string → [string] for sf_* list
fields + provider_model_allow tests). The 248 model entries in
models.generated.ts are inert — no dispatch path reaches them.
Removing the data entries would be aesthetic cleanup with zero
behavioral effect. Not worth a Tier-1 follow-up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>