Commit graph

3686 commits

Author SHA1 Message Date
Mikael Hugo
0f27ffe865 fix: Let safe smoke tasks use LLM approval 2026-04-30 13:11:26 +02:00
Mikael Hugo
085d3b7705 fix: Show headless source startup progress 2026-04-30 12:19:52 +02:00
Mikael Hugo
6a33357df5 fix: Add production mutation approval gate 2026-04-30 12:17:35 +02:00
Mikael Hugo
08ea92b072 fix: Harden auto recovery and production guards 2026-04-30 11:35:16 +02:00
Mikael Hugo
e60882efc7 Use GLM 4.5 for Zai smoke benchmark 2026-04-30 10:39:17 +02:00
Mikael Hugo
62d430ab23 Add provider smoke benchmark and headless updates 2026-04-30 10:19:18 +02:00
Mikael Hugo
b81138e2ed Replace retired OpenRouter Elephant route 2026-04-30 10:15:34 +02:00
Mikael Hugo
7a09d476c1 Block OpenRouter meta routes from model registry 2026-04-30 10:07:36 +02:00
Mikael Hugo
1dbd30c713 Fix Kimi Code K2.6 routing and pricing 2026-04-30 10:03:06 +02:00
Mikael Hugo
50975c19e0 Automate source resource rebuild for SF 2026-04-30 09:35:59 +02:00
Mikael Hugo
6ccce42c62 Add headless bootstrap and TODO triage tests 2026-04-30 09:21:24 +02:00
Mikael Hugo
e62b3854cb Prevent auto-commit after cancelled units 2026-04-30 09:07:44 +02:00
Mikael Hugo
8487507d1b Add TODO triage and validation recheck flow 2026-04-30 08:41:49 +02:00
Mikael Hugo
ed19fa1864 Complete SF safe ID remediation sweep 2026-04-30 08:08:10 +02:00
Mikael Hugo
f76504a038 Add runaway recovery handoff artifacts 2026-04-30 08:07:44 +02:00
Mikael Hugo
6aa631c17a Apply shared safe ID validation 2026-04-30 07:56:13 +02:00
Mikael Hugo
1a0c458ac4 Harden SF safe path validation 2026-04-30 07:55:07 +02:00
Mikael Hugo
cd69e85608 Harden SF model routing and harness contracts 2026-04-30 07:41:24 +02:00
Mikael Hugo
37c5db3dd3 test: Add verification gate integration tests for failure catching, cle…
- src/resources/extensions/sf/tests/verification-gate.test.ts

SF-Task: S03/T02
2026-04-30 06:40:54 +02:00
Mikael Hugo
a45f873124 chore: snapshot WIP before resuming M004/S03 auto
84 files spanning provider capabilities, model routing, headless
runtime, sf auto subsystems, gitbook docs, and test coverage. Snapshotted
so headless auto can resume M004 (Production Readiness) S03
(Verification Gate Validation) on a clean tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:31:19 +02:00
Mikael Hugo
3d3a8e26e3 fix(sf): tighten mimo and openrouter model policy 2026-04-29 21:49:49 +02:00
Mikael Hugo
9c4bf9b3e6 fix(sf): use live ollama k2.6 routes 2026-04-29 21:38:51 +02:00
Mikael Hugo
f78c3fb2b8 fix(sf): keep kimi versions exact 2026-04-29 21:17:00 +02:00
Mikael Hugo
ab57548f2b fix: keep skipped tasks out of slice verification 2026-04-29 20:37:56 +02:00
Mikael Hugo
d6fc1211b7 fix: auto-skip stale instruction-conflict tasks 2026-04-29 20:33:06 +02:00
Mikael Hugo
46174c1183 fix: block stale staging task dispatch 2026-04-29 20:25:39 +02:00
Mikael Hugo
120d7deda8 fix: keep headless alive for provider auto-resume 2026-04-29 20:16:23 +02:00
Mikael Hugo
db41f92812 fix: stage declared untracked task files 2026-04-29 20:15:35 +02:00
Mikael Hugo
9398c7000d fix: route bare model families canonically 2026-04-29 20:15:28 +02:00
Mikael Hugo
aa70e1db56 fix: make auto recovery evidence-driven 2026-04-29 19:45:43 +02:00
Mikael Hugo
2ed1638153 fix: add headless heartbeat output 2026-04-29 19:29:43 +02:00
Mikael Hugo
93c1bbcb9a docs: plan judge calibration service 2026-04-29 18:28:45 +02:00
Mikael Hugo
0d6eca9cdd fix: preserve subagent debate mode details 2026-04-29 17:50:26 +02:00
Mikael Hugo
b32fe7acd1 docs: clarify SF harness rollout boundaries 2026-04-29 17:47:51 +02:00
Mikael Hugo
d78c5ac198 feat: add SF skills and subagent debate mode 2026-04-29 17:44:30 +02:00
Mikael Hugo
d02d33aa70 feat: add repo harness profiler 2026-04-29 17:39:52 +02:00
Mikael Hugo
a611db9032 docs: specify repo-native harness evolution 2026-04-29 17:23:39 +02:00
Mikael Hugo
ffa216d6ad docs: log caveman input-compression follow-ups in BUILD_PLAN
Caveman skill (output compression) installed at ~/.claude/skills/caveman/
and activated for dr-repo. Two follow-ups for INPUT-side compression
remain — sf's own prompts are verbose (execute-task alone has 10-step
instructions, runtime context, multiple inlined plans), and that's paid
on every dispatch:

- Tier 2 (1-2 days): Manually rewrite heaviest prompt sections in
  caveman style. Preserve intent + nuance, drop fluff. Compare against
  current to confirm no quality regression.
- Tier 3 (3-4 days): Runtime input preprocessor — pipe rendered prompt
  through caveman-compress (sub-skill, ~46% reduction) before dispatch.
  Behind a terse_prompts: true flag. Adds drift risk vs authored intent;
  needs comparison harness.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:46:32 +02:00
Mikael Hugo
fb4885b757 prompt(execute-task): add parallel-tool-call rule
Adds step 0a: when independent reads/greps are needed, batch them in a
single assistant turn instead of one-at-a-time. The existing step 0
already pushed for terse narration, but didn't address the bigger waste
— sequential tool calls when parallel would work. Common case: reading
handler + test + schema to triangulate a bug — three reads in one turn,
not three turns.

Also nudges away from "talking-then-doing": if the next action is
unambiguous, just take it. Describing intent before every call is the
dead weight that adds up to 30-50% extra round-trips.

Behavior fix only (prompt-level). Model can still narrate inside its
thinking channel since that's a model property; this targets the
chat/tool-use channel where the user pays per turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:42:22 +02:00
Mikael Hugo
c5df4b46a6 fix(headless): await auto loop in headless mode 2026-04-29 15:37:17 +02:00
Mikael Hugo
df614a3e47 fix(headless): split idle-timeout role from deadlock-backstop role
The single IDLE_TIMEOUT_MS constant was conflating two different jobs:
"are we done?" vs "is the agent stuck?". For multi-turn commands (auto,
next, discuss, plan), the first question is wrong — those signal
completion explicitly via "auto-mode stopped" terminal notifications,
and child-process exit catches crashes. The 120s I'd just bumped
multi-turn to was still in idle-detection mindset; that's not what we
need from this timer.

New semantics:
- IDLE_TIMEOUT_MS = 15s — quick commands (status, queue, …); idle
  really does mean done.
- NEW_MILESTONE_IDLE_TIMEOUT_MS = 120s — bounded creative task with
  pauses for thinking between bootstrap steps.
- MULTI_TURN_DEADLOCK_BACKSTOP_MS = 30 minutes — auto/next/discuss/plan.
  Not a "done" detector; a deadlock recovery bound. Long enough to
  never bother slow LLM reasoning or chained tool calls; short enough
  to recover from a true hang within a reasonable window. Real
  completion comes from terminal notifications + child-process exit,
  both already wired.

Code reads cleaner too: effectiveIdleTimeout selection now mirrors the
three-way conceptual split.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:18:58 +02:00
Mikael Hugo
c239ad6c9d fix(headless): use long idle timeout for auto/next/discuss/plan
The 15s IDLE_TIMEOUT_MS was killing auto-mode prematurely. Symptom: sf
headless auto would dispatch a task, the LLM would make 1-2 tool calls,
pause to reason about the next step, exceed 15s of "no events", and
headless would declare "Status: complete" — exiting at ~35s with the task
barely started (123 events but only 2 tool calls).

The 120s NEW_MILESTONE_IDLE_TIMEOUT_MS already exists for the same reason
("LLM may pause between tool calls e.g. after mkdir, before writing
files"). The same applies to auto/next/discuss/plan — all multi-turn
commands where the LLM thinks longer between actions, especially on
non-trivial tasks. isMultiTurnCommand was already defined for related
logic; this just wires it into the idle-timeout decision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:13:43 +02:00
Mikael Hugo
6e342a8875 fix(sf-from-source): switch from bun to node — clean from-source path
bun was the wrong runtime for our environment, two ways:

1. bun doesn't ship node:sqlite. sf-db.ts falls back through node:sqlite
   → better-sqlite3 → null. Result: 'No SQLite provider available' and
   degraded-mode filesystem-state derivation, even though sqlite is
   actually available (node:sqlite under node, bun:sqlite under bun —
   both valid, but our code only knows the node names).

2. bun's loader doesn't inherit the system library search path under
   Nix. libz.so.1 isn't found for forge_engine.node, so the native
   addon falls through to JS implementations (slower).

Both warnings ("Native addon not available", "DB unavailable —
degraded mode") were the symptom of "we're running under bun".

Fix: use node + the existing src/resources/extensions/sf/tests/
resolve-ts.mjs loader hook (which already handles .js → .ts
import-specifier remapping for runtime resolution) +
--experimental-strip-types (node 22+, native in 24).

Result: from-source via node loads cleanly. No native warning.
No sqlite warning. No degraded mode. Exec: `./bin/sf-from-source
--print "..."` returns the model output and nothing else.

Drops the LD_LIBRARY_PATH zlib-injection hack that was added in
4912f6ea8 — that was working around the bun native-loader issue
that doesn't exist under node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:07:24 +02:00
Mikael Hugo
2afe2ac6f1 feat(prefs): self-aligning template upgrades — sf keeps its own files synced
Companion to the earlier schema-versioning framework. Where that handles
data-shape evolution via forward migrations, this handles file-template
evolution via silent self-rewrite. The user shouldn't have to know:

- ensurePreferences() now stamps `last_synced_with_sf: <semver>` in the
  frontmatter when seeding a new project's PREFERENCES.md, recording the
  sf version that wrote the template.
- New module preferences-template-upgrade.ts:
  - detectTemplateDrift(prefs) — pure check, returns
    { fromVersion, toVersion, needsUpgrade }.
  - upgradePreferencesFileIfDrifted(path, prefs) — silently re-renders
    the file's frontmatter when fromVersion ≠ toVersion. Body (anything
    after the closing `---`) is preserved verbatim, so user notes stay.
- Wired into loadPreferencesFile() — every read self-aligns. No human
  warnings, no opt-in flow; sf keeps its own house in order.
- last_synced_with_sf added to SFPreferences + KNOWN_PREFERENCE_KEYS so
  it round-trips through validatePreferences without "unknown key"
  warnings.

Failure modes are non-fatal: missing file, malformed frontmatter, or
read-only filesystem all leave the file alone and return the in-memory
prefs unchanged. SF_VERSION env var (set by loader.ts) is the source of
truth for "current sf"; "0.0.0" sentinel skips upgrade so atypical entry
points don't stamp incorrect values.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:05:37 +02:00
Mikael Hugo
4912f6ea80 fix(sf-from-source): inject Nix-store zlib into LD_LIBRARY_PATH
bun's loader doesn't inherit the same library search path as node under
Nix, so require('forge_engine.linux-x64.node') fails with
'libz.so.1: cannot open shared object file' even when the native addon
exists at the expected path. Result: sf-from-source ran in
JS-fallback mode, and we'd been working around it by switching to
node dist/loader.js — which forces a manual `npm run copy-resources`
after every src/ change to keep dist in sync.

This wraps sf-from-source to find a Nix-store zlib at startup and
prepend it to LD_LIBRARY_PATH before exec'ing bun. The native addon
loads cleanly; from-source becomes the reliable default again; no
more dist drift to worry about.

Find pattern: /nix/store/*-zlib-*/lib/libz.so.1 at maxdepth 4
(maxdepth 2 was too shallow — the hash dir is depth 1, lib is depth 2,
the .so.1 file is depth 3, plus we want the parent dir for
LD_LIBRARY_PATH so '%h' on a depth-3 match gives the lib dir).

Outside Nix (no /nix/store), this is a no-op and falls through to
the existing exec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:01:55 +02:00
Mikael Hugo
a2b709f669 fix(gitignore): write sf runtime patterns to .git/info/exclude, not .gitignore
ensureGitignore was re-adding `.sf`, `.sf-id`, `.bg-shell/` to the project's
.gitignore on every sf run, causing two issues:

1. Working-tree churn — every invocation dirtied .gitignore, forcing a
   commit just to silence "uncommitted changes" warnings. Pattern flagged
   by user: "is this the right way with its own every run".

2. False-positive duplicate-add — the literal-string check
   (`existingLines.has(".sf")`) didn't recognize user-equivalent patterns
   like `/.sf` (root-only) or `.sf/` (with trailing slash), so an explicit
   user entry got duplicated by the auto-add on next run.

Fix: move sf-specific runtime patterns to `.git/info/exclude` via new
`ensureGitInfoExclude()`. That file is per-clone (not committed), so
re-writing is invisible to git status. The project's `.gitignore` stays
human-curated and sf doesn't opinionate on it.

`ensureGitignore()` now calls `ensureGitInfoExclude()` first so callers
don't need to update — backwards compatible. Generic OS/IDE/lang patterns
(.DS_Store, node_modules/, target/, etc.) stay in BASELINE_PATTERNS for
.gitignore since those genuinely belong in version control.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:58:14 +02:00
Mikael Hugo
6031106d93 docs: add UPSTREAM_PORT_GUIDE.md — translation rules for gsd-2 → sf ports
We sync from two upstreams (pi-mono via cherry-pick, gsd-2 via manual
port) and the gsd-2 syncs hit naming/path translation every time.
This guide makes the translation rules explicit and persistent so
future ports (by humans or by sf) don't have to rediscover them.

Covers:
- The naming translations table: gsd_* → sf_*, .gsd/ → .sf/,
  extensions/gsd/ → extensions/sf/, @sf-run/* → @singularity-forge/*,
  GSD_HOME → SF_HOME, etc.
- Default rule: translate naming, keep substance. Includes the
  cautionary tale of my own self-heal rejection (1bbd20bf7) where I
  wrongly skipped a fix because of the path string.
- When a port REALLY doesn't apply (architectural divergence vs naming
  divergence) — three categories with examples.
- Mechanics for pi-mono (cherry-pick) vs gsd-2 (manual) ports.
- Skip-list documentation: when you reject, document why in BUILD_PLAN
  with the upstream SHA and reason.
- Prompt-edit handling: gsd_<verb> → sf_<verb>, register tools before
  porting prompt edits that call them.

Future automation hint at the bottom for a port-translation script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:51:19 +02:00
Mikael Hugo
1bbd20bf78 docs: correct gsd-2 self-heal port — substance applies after path translation
Earlier I (and sf parroting BUILD_PLAN.md) dismissed gsd-2's symlinked
.gsd self-heal fix (9340f1e9b / #4423) as 'doesn't apply because we use
.sf instead'. That was a superficial read.

The fix is about detecting and recovering from a broken/redirected
staging-dir symlink to prevent silent data loss. The .gsd/ vs .sf/ is
a one-line path translation, not a design difference. The
symlink-resilience logic is exactly what we need for our staging.

Path-translate .gsd/ → .sf/ in the port. The substance ports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:49:05 +02:00
Mikael Hugo
9a7d6b7d98 chore(test): drop systemd-run wrapper from test:sf-light
The wrapper imposed CPUQuota=200% / MemoryMax=4G via a transient scope
unit, which requires polkit interactive auth and silently failed on
non-TTY hosts (the script then exit-0'd without running tests). The
limits were a guard against the heavy test:coverage runner's worker
saturation, but test:sf-light already runs in-process with
--max-old-space-size=2048 and --test-timeout=30000 — the systemd
governor was overkill for this lighter target and incompatible with
headless / non-laptop environments.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:47:50 +02:00
Mikael Hugo
9b718f8e36 fix(headless): repair missing sf project symlink 2026-04-29 14:43:30 +02:00