Commit graph

3646 commits

Author SHA1 Message Date
Mikael Hugo
df614a3e47 fix(headless): split idle-timeout role from deadlock-backstop role
The single IDLE_TIMEOUT_MS constant was conflating two different jobs:
"are we done?" vs "is the agent stuck?". For multi-turn commands (auto,
next, discuss, plan), the first question is the wrong one: those commands
signal completion explicitly via "auto-mode stopped" terminal
notifications, and child-process exit catches crashes. The 120s I had
just bumped multi-turn commands to was still in an idle-detection
mindset, which is not what we need from this timer.

New semantics:
- IDLE_TIMEOUT_MS = 15s — quick commands (status, queue, …); idle
  really does mean done.
- NEW_MILESTONE_IDLE_TIMEOUT_MS = 120s — bounded creative task with
  pauses for thinking between bootstrap steps.
- MULTI_TURN_DEADLOCK_BACKSTOP_MS = 30 minutes — auto/next/discuss/plan.
  Not a "done" detector; a deadlock recovery bound. Long enough to
  never bother slow LLM reasoning or chained tool calls; short enough
  to recover from a true hang within a reasonable window. Real
  completion comes from terminal notifications + child-process exit,
  both already wired.

Code reads cleaner too: effectiveIdleTimeout selection now mirrors the
three-way conceptual split.
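The three-way split can be sketched as a selection function. This is a sketch, not the headless code itself: the constant values come from the list above and `isMultiTurnCommand` / `effectiveIdleTimeout` reuse names from these commit messages, but their signatures and the `"new-milestone"` command string are assumptions.

```typescript
// Timeout values as stated in the commit message.
const IDLE_TIMEOUT_MS = 15_000;
const NEW_MILESTONE_IDLE_TIMEOUT_MS = 120_000;
const MULTI_TURN_DEADLOCK_BACKSTOP_MS = 30 * 60_000;

// Multi-turn commands named in the commit message.
const MULTI_TURN_COMMANDS = new Set(["auto", "next", "discuss", "plan"]);

function isMultiTurnCommand(command: string): boolean {
  return MULTI_TURN_COMMANDS.has(command);
}

// Hypothetical command name for the new-milestone flow.
const NEW_MILESTONE_COMMAND = "new-milestone";

function effectiveIdleTimeout(command: string): number {
  if (isMultiTurnCommand(command)) {
    // Not a "done" detector: completion comes from terminal notifications
    // and child-process exit; this only bounds a true deadlock.
    return MULTI_TURN_DEADLOCK_BACKSTOP_MS;
  }
  if (command === NEW_MILESTONE_COMMAND) {
    // Bounded creative task with thinking pauses between bootstrap steps.
    return NEW_MILESTONE_IDLE_TIMEOUT_MS;
  }
  // Quick commands (status, queue, ...): idle really does mean done.
  return IDLE_TIMEOUT_MS;
}
```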

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:18:58 +02:00
Mikael Hugo
c239ad6c9d fix(headless): use long idle timeout for auto/next/discuss/plan
The 15s IDLE_TIMEOUT_MS was killing auto-mode prematurely. Symptom: sf
headless auto would dispatch a task, the LLM would make 1-2 tool calls,
pause to reason about the next step, exceed 15s of "no events", and
headless would declare "Status: complete" — exiting at ~35s with the task
barely started (123 events but only 2 tool calls).

The 120s NEW_MILESTONE_IDLE_TIMEOUT_MS already exists for the same reason
("LLM may pause between tool calls e.g. after mkdir, before writing
files"). The same applies to auto/next/discuss/plan — all multi-turn
commands where the LLM thinks longer between actions, especially on
non-trivial tasks. isMultiTurnCommand was already defined for related
logic; this just wires it into the idle-timeout decision.
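To see why a 15s idle timer misfires here, consider a small simulation: the timer resets on every event and fires once the gap since the last event exceeds the timeout. The event timestamps below are invented for illustration; they are not taken from the failing run.

```typescript
// Returns the time (ms since start) at which an idle timer of `timeoutMs`
// would fire, given ascending event timestamps, or null if no gap before the
// final event is strictly longer than the timeout. The timer is armed at 0
// and reset on every event.
function idleFiresAt(eventTimesMs: number[], timeoutMs: number): number | null {
  let lastEvent = 0;
  for (const t of eventTimesMs) {
    if (t - lastEvent > timeoutMs) return lastEvent + timeoutMs;
    lastEvent = t;
  }
  return null;
}

// Two quick tool calls, then a 20s reasoning pause before the next event.
const sampleEvents = [1_000, 3_000, 5_000, 25_000];
```

With a 15s timeout the timer fires at 20s, in the middle of the pause; with 120s it never fires before the next event arrives.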

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:13:43 +02:00
Mikael Hugo
6e342a8875 fix(sf-from-source): switch from bun to node — clean from-source path
bun was the wrong runtime for our environment, for two reasons:

1. bun doesn't ship node:sqlite. sf-db.ts falls back through node:sqlite
   → better-sqlite3 → null. Result: 'No SQLite provider available' and
   degraded-mode filesystem-state derivation, even though sqlite is
   actually available (node:sqlite under node, bun:sqlite under bun —
   both valid, but our code only knows the node names).

2. bun's loader doesn't inherit the system library search path under
   Nix. libz.so.1 isn't found for forge_engine.node, so the native
   addon falls through to JS implementations (slower).
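The SQLite fallback in item 1 behaves like a first-success chain. A minimal sketch with injected loaders; the `Loader` type and `firstAvailable` are hypothetical, not sf-db.ts's API. Under bun both node-named candidates throw, so the chain ends at null, which corresponds to the "No SQLite provider available" path.

```typescript
// A provider loader either returns a handle or throws if unavailable.
type Loader<T> = { name: string; load: () => T };

// Try each loader in order; return the first that succeeds, or null.
function firstAvailable<T>(loaders: Loader<T>[]): { name: string; provider: T } | null {
  for (const { name, load } of loaders) {
    try {
      return { name, provider: load() };
    } catch {
      // Unavailable under this runtime; fall through to the next candidate.
    }
  }
  return null;
}
```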

Both warnings ("Native addon not available", "DB unavailable —
degraded mode") were the symptom of "we're running under bun".

Fix: use node + the existing src/resources/extensions/sf/tests/
resolve-ts.mjs loader hook (which already handles .js → .ts
import-specifier remapping for runtime resolution) +
--experimental-strip-types (node 22+, native in 24).
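The `.js` → `.ts` remapping can be sketched as a Node module-customization `resolve` hook. This is an illustration under assumptions, not the contents of resolve-ts.mjs: the local types approximate Node's hook signature, and in a real hook module `resolve` would be exported and registered (the exact registration used by sf-from-source isn't shown here).

```typescript
type ResolveResult = { url: string; format?: string; shortCircuit?: boolean };
type ResolveContext = { parentURL?: string; conditions?: string[] };
type NextResolve = (specifier: string, context?: ResolveContext) => Promise<ResolveResult>;

// Source files import "./foo.js" (the emitted name), but when running from
// source only "./foo.ts" exists on disk. Retry relative .js specifiers as .ts.
async function resolve(
  specifier: string,
  context: ResolveContext,
  nextResolve: NextResolve,
): Promise<ResolveResult> {
  try {
    return await nextResolve(specifier, context);
  } catch (err) {
    const isRelativeJs =
      (specifier.startsWith("./") || specifier.startsWith("../")) && specifier.endsWith(".js");
    const code =
      typeof err === "object" && err !== null ? (err as { code?: unknown }).code : undefined;
    if (!isRelativeJs || code !== "ERR_MODULE_NOT_FOUND") throw err;
    return nextResolve(specifier.slice(0, -3) + ".ts", context);
  }
}
```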

Result: from-source via node loads cleanly. No native warning.
No sqlite warning. No degraded mode. Exec: `./bin/sf-from-source
--print "..."` returns the model output and nothing else.

Drops the LD_LIBRARY_PATH zlib-injection hack that was added in
4912f6ea8 — that was working around the bun native-loader issue
that doesn't exist under node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:07:24 +02:00
Mikael Hugo
2afe2ac6f1 feat(prefs): self-aligning template upgrades — sf keeps its own files synced
Companion to the earlier schema-versioning framework. Where that handles
data-shape evolution via forward migrations, this handles file-template
evolution via silent self-rewrite. The user shouldn't have to know:

- ensurePreferences() now stamps `last_synced_with_sf: <semver>` in the
  frontmatter when seeding a new project's PREFERENCES.md, recording the
  sf version that wrote the template.
- New module preferences-template-upgrade.ts:
  - detectTemplateDrift(prefs) — pure check, returns
    { fromVersion, toVersion, needsUpgrade }.
  - upgradePreferencesFileIfDrifted(path, prefs) — silently re-renders
    the file's frontmatter when fromVersion ≠ toVersion. Body (anything
    after the closing `---`) is preserved verbatim, so user notes stay.
- Wired into loadPreferencesFile() — every read self-aligns. No human
  warnings, no opt-in flow; sf keeps its own house in order.
- last_synced_with_sf added to SFPreferences + KNOWN_PREFERENCE_KEYS so
  it round-trips through validatePreferences without "unknown key"
  warnings.
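The body-preserving rewrite amounts to splitting at the closing `---` and re-rendering only the first part. A minimal sketch assuming LF line endings; `splitFrontmatter` and `rewriteFrontmatter` are hypothetical names, not functions from preferences-template-upgrade.ts.

```typescript
// Split a markdown file into its YAML frontmatter and the body after the
// closing `---`. Returns null when there is no well-formed frontmatter.
function splitFrontmatter(text: string): { frontmatter: string; body: string } | null {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(text);
  if (!match) return null;
  return { frontmatter: match[1] ?? "", body: text.slice(match[0].length) };
}

// Replace only the frontmatter; the body is carried over verbatim.
// `renderFrontmatter` stands in for whatever serializes the prefs to YAML.
function rewriteFrontmatter(text: string, renderFrontmatter: () => string): string | null {
  const parts = splitFrontmatter(text);
  if (parts === null) return null; // malformed: leave the file alone
  return `---\n${renderFrontmatter()}\n---\n${parts.body}`;
}
```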

Failure modes are non-fatal: missing file, malformed frontmatter, or
read-only filesystem all leave the file alone and return the in-memory
prefs unchanged. SF_VERSION env var (set by loader.ts) is the source of
truth for "current sf"; "0.0.0" sentinel skips upgrade so atypical entry
points don't stamp incorrect values.
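The drift check, including the "0.0.0" sentinel, can be sketched as a pure function. Unlike the one-argument `detectTemplateDrift(prefs)` named above, this sketch takes the current version as a parameter (instead of reading SF_VERSION) so it stays side-effect free, and it assumes a missing stamp counts as drift.

```typescript
interface TemplateDrift {
  fromVersion: string | undefined;
  toVersion: string;
  needsUpgrade: boolean;
}

// Sentinel meaning "current sf version unknown" (atypical entry point).
const UNKNOWN_SF_VERSION = "0.0.0";

function detectTemplateDrift(
  prefs: { last_synced_with_sf?: string },
  currentVersion: string,
): TemplateDrift {
  const fromVersion = prefs.last_synced_with_sf;
  // Never stamp when the current version is unknown; otherwise any
  // mismatch (including a missing stamp) triggers a re-render.
  const needsUpgrade = currentVersion !== UNKNOWN_SF_VERSION && fromVersion !== currentVersion;
  return { fromVersion, toVersion: currentVersion, needsUpgrade };
}
```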

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:05:37 +02:00
Mikael Hugo
4912f6ea80 fix(sf-from-source): inject Nix-store zlib into LD_LIBRARY_PATH
bun's loader doesn't inherit the same library search path as node under
Nix, so require('forge_engine.linux-x64.node') fails with
'libz.so.1: cannot open shared object file' even when the native addon
exists at the expected path. Result: sf-from-source ran in
JS-fallback mode, and we'd been working around it by switching to
node dist/loader.js — which forces a manual `npm run copy-resources`
after every src/ change to keep dist in sync.

This wraps sf-from-source to find a Nix-store zlib at startup and
prepend it to LD_LIBRARY_PATH before exec'ing bun. The native addon
loads cleanly; from-source becomes the reliable default again; no
more dist drift to worry about.

Find pattern: /nix/store/*-zlib-*/lib/libz.so.1 at maxdepth 4.
maxdepth 2 was too shallow: the hash dir is at depth 1, lib/ at depth 2,
and the libz.so.1 file at depth 3. LD_LIBRARY_PATH needs the file's
parent directory, which '%h' yields for the depth-3 match (the lib dir).
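The depth arithmetic can be checked in a few lines. This sketch mirrors `find`'s depth counting and `-printf '%h'` in TypeScript rather than calling `find`; the store path and hash are invented for illustration.

```typescript
// Depth of `path` below `root`, counted like `find root -maxdepth N`:
// direct children of root are depth 1. Assumes `path` starts with `root`.
function depthBelow(root: string, path: string): number {
  const rel = path.slice(root.length).replace(/^\/+/, "");
  return rel === "" ? 0 : rel.split("/").length;
}

// Equivalent of find's `-printf '%h'`: the leading directories of the match.
function leadingDirs(path: string): string {
  return path.slice(0, path.lastIndexOf("/"));
}

// Prepend a directory to a colon-separated search path such as LD_LIBRARY_PATH.
function prependSearchPath(dir: string, existing: string | undefined): string {
  return existing ? `${dir}:${existing}` : dir;
}

const exampleMatch = "/nix/store/abc123-zlib-1.3.1/lib/libz.so.1";
```

The match sits at depth 3, so `-maxdepth 2` cannot see it, and `%h` gives `/nix/store/abc123-zlib-1.3.1/lib`.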

Outside Nix (no /nix/store), this is a no-op and falls through to
the existing exec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:01:55 +02:00
Mikael Hugo
a2b709f669 fix(gitignore): write sf runtime patterns to .git/info/exclude, not .gitignore
ensureGitignore was re-adding `.sf`, `.sf-id`, `.bg-shell/` to the project's
.gitignore on every sf run, causing two issues:

1. Working-tree churn — every invocation dirtied .gitignore, forcing a
   commit just to silence "uncommitted changes" warnings. Pattern flagged
   by user: "is this the right way with its own every run".

2. False-positive duplicate-add — the literal-string check
   (`existingLines.has(".sf")`) didn't recognize user-equivalent patterns
   like `/.sf` (root-only) or `.sf/` (with trailing slash), so an explicit
   user entry got duplicated by the auto-add on next run.

Fix: move sf-specific runtime patterns to `.git/info/exclude` via new
`ensureGitInfoExclude()`. That file is per-clone (not committed), so
rewriting it is invisible to git status. The project's `.gitignore` stays
human-curated, and sf no longer injects its runtime patterns into it.

`ensureGitignore()` now calls `ensureGitInfoExclude()` first so callers
don't need to update — backwards compatible. Generic OS/IDE/lang patterns
(.DS_Store, node_modules/, target/, etc.) stay in BASELINE_PATTERNS for
.gitignore since those genuinely belong in version control.
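The commit moves sf's patterns into `.git/info/exclude` but does not spell out how the duplicate check changed. The following is therefore only a hypothetical illustration of an idempotent, equivalence-aware append addressing issue 2, not sf's `ensureGitInfoExclude()`. Its normalization deliberately ignores the root-anchor and directory-only distinctions that git itself makes.

```typescript
// Normalize a gitignore-style pattern for duplicate detection only:
// "/.sf", ".sf/" and ".sf" all collapse to ".sf".
function normalizePattern(line: string): string {
  return line.trim().replace(/^\//, "").replace(/\/$/, "");
}

// Append only the patterns not already present (after normalization).
// Running it twice yields the same content.
function mergeExcludePatterns(existing: string, patterns: string[]): string {
  const seen = new Set(
    existing
      .split("\n")
      .filter((l) => l.trim() !== "" && !l.trim().startsWith("#"))
      .map(normalizePattern),
  );
  const missing = patterns.filter((p) => !seen.has(normalizePattern(p)));
  if (missing.length === 0) return existing;
  const prefix = existing === "" || existing.endsWith("\n") ? existing : existing + "\n";
  return prefix + missing.join("\n") + "\n";
}
```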

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:58:14 +02:00
Mikael Hugo
6031106d93 docs: add UPSTREAM_PORT_GUIDE.md — translation rules for gsd-2 → sf ports
We sync from two upstreams (pi-mono via cherry-pick, gsd-2 via manual
port) and the gsd-2 syncs hit naming/path translation every time.
This guide makes the translation rules explicit and persistent so
future ports (by humans or by sf) don't have to rediscover them.

Covers:
- The naming translations table: gsd_* → sf_*, .gsd/ → .sf/,
  extensions/gsd/ → extensions/sf/, @sf-run/* → @singularity-forge/*,
  GSD_HOME → SF_HOME, etc.
- Default rule: translate naming, keep substance. Includes the
  cautionary tale of my own self-heal rejection (1bbd20bf7) where I
  wrongly skipped a fix because of the path string.
- When a port REALLY doesn't apply (architectural divergence vs naming
  divergence) — three categories with examples.
- Mechanics for pi-mono (cherry-pick) vs gsd-2 (manual) ports.
- Skip-list documentation: when you reject, document why in BUILD_PLAN
  with the upstream SHA and reason.
- Prompt-edit handling: gsd_<verb> → sf_<verb>, register tools before
  porting prompt edits that call them.

Future automation hint at the bottom for a port-translation script.
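A port-translation script along those lines could start as an ordered list of rewrite rules built from the naming table. This is a hypothetical sketch: it covers only the translations listed above and does only naming, leaving the substance of the port to a human (or sf).

```typescript
// Naming translations from the guide's table, applied in order.
const PORT_RULES: Array<[RegExp, string]> = [
  [/\bgsd_/g, "sf_"],
  [/extensions\/gsd\//g, "extensions/sf/"],
  [/\.gsd\//g, ".sf/"],
  [/@sf-run\//g, "@singularity-forge/"],
  [/\bGSD_HOME\b/g, "SF_HOME"],
];

function translateForPort(source: string): string {
  return PORT_RULES.reduce<string>(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    source,
  );
}
```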

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:51:19 +02:00
Mikael Hugo
1bbd20bf78 docs: correct gsd-2 self-heal port — substance applies after path translation
Earlier I (and sf parroting BUILD_PLAN.md) dismissed gsd-2's symlinked
.gsd self-heal fix (9340f1e9b / #4423) as 'doesn't apply because we use
.sf instead'. That was a superficial read.

The fix is about detecting and recovering from a broken/redirected
staging-dir symlink to prevent silent data loss. The .gsd/ vs .sf/ is
a one-line path translation, not a design difference. The
symlink-resilience logic is exactly what we need for our staging.

Path-translate .gsd/ → .sf/ in the port. The substance ports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:49:05 +02:00
Mikael Hugo
9a7d6b7d98 chore(test): drop systemd-run wrapper from test:sf-light
The wrapper imposed CPUQuota=200% / MemoryMax=4G via a transient scope
unit, which requires polkit interactive auth and silently failed on
non-TTY hosts (the script then exit-0'd without running tests). The
limits were a guard against the heavy test:coverage runner's worker
saturation, but test:sf-light already runs in-process with
--max-old-space-size=2048 and --test-timeout=30000 — the systemd
governor was overkill for this lighter target and incompatible with
headless / non-laptop environments.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:47:50 +02:00
Mikael Hugo
9b718f8e36 fix(headless): repair missing sf project symlink 2026-04-29 14:43:30 +02:00
Mikael Hugo
3b6cbcd79f feat(prefs): schema versioning with forward-migration registry
Adds the framework for evolving the prefs schema without silently breaking
projects pinned to older versions. Each PREFERENCES.md declares `version: N`;
sf declares CURRENT_PREFERENCES_SCHEMA_VERSION in code. On load:

- prefs.version === current → no-op
- prefs.version < current → run registered migrations in chain (forward only,
  pure functions). Missing migration in the chain throws — bumping the
  schema version requires a matching Migration entry, by construction.
- prefs.version > current → warn "prefs from a newer sf, fields may be
  ignored", preserve the value so a later upgrade reads correctly.
- prefs.version undefined → assume v1 (legacy file pre-versioning) and
  warn so the user adds an explicit pin.
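The four load-time cases map onto a small function. This sketch is not sf's validatePreferences(): the `Migration` shape (a `from` version plus a pure `migrate`), the `migratePreferences` name, and the warning strings are assumptions. Only the per-case behavior comes from the list above.

```typescript
const CURRENT_PREFERENCES_SCHEMA_VERSION = 1;

type Prefs = { version?: number; [key: string]: unknown };
// A forward-only, pure migration from version `from` to `from + 1`.
type Migration = { from: number; migrate: (prefs: Prefs) => Prefs };

function migratePreferences(
  prefs: Prefs,
  migrations: Migration[],
  current: number = CURRENT_PREFERENCES_SCHEMA_VERSION,
): { prefs: Prefs; warnings: string[] } {
  const warnings: string[] = [];
  let version = prefs.version;
  if (version === undefined) {
    warnings.push("no `version` in PREFERENCES.md; assuming v1, please pin it explicitly");
    version = 1;
  }
  if (version > current) {
    warnings.push(`prefs from a newer sf (v${version}); fields may be ignored`);
    return { prefs, warnings }; // keep the newer version value intact
  }
  let result: Prefs = { ...prefs, version };
  for (let v = version; v < current; v++) {
    const step = migrations.find((m) => m.from === v);
    // A schema bump without a matching migration is a programming error.
    if (!step) throw new Error(`missing preferences migration v${v} -> v${v + 1}`);
    result = { ...step.migrate(result), version: v + 1 };
  }
  return { prefs: result, warnings };
}
```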

Migration registry is empty for now (current schema version stays at 1) —
the framework is in place so the first real schema bump is a one-line
addition, not a refactor. Drift detection (`checkPreferencesDrift`) is also
the natural surface for future deprecated-key / missing-required-field
checks when CLAUDE.md / template comparisons are added.

Wired into validatePreferences() so every load path gets the new behavior
automatically — no caller changes needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:38:43 +02:00
Mikael Hugo
dea4c2dbc1 docs: update Tier 0 with port status; flag SSE parser refactor as bigger work
4 of 9 Tier 0 items landed:
- #1 HTML export escape (security)            701ec8fb8 + 92c6d933c
- #2 Empty tools array fix                    58b1d7c60
- #4 undici 5min timeout                      d0907b6d8
- #5 Bedrock inference profile                7c487bb60

Deferred:
- #3 Anthropic SSE proxy event tolerance — fix applies to pi-mono's
  custom SSE parser, but we still use @anthropic-ai/sdk directly.
  To get protection we'd need to port the full "own Anthropic SSE
  parsing" refactor (3 commits, ~200 LOC). Added as a separate Tier 0
  item.

Remaining TODO from Tier 0: items #6-#9 (symlinked dedup, setWorkingVisible
extension API, Cloudflare provider, Azure Cognitive Services).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:35:55 +02:00
Mikael Hugo
d0907b6d87 port(pi-mono): disable undici body/headers idle timeouts on global dispatcher (refs ea90a6783)
Pi-mono Tier 0 #4 — manual port (sf went off-task; ported directly).

undici's default 300s bodyTimeout aborts long local-LLM SSE streams
(e.g. vLLM buffering a large tool call) with UND_ERR_BODY_TIMEOUT.
retry.provider.timeoutMs cannot lift this cap — it controls the
provider SDK's AbortController, not undici's per-socket idle timer.

Pass {bodyTimeout: 0, headersTimeout: 0} to EnvHttpProxyAgent. Provider
SDKs continue to enforce their own deadlines.

Type-check passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:35:08 +02:00
Mikael Hugo
92c6d933ce chore(pkg/dist): sync template.js with source after HTML escape port (refs 701ec8fb8)
pkg/dist/core/export-html/template.js is a tracked dist mirror that
needs the same HTML escape fix as packages/pi-coding-agent/src/core/
export-html/template.js (committed in 701ec8fb8).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:28:33 +02:00
Mikael Hugo
6248e79a7a feat(init): auto-seed PREFERENCES.md with detected verification_commands
Without this, every fresh project inherits sf's user-level dogfooding
defaults (npm run typecheck:extensions, test:sf-light) — which run sf's
own dev scripts against unrelated repos and produce universal false
negatives. Hit in dr-repo (Go): T01-VERIFY.json showed all_fail because
those npm scripts don't exist there, even though T01's actual work passed
verification per its SUMMARY.

- ensurePreferences() now calls detectProjectSignals() and embeds the
  auto-detected commands in the YAML frontmatter on first init. Detection
  failure is non-fatal — falls back to the bare template.
- detectVerificationCommands() Go branch now handles multi-module repos
  (no root go.mod, only nested ones — common pattern for repos like
  dr-repo/{dr-agent,portal,gateway,installer,cmd/installer}). Generates
  a per-module loop instead of running go vet/test from the repo root,
  which would fail since each subdir is its own Go module.
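The multi-module branch can be pictured as follows. The function name and the exact shell loop are hypothetical (sf's generated command may be shaped differently), and a repo with both a root go.mod and nested modules is not handled specially here.

```typescript
// Given the relative paths of every go.mod in the repo, build a verification
// command. Returns null for non-Go repos.
function goVerificationCommand(goModPaths: string[]): string | null {
  if (goModPaths.length === 0) return null;
  if (goModPaths.includes("go.mod")) {
    // Single root module: run from the repo root.
    return "go vet ./... && go test ./...";
  }
  // Multi-module repo: each subdirectory is its own module, so loop over them.
  const dirs = goModPaths.map((p) => p.slice(0, -"/go.mod".length)).sort();
  return `for d in ${dirs.join(" ")}; do (cd "$d" && go vet ./... && go test ./...) || exit 1; done`;
}
```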

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:26:49 +02:00
Mikael Hugo
58b1d7c601 port(pi-mono): omit tools field instead of sending empty array (refs 3e0ee69b5)
Pi-mono Tier 0 #2 — sf-driven port of PR #3650.

Some LLM providers reject API calls when `tools: []` is sent (an empty
array), but accept the call when the tools field is omitted entirely.
This guards each provider's request-body builder to omit `tools` when
the tool list is empty, instead of serialising the empty array.

Files (5 provider builders):
- packages/pi-ai/src/providers/openai-completions.ts
- packages/pi-ai/src/providers/openai-responses.ts
- packages/pi-ai/src/providers/openai-codex-responses.ts
- packages/pi-ai/src/providers/azure-openai-responses.ts
- packages/pi-ai/src/providers/anthropic-shared.ts (covers anthropic
  and anthropic-vertex which both import buildParams from it)

Pattern: `if (context.tools)` → `if (context.tools && context.tools.length > 0)`.

Preserved: the `else if (hasToolHistory(context.messages))` branch in
openai-completions.ts that intentionally emits `tools: []` for
LiteLLM/Anthropic-proxy compatibility is unchanged.
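Applied to a builder that also has the LiteLLM branch, the guard looks roughly like this. The `Context` shape, the `hasToolHistory` body, and `buildParams`'s return type are simplified stand-ins, not pi-ai's types.

```typescript
interface Tool {
  name: string;
}
interface Context {
  tools?: Tool[];
  messages: Array<{ role: string; toolCallId?: string }>;
}

// Hypothetical stand-in for hasToolHistory(): true if any message carries
// a tool call id.
function hasToolHistory(messages: Context["messages"]): boolean {
  return messages.some((m) => m.toolCallId !== undefined);
}

function buildParams(context: Context): Record<string, unknown> {
  const params: Record<string, unknown> = { messages: context.messages };
  if (context.tools && context.tools.length > 0) {
    params.tools = context.tools;
  } else if (hasToolHistory(context.messages)) {
    // Kept on purpose for LiteLLM/Anthropic-proxy compatibility.
    params.tools = [];
  }
  // Otherwise `tools` is omitted entirely rather than sent as [].
  return params;
}
```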

Type-check passes.

Co-Authored-By: sf v2.75.1 (session 38ed0a48)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:22:31 +02:00
Mikael Hugo
701ec8fb88 port(pi-mono): escape session metadata + image data in HTML export (refs 7617c1ad9, 57787b655)
Pi-mono Tier 0 #1 (security) — sf-driven port.

Two upstream security fixes (pi-mono PR #3819, #3883) that escape
user-controlled session content before embedding in HTML exports.
Crafted session content (image mime types, image data, model IDs,
tool names, entry IDs) could otherwise inject markup at the export
boundary.

What sf changed in
packages/pi-coding-agent/src/core/export-html/template.js:

- Image tags: escape `mimeType` and `data` attributes for both
  tool-result and user-message image renders (PR #3819).
- Session metadata: escape `msg.toolName`, `msg.role`, `entry.modelId`,
  `entry.thinkingLevel`, `entry.type`, `entry.id`, and
  `globalStats.models` (PR #3883).
- DOM id construction: renamed `entryId` → `entryDomId` and escape
  `entry.id` to prevent attribute-breakout from a crafted id.

The existing `escapeHtml()` helper was used at every site; no new
helper introduced. Type-check passes.
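For the image case, escaping both interpolated attribute values is what stops a crafted `mimeType` from closing the `src` attribute and adding an event handler. A sketch: `renderImage` and the exact `<img>` markup are illustrative, and this `escapeHtml` is a generic implementation rather than the template's own helper.

```typescript
// A typical HTML escaper for text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Both values come from session content, so both are escaped.
function renderImage(mimeType: string, data: string): string {
  return `<img src="data:${escapeHtml(mimeType)};base64,${escapeHtml(data)}">`;
}
```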

Co-Authored-By: sf v2.75.1 (session 150fe2c1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:20:23 +02:00
Mikael Hugo
7c487bb60e port(pi-mono): normalize Bedrock model names for inference profiles (refs ed4bc7308)
Pi-mono Tier 0 #5 — first sf-driven port. sf-from-source dispatched the
task in print mode and produced this fix autonomously.

Adds getModelMatchCandidates(modelId, modelName?) helper that normalizes
both inputs to lowercase and dash-separated form
(s.replace(/[\s_.:]+/g, "-")). Inference profile ARNs don't embed the
model name; the helper lets capability checks match against either the
inference profile ARN or the underlying model name.

Updated:
- supportsAdaptiveThinking — uses the helper; consolidates the
  opus-4.6/opus-4-6 dot-vs-dash variants.
- mapThinkingLevelToEffort — same pattern.
- supportsPromptCaching — same pattern (also from pi-mono PR #3527).
- streamSimpleBedrock and buildAdditionalModelRequestFields — pass
  model.name through to capability checks.
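The helper and one capability check that uses it might look like this. The normalization regex is the one quoted above; the `"opus-4-6"` substring in `supportsAdaptiveThinking` and the sample ids are illustrative assumptions, not the real match list.

```typescript
// Lowercase and collapse runs of whitespace, "_", "." and ":" into "-".
function normalizeModelString(s: string): string {
  return s.toLowerCase().replace(/[\s_.:]+/g, "-");
}

// Candidates for capability matching: the model id (possibly an inference
// profile ARN) and, when known, the underlying model name.
function getModelMatchCandidates(modelId: string, modelName?: string): string[] {
  const candidates = [normalizeModelString(modelId)];
  if (modelName) candidates.push(normalizeModelString(modelName));
  return candidates;
}

function supportsAdaptiveThinking(modelId: string, modelName?: string): boolean {
  return getModelMatchCandidates(modelId, modelName).some((c) => c.includes("opus-4-6"));
}
```

An inference profile ARN alone carries no model name, so the check only succeeds once `model.name` is passed through.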

Type-check passes (cd packages/pi-ai && npx tsc --noEmit).

Co-Authored-By: sf v2.75.1 (session 911dd2de)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:14:17 +02:00
Mikael Hugo
a3c487c918 docs: add Tier 0 (pi-mono ports) and Tier 0.5 (gsd-2 manual ports) — sf does these first
Tier 0 (pi-mono — should land cleanly via cherry-pick, no namespace divergence):
9 items ranked security → bug-fixes → infra → features.

  Critical:
    1. HTML export escape (security)
    2. Empty tools array fix (provider compatibility)
    3. Anthropic SSE proxy event tolerance
    4. Long local-LLM SSE 5min timeout fix

  Infrastructure:
    5. Bedrock inference profile normalization
    6. Symlinked packages dedup
    7. ctx.ui.setWorkingVisible() extension API

  Features:
    8. Cloudflare Workers AI provider
    9. Azure Cognitive Services endpoint

Tier 0.5 (gsd-2 — must be MANUALLY ported; cherry-pick fails on namespace):

  Critical fixes (11):
    1-6.  bash race, security hardening, web_search injection narrowing,
          symlinked staging self-heal, KNOWLEDGE budget, mcp-server deadlock
    7-10. agent_end transition fixes (4 commits)
    11.   claude-code-cli Always-Allow persistence

  Normal-value features (6):
    12. /gsd eval-review slim port (prompt + tool + template)
    13. Workflow state machine hardening (5 commits as unit)
    14. Proactive rate limiting (min_request_interval_ms)
    15. Per-call token telemetry (opt-in pi-coding-agent hooks)
    16. Worktree TUI commands
    17. Doctor check for orphan milestone directories

Skipped from each upstream is documented. All in BUILD_PLAN.md so sf
can work the list systematically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:04:31 +02:00
Mikael Hugo
bb1c68b7ab docs: drop OpenRouter-removal follow-up
OpenRouter is already neutered via the provider_model_allow allowlist
(see d38e5ea09 fix(schema): auto-coerce string → [string] for sf_* list
fields + provider_model_allow tests). The 248 model entries in
models.generated.ts are inert — no dispatch path reaches them.

Removing the data entries would be aesthetic cleanup with zero
behavioral effect. Not worth a Tier-1 follow-up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:58:33 +02:00
Mikael Hugo
310ce963ea docs: add session follow-ups to BUILD_PLAN
Six items surfaced during 2026-04-29 ports/refactors that didn't get
tracked anywhere:

- Tier 1: Remove OpenRouter (~248 model entries; user confirmed unused)
- Tier 1: Minimax search tests (deferred from initial port)
- Tier 2: Search provider registry refactor (get rid of the 9-files-per-provider pattern)
- Tier 2: Product-audit phase machine wire-up (slim port shipped tool;
  phase dispatch not yet wired)
- Tier 2: Headless assistant-text preview (bunker pattern, deferred from
  headless UX commit)
- Tier 3: Pi-mono SDK sync cadence

Each entry has rationale + effort estimate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:56:55 +02:00
Mikael Hugo
a8cf2cd941 feat(workflow): add product-audit (slim port)
Milestone-end workflow that compares declared product intent (VISION.md,
RUNBOOKS.md, etc.) against actual code/test/deploy/docs evidence and
emits structured gaps with severity. Soft gates — adds follow-up slices
but doesn't hard-block merge.

Slim port (4 new files + 1 registration) — extracts only the audit
feature itself, not bunker's parallel rewrite of dispatch/prompts/
benchmark-selector that came with it in commit 2aa785475.

Created:
- prompts/product-audit.md         — prompt verbatim, gsd_*→sf_* and .gsd→.sf
- tools/product-audit-tool.ts      — slim file-write implementation,
                                     atomicWriteAsync to .sf/active/{mid}/
                                     PRODUCT-AUDIT.{json,md}; no DB deps
- bootstrap/product-audit-tool.ts  — pi-coding-agent tool registration,
                                     TypeBox schema for sf_product_audit
- workflow-templates/product-audit.md — workflow template

Modified:
- bootstrap/register-extension.ts  — 2 lines: import + add to nonCriticalRegistrations
- workflow-templates/registry.json — registry entry
- package.json — version 2.75.0 → 2.75.1

Verdict logic (no-gaps | gaps-found | contract-underspecified) is the
load-bearing innovation: contract-underspecified forces the auditor to
flag unverifiable docs as a real gap rather than rubber-stamping
no-gaps when the product contract is silent.
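One way to read the three-way verdict is that a silent or unverifiable contract is reported as its own outcome and can never collapse into "no-gaps". The input shape, field names, and the precedence below are hypothetical; the slim port's actual decision logic lives in the prompt and tool.

```typescript
type AuditVerdict = "no-gaps" | "gaps-found" | "contract-underspecified";

interface AuditInput {
  gaps: Array<{ severity: "low" | "medium" | "high"; summary: string }>;
  // Declared intents that could not be checked against evidence.
  unverifiableClaims: string[];
  // Number of intents found in VISION.md, RUNBOOKS.md, etc.
  declaredIntents: number;
}

function auditVerdict(input: AuditInput): AuditVerdict {
  if (input.declaredIntents === 0 || input.unverifiableClaims.length > 0) {
    return "contract-underspecified";
  }
  return input.gaps.length > 0 ? "gaps-found" : "no-gaps";
}
```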

Out of scope: phase enum changes, dispatch hookup. Wire-up to the phase
machine is a follow-up; the prompt + tool + template stand alone.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:55:23 +02:00
Mikael Hugo
2eebeccb93 feat(search): add MiniMax web search provider
New search backend alongside tavily/brave/serper/exa/ollama. API key
resolution: MINIMAX_CODE_PLAN_KEY → MINIMAX_CODING_API_KEY →
MINIMAX_API_KEY (fallback order matches MiniMax's documented aliases).
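The fallback order is a first-non-empty lookup. In this sketch the environment is passed in (for example `process.env`) so it can be tested; the real `getMiniMaxSearchApiKey()` may read the environment directly, and treating an empty string as unset is an assumption.

```typescript
// Fallback order from the commit message.
const MINIMAX_KEY_VARS = ["MINIMAX_CODE_PLAN_KEY", "MINIMAX_CODING_API_KEY", "MINIMAX_API_KEY"] as const;

function getMiniMaxSearchApiKey(env: Record<string, string | undefined>): string | undefined {
  for (const name of MINIMAX_KEY_VARS) {
    const value = env[name];
    if (value) return value;
  }
  return undefined;
}
```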

Wired through every existing seam:
- type union: SearchProvider = 'tavily' | 'minimax' | 'brave' | 'ollama'
- VALID_PREFERENCES set + selection logic in provider.ts
- native-search routing (Anthropic native web_search delegates correctly)
- /search-provider CLI command (tab completion, select UI, parser)
- tool-search.ts: search execution path
- tool-llm-context.ts: prefetch / context-builder path
- preferences-types + preferences-validation
- configuration.md user docs
- extension-manifest description

Tests not added in this commit — the bunker reference tests don't match
our preferences/provider export shape (we have serper/exa/combosearch
that bunker doesn't). Tests for getMiniMaxSearchApiKey priority order,
resolveSearchProvider returning "minimax", /search-provider minimax CLI
behavior, no-key error messages, and executeMiniMaxSearch request shape
are TODO.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:55:04 +02:00
Mikael Hugo
ae0bbe32fc feat(providers): add xiaomi direct API (token-plan-{ams,sgp,cn}) — additive
Adds direct xiaomi token-plan API access alongside the existing
OpenRouter-routed xiaomi entries. ADDITIVE only — OpenRouter cleanup is
a separate follow-up.

Three new region providers:
- xiaomi-token-plan-ams (Amsterdam, default for plain `xiaomi`)
- xiaomi-token-plan-sgp (Singapore)
- xiaomi-token-plan-cn (China)

All use Anthropic Messages API. Env-var resolution: XIAOMI_API_KEY →
XIAOMI_TOKEN_PLAN_API_KEY → MIMO_API_KEY (in that fallback order).

Three xiaomi MiMo models registered under each direct provider:
- mimo-v2-flash (256k ctx, 64k output, text-only, reasoning)
- mimo-v2-omni (256k ctx, 128k output, text+image, reasoning)
- mimo-v2-pro (1M ctx, 128k output, text-only, reasoning)

Same model literals × 4 provider keys, different baseUrls per region.
Test count assertion bumped 22 → 26 providers.
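The "× 4 provider keys" registration is a cross product. A sketch with hypothetical names (`XIAOMI_PROVIDERS`, `xiaomiModelEntries`); base URLs and token limits are left out rather than guessed.

```typescript
type Region = "ams" | "sgp" | "cn";

// Provider keys from the commit message; plain "xiaomi" defaults to Amsterdam.
const XIAOMI_PROVIDERS: Array<[string, Region]> = [
  ["xiaomi", "ams"],
  ["xiaomi-token-plan-ams", "ams"],
  ["xiaomi-token-plan-sgp", "sgp"],
  ["xiaomi-token-plan-cn", "cn"],
];

// Model literals and input modalities from the commit message.
const MIMO_MODELS: Array<{ id: string; input: string[] }> = [
  { id: "mimo-v2-flash", input: ["text"] },
  { id: "mimo-v2-omni", input: ["text", "image"] },
  { id: "mimo-v2-pro", input: ["text"] },
];

interface ModelEntry {
  provider: string;
  region: Region;
  model: string;
  input: string[];
}

// Same model literals under every provider key: 4 keys x 3 models.
function xiaomiModelEntries(): ModelEntry[] {
  const entries: ModelEntry[] = [];
  for (const [provider, region] of XIAOMI_PROVIDERS) {
    for (const m of MIMO_MODELS) {
      entries.push({ provider, region, model: m.id, input: m.input });
    }
  }
  return entries;
}
```

The four new keys also account for the 22 → 26 provider count.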

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:54:43 +02:00
Mikael Hugo
dff0df5fdc fix(headless): suppress notification spam, categorize messages, distinguish phase vs status
Three small UX fixes for headless / autopilot logs:

1. Add `zz-notifications` to TUI_FOOTER_STATUS_KEYS — these are sticky
   notification dots from the interactive TUI footer; they have no
   meaning in headless and were spamming the log.

2. Categorize notification messages by prefix so headless output is
   scannable: [mcp] for MCP-client-ready, [search] for web search status,
   [parallel] for slice-parallel/subagent dispatch. Falls through to
   the existing important/non-important formatting for everything else.

3. Distinguish phase transitions from generic status updates: phase:/
   milestone:/slice:/task: prefixed keys get [phase]; everything else
   gets [status]. Previously both used [phase], which was misleading.
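The tagging rules can be sketched as two small functions. The phase-key prefixes and the `zz-notifications` key come from the list above; the real TUI_FOOTER_STATUS_KEYS set contains more keys, and the notification regexes are hypothetical because the exact message prefixes sf matches are not listed.

```typescript
// Only "zz-notifications" is named in the commit; the real set is larger.
const TUI_FOOTER_STATUS_KEYS = new Set(["zz-notifications"]);

const PHASE_KEY_PREFIXES = ["phase:", "milestone:", "slice:", "task:"];

// Log tag for a status update, or null to suppress it in headless mode.
function statusTag(key: string): "[phase]" | "[status]" | null {
  if (TUI_FOOTER_STATUS_KEYS.has(key)) return null;
  return PHASE_KEY_PREFIXES.some((p) => key.startsWith(p)) ? "[phase]" : "[status]";
}

// Hypothetical message patterns for the three categories.
const NOTIFICATION_CATEGORIES: Array<[RegExp, string]> = [
  [/^MCP client .*ready/i, "[mcp]"],
  [/^web search/i, "[search]"],
  [/^(slice-parallel|subagent)/i, "[parallel]"],
];

function notificationTag(message: string): string | null {
  for (const [pattern, tag] of NOTIFICATION_CATEGORIES) {
    if (pattern.test(message)) return tag;
  }
  return null; // fall through to the existing important/non-important formatting
}
```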

Patterns based on bunker commits 14ec4d97f / c15afb45f (which were the
research source) but written fresh against our existing
TUI_FOOTER_STATUS_KEYS structure rather than cherry-picked.

The assistant-text-preview commit (cf0274c63) is a separate, larger
refactor in headless.ts and is deferred to v3.1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:43:40 +02:00
Mikael Hugo
c41912ff55 fix(prompts): tell agents about Serena (repo-intelligence MCP) for code exploration
We have .serena/ configured (cache, memories, project.local.yml) but no
prompt mentioned Serena anywhere. Agents weren't using it for symbol
lookup or cross-file architecture mapping; they fell straight to rg/find.

Added a one-sentence Serena hint to the code-exploration step in:
- research-slice.md
- research-milestone.md
- plan-slice.md
- plan-milestone.md
- guided-research-slice.md

Phrased generically ("If a repo-intelligence MCP (e.g. Serena) is
configured...") so it degrades cleanly when Serena isn't set up.

Pattern based on bunker commit 4ba746888 but written fresh against our
post-rename prompt structure rather than cherry-picked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:41:33 +02:00
Mikael Hugo
7a6169705a docs: lock in fork stance, reframe cherry-pick list as reference-only
After attempting cluster B (4 surgical agent-session fixes), even the
first commit conflicted because of structural namespace divergence
(gsd_*→sf_* rename, @sf-run/*→@singularity-forge/* rename, prior
pi-mono direct cherry-picks). The conflicts are real semantic
divergence, not noise.

Conclusion: sf is a fork; we do not periodically sync from
gsd-build/gsd-2. Pretending we still track upstream means weeks of
merge work for diminishing return.

BUILD_PLAN.md adds an explicit "Upstream stance" section documenting
the fork posture and the rationale for the three irreversible naming
choices.

UPSTREAM_CHERRY_PICK_CANDIDATES.md is reframed as a reference list,
not an action plan. The clusters and SHAs remain useful as an
intelligence source — port specific fixes by hand when one bites us;
do not run automated cherry-picks against the list.

Pi-mono SDK syncs continue separately — that path doesn't have the
same divergence problem.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:57:44 +02:00
Mikael Hugo
a80beb83b5 docs: enumerate high-value upstream cherry-pick candidates
The origin↔upstream divergence is 4,589 commits. This file picks the
high-leverage subset (~70 commits across 16 topical clusters) worth
considering for cherry-pick. Recommended order at the bottom.

Each cluster lists candidate SHAs with one-line context and effort
estimates. Total estimated work if all clusters A-N are taken: ~10-15
hours plus conflict resolution. Cluster O (UnitContextManifest /
Composer rewrite, ~15 commits) is deferred — likely conflicts heavily
with our work and should be revisited during v3 schema reconciliation.

Cluster P (memories table cutover, 1 commit) is flagged as READ FIRST
because it's upstream's answer to what BUILD_PLAN calls Singularity
Memory integration; reading it may change the recommended integration
path.

This is a candidate list for human decision, not an action plan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:53:46 +02:00
Mikael Hugo
b24f426f2b batch: snapshot of in-flight v2 work
This commit captures uncommitted modifications that accumulated in the
working tree across multiple in-progress workstreams. It is a snapshot
to clear the deck before sf v3 work begins; individual workstreams
should land separately on top of this.

Notable additions:
- trace-collector.ts, traces.ts, src/tests/trace-export.test.ts —
  trace export plumbing
- biome.json — Biome linter configuration
- .gitignore — exclude native/npm/**/*.node compiled binaries

The bulk of the diff is across src/resources/extensions/sf/ (301 files)
and src/resources/extensions/sf/tests/ (277 files), reflecting the
ongoing sf extension work. Specific feature commits should follow this
snapshot rather than being archaeology'd out of it.

The 76MB native/npm/linux-x64-gnu/forge_engine.node compiled binary
was left out of the commit — it's now gitignored and built locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:42:31 +02:00
Mikael Hugo
31842885ea docs: add BUILD_PLAN.md — tiered cut of v3 NEW items
Of the 56 NEW items in SPEC.md, not all are worth building for v3.
This plan groups them by tier:

- Tier 1 ESSENTIAL (~5 weeks): Vault resolver, sm integration decision,
  schema reconciliation, config alignment.
- Tier 2 STRONG (~3-4 weeks): doc-sync, intent chapters, PhaseReview
  3-pass, turn_status marker, last_error cap, cost_micro_usd.
- Tier 3 NICE (v3.1+): persistent agents, inter-agent messaging,
  workflow content pinning, runs table, pending_retain.
- Tier 4 DEFER: SSH workers, HTTP API auth, trace_index, PhaseUAT —
  build when a deployment demands it.
- Tier 5 DROP: items from late adversarial-review iterations that
  don't earn their keep (workflow_pins separate table, snap_ columns,
  agent_capabilities separate index).

Includes a recommended ~6-8 week v3.0 schedule and four decision
points that should be settled before starting work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:33:07 +02:00
Mikael Hugo
57a1bc6505 docs: import sf v3 spec from singularity-crush, annotated for status
Imports SPEC.md (v1.0-draft) from singularity-ng/crush#docs/spec — the
forward-looking contract for sf v3. Annotated section-by-section and
item-by-item with implementation status against current sf:

- EXISTS — already implemented in sf, matches the spec
- PARTIAL — implemented but diverges from spec; needs alignment work
- NEW — not yet implemented

Conformance breakdown (123 items total):
- 37 EXISTS
- 30 PARTIAL
- 56 NEW

The NEW items concentrate in: persistent-agent inbox model (§17/§18),
Singularity Memory integration (§16/§24), SSH worker extension (§22),
several supervisor refinements (§9), and policy/operations details
(audit fields, trace metadata, version pinning) introduced during the
v0.x adversarial review iterations.

The PARTIAL items concentrate in: schema reconciliation (sf has 3
tables — milestones/slices/tasks — vs spec's single units table),
config schema alignment, runs-table unification with audit_events,
and several worker-attempt lifecycle details that exist in different
shapes today.

This is an informational import. Implementing v3 against this spec
is its own work; the next step is deciding which NEW items are
actually wanted vs deferred, and whether to migrate the 3-table
planning schema to the single-units shape or keep what sf has and
update the spec.

Spec source: https://github.com/singularity-ng/crush/blob/docs/spec/SPEC.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:15:02 +02:00
Mikael Hugo
6eaf5926ad sf snapshot: uncommitted changes after 248m inactivity 2026-04-28 21:10:17 +02:00
Mikael Hugo
d30d91bf2f sf snapshot: uncommitted changes after 41m inactivity 2026-04-28 17:01:26 +02:00
Mikael Hugo
5d3c204006 fix(git-merge): no auto-flip from approved to declined; cached approval is sticky
Codex-rescue output (a299c461 / bnr88iy59) fixing the bug we hit on
M002 complete-milestone: 'Git merge approved once' was followed seconds
later by 'Git merge declined by user'. Same gate, same agent run,
opposite verdicts.

Single source of truth for the merge-gate state in guardrails/index.ts.
Approval is now sticky — re-asks return the cached approval until consumed
or explicitly revoked, never auto-flip to decline. Timeout converts to
pause+log instead of decline. Adds tests/safe-git-merge-gate.test.ts.
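A minimal sketch of the sticky-approval rule, assuming a hypothetical in-memory MergeApprovalGate class. The real state lives in guardrails/index.ts and its API is not shown in this commit. The sketch shows three properties: re-asks return the cached verdict, only consumption or explicit revocation clears an approval, and a prompt timeout pauses and logs instead of declining.

```typescript
// Hypothetical sketch: names and shape are illustrative, not sf's guardrails API.
type GateState =
  | { kind: "idle" }
  | { kind: "approved" }
  | { kind: "declined" }
  | { kind: "paused"; reason: string };

export class MergeApprovalGate {
  private state: GateState = { kind: "idle" };
  readonly log: string[] = [];

  // Explicit user decisions are the only way into approved/declined.
  approve(): void {
    this.state = { kind: "approved" };
  }

  decline(): void {
    this.state = { kind: "declined" };
  }

  // Re-asks return the cached verdict; nothing here flips approved -> declined.
  check(): GateState["kind"] {
    return this.state.kind;
  }

  // A merge consumes the approval and returns the gate to idle.
  consume(): boolean {
    if (this.state.kind !== "approved") return false;
    this.state = { kind: "idle" };
    return true;
  }

  // Revocation is explicit, never implied by a later prompt.
  revoke(): void {
    if (this.state.kind === "approved") this.state = { kind: "idle" };
  }

  // A timed-out prompt pauses and logs; it never records a decline,
  // and it leaves an existing approval untouched.
  onPromptTimeout(): void {
    if (this.state.kind === "approved") return;
    this.state = { kind: "paused", reason: "approval prompt timed out" };
    this.log.push("merge gate paused: approval prompt timed out");
  }
}
```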

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 16:20:08 +02:00
Mikael Hugo
d38e5ea092 fix(schema): auto-coerce string → [string] for sf_* list fields + provider_model_allow tests
Two codex-rescue tasks landed together:

1. Auto-coerce JSON-schema validator: when a tool field declares
   {type:"array", items:{type:"string"}} and the model sends a single
   string, wrap it in [string] before validation instead of hard-rejecting.
   Fixes the recurring "keyDecisions: must be array" rejection on
   sf_complete_task that wasted retries.

2. provider_model_allow filter (proper implementation with helpers):
   - resolveProviderModelAllowList / isProviderModelAllowed /
     filterModelsByProviderModelAllow helpers in preferences-models
   - Wired into model-registry and auto-model-selection
   - New tests/provider-model-allow.test.ts

Tools coerced: sf_complete_task, sf_complete_milestone, sf_plan_milestone,
sf_plan_slice, sf_replan_slice, sf_reassess_roadmap (key list fields).
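The coercion in item 1 can be sketched as a pre-validation pass. This assumes a simplified JsonSchema shape (hypothetical, not sf's validator types) and a hypothetical helper name, coerceStringArrayFields. Only properties declared as an array of strings are touched.

```typescript
// Hypothetical sketch: a minimal schema shape, not sf's actual validator types.
interface JsonSchema {
  type?: string;
  items?: JsonSchema;
  properties?: Record<string, JsonSchema>;
}

// Wrap a lone string in an array when the schema declares that property
// as {type: "array", items: {type: "string"}}; leave everything else as-is.
export function coerceStringArrayFields(
  schema: JsonSchema,
  args: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...args };
  for (const [key, prop] of Object.entries(schema.properties ?? {})) {
    const wantsStringArray = prop.type === "array" && prop.items?.type === "string";
    const value = out[key];
    if (wantsStringArray && typeof value === "string") {
      out[key] = [value];
    }
  }
  return out;
}
```

Because the wrapping runs before validation, a value that is neither a string nor an array still fails validation as before.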

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 12:30:55 +02:00
Mikael Hugo
f98a1e360e batch: codex-rescue session output (multiple in-flight tasks)
Combined output of multiple parallel codex-rescue runs that produced
working-tree edits but didn't commit. Tasks contributing:

- prefs: per-provider model allow-list (provider_model_allow) — manual
- TUI scroll + unresponsive (a7884d1a / bt3fpn4y2)
- planningMeeting required (aa09e904 / br127l763)
- Logs UX 4-pack (a5c65314 / btcplhu7f)
- Gate auto-resolve + completion nudge (ae4c8b64 / bw1w1fjkp)
- sf_task_complete atomic + retry (a7a079b4 / b20cy5owv)
- Multi-model meeting + minimax M2.7 + draft promotion (a756faac / task-moifjknd-lwjc98)
- Per-role slice prompts (a94c3e1a)
- Per-role vision-meeting prompts (afd165a0 / task-moifple5-lcwtjl)
- Schema sweep (ac994b1e / task-moifq7pu-83coqz)
- Flow audit (ad26ecfd / bttj4vrqm)

Typecheck passes. Tests not run as a full suite — spot-check after merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 11:52:42 +02:00
Mikael Hugo
66ff949c11 cherry-pick(security): harden project-controlled surfaces (PR #4755 partial)
Cherry-pick of gsd-build/gsd-2 65ca5aa2e — applies the security hardening
hunks that conflicted minimally:

- mcp-server/env-writer: validate writes against a strict allowlist
- web/api/files: enforce path containment via web/lib/secure-path
- vscode-extension: read binaryPath/autoStart only from trusted
  global/default scopes (resolveTrustedSfStartupConfig), avoiding
  workspace-controlled override (renamed Gsd → Sf for sf naming)
- New regression tests: mcp-client-security, vscode-startup-security,
  web-files-symlink
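The path-containment check for web/api/files can be sketched as follows. isPathContained is a hypothetical name; web/lib/secure-path's actual exports are not shown in this commit. The idea is to resolve `..` segments and existing symlinks, then reject anything whose path relative to the root escapes it.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Resolve symlinks when the path exists; otherwise fall back to a lexical
// resolve (so paths that do not exist yet are only checked lexically).
function realOrResolved(p: string): string {
  try {
    return fs.realpathSync(p);
  } catch {
    return path.resolve(p);
  }
}

// True when `requested` (relative or absolute) stays inside `root`
// after `..` segments and existing symlinks are resolved.
export function isPathContained(root: string, requested: string): boolean {
  const rootReal = realOrResolved(root);
  const target = realOrResolved(path.resolve(rootReal, requested));
  const rel = path.relative(rootReal, target);
  if (rel === "") return true;
  return !path.isAbsolute(rel) && rel.split(path.sep)[0] !== "..";
}
```

Resolving the symlink before comparing is what catches a link inside the root that points outside it, which a purely lexical prefix check would miss.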

Skipped hunks (drifted): mcp-server/server.ts, mcp-client/index.ts,
mcp-server/README.md.

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:37:07 +02:00
Mikael Hugo
bf727173e7 cherry-pick(file-lock): make file-lock actually lock and throw on contention
Cherry-pick of gsd-build/gsd-2 a09e01640: withFileLockSync now actually
acquires a lock via proper-lockfile (previously it was a no-op whenever
proper-lockfile had not been required) and throws on ELOCKED contention by default. Adds
onLocked: 'skip' option for best-effort callers that tolerate dropped
entries (audit, journal). Modernizes import style (createRequire/join
from imports rather than ad-hoc require). Path-renames preserved
(gsd-pi → sf-run).
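The throw-versus-skip contract can be sketched without proper-lockfile. This assumes an exclusive-create lock file (`fs.openSync(path, "wx")`) as a stand-in for proper-lockfile's locking. withLockFileSync is a hypothetical name, and the sketch omits the stale-lock handling a real lock library provides.

```typescript
import * as fs from "node:fs";

export type OnLocked = "throw" | "skip";

// Try to create the lock file exclusively; null means someone else holds it.
function tryAcquire(lockPath: string): number | null {
  try {
    return fs.openSync(lockPath, "wx"); // "wx" fails with EEXIST if it exists
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return null;
    throw err;
  }
}

// Run `fn` while holding `${file}.lock`. On contention, throw an error with
// code "ELOCKED" (default) or return undefined when onLocked is "skip".
export function withLockFileSync<T>(
  file: string,
  fn: () => T,
  opts: { onLocked?: OnLocked } = {},
): T | undefined {
  const lockPath = `${file}.lock`;
  const fd = tryAcquire(lockPath);
  if (fd === null) {
    if (opts.onLocked === "skip") return undefined; // best-effort callers drop the write
    throw Object.assign(new Error(`lock held: ${lockPath}`), { code: "ELOCKED" });
  }
  try {
    return fn();
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
  }
}
```

Journal and audit appends would pass `{ onLocked: "skip" }`, accepting a dropped entry rather than a torn one.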

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:28:36 +02:00
Mikael Hugo
22d4579690 cherry-pick(state): lock-wrapped appends for journal, audit, workflow-logger
Cherry-pick of gsd-build/gsd-2 53babec29 — lock-wrapped append half.
Wraps appends to .sf/journal/, .sf/audit/events.jsonl, and the
workflow-logger error log in withFileLockSync (onLocked: skip),
preserving best-effort semantics while preventing torn writes
under contention.

Companion to the atomic-write half landed in 3df56cb94. Path-renames
(gsdRoot→sfRoot, gsd-db→sf-db) preserved during conflict resolution.

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:27:44 +02:00
Mikael Hugo
f1f4b840e1 cherry-pick(doctor): self-heal symlinked .sf staging to prevent silent data loss
Cherry-pick of gsd-build/gsd-2 9340f1e9b (#4423) — doctor self-heal
detection for symlinked staging directories that can cause silent
data loss. Skips native-git-bridge.ts and git-service test (drifted).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:25:56 +02:00
Mikael Hugo
7fd4672e55 cherry-pick(auto): handle worktree context fallback + sanitize paused session paths
Cherry-pick of gsd-build/gsd-2 a4f78731f — handles worktree context fallback
and sanitizes paths in paused session resumption. Skips uok-plan-v2-wiring
test hunk (drifted in sf).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:25:40 +02:00
Mikael Hugo
93402643f4 cherry-pick(sf-db): tolerate corrupt task arrays in milestone rows
Cherry-pick of gsd-build/gsd-2 851507913 (#4056) — defensive parsing
so a corrupt or non-array tasks blob in a milestone row doesn't crash
sf-db reads. Test hunk skipped (sf-db.test.ts has drifted).
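Defensive parsing of this kind can be sketched as below, assuming the tasks column holds a JSON-encoded array. parseTasksBlob is a hypothetical helper, not the sf-db function. Any unparseable or non-array value degrades to an empty list instead of throwing.

```typescript
// Hypothetical sketch: normalise a milestone row's tasks blob to an array.
export function parseTasksBlob(raw: unknown): unknown[] {
  if (Array.isArray(raw)) return raw; // already decoded by the driver
  if (typeof raw !== "string" || raw.trim() === "") return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : []; // valid JSON, wrong shape
  } catch {
    return []; // corrupt JSON
  }
}
```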

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:25:21 +02:00
Mikael Hugo
3df56cb94f cherry-pick(state): atomic-writes for guided-flow-queue and reports
Cherry-pick of gsd-build/gsd-2 53babec29 (Jeremy <jeremy@fluxlabs.net>)
— atomic-write half only. Eliminates torn-write risk on PROJECT.md
queue sync and reports.json/HTML index regeneration by switching
writeFileSync → atomicWriteSync (tmp+rename).
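The tmp+rename pattern can be sketched as follows. writeFileAtomicSync is a hypothetical name, not sf's atomicWriteSync. The temp file is created in the target's directory so the rename stays on one filesystem.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write to a temp file in the same directory, then rename over the target.
// rename() within one filesystem replaces the target atomically on POSIX,
// so readers see either the old contents or the new ones, never a torn mix.
export function writeFileAtomicSync(target: string, data: string): void {
  const tmp = path.join(
    path.dirname(target),
    `.${path.basename(target)}.${process.pid}.${Date.now()}.tmp`,
  );
  try {
    fs.writeFileSync(tmp, data);
    fs.renameSync(tmp, target);
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // do not leave partial temp files behind
    throw err;
  }
}
```

A production version would typically also fsync the temp file before renaming for crash durability; that step is omitted here.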

The companion lock-wrapped-append changes (journal.ts, uok/audit.ts,
workflow-logger.ts) are deferred — they need proper-lockfile +
withFileLockSync helper introduced first.

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:16:39 +02:00
Mikael Hugo
8e827147c9 feat(code-intelligence): add sift indexer backend alongside project-rag
Generalize the code-intelligence hook to support multiple indexer
backends, with sift (rupurt/sift) as a new option next to the existing
project-rag MCP server. Backend is selected via CodebaseMapPreferences.

- code-intelligence.ts: new abstraction + sift backend (detect, resolve,
  status, context-block contribution)
- preferences-types.ts: codebaseIndexer field (project-rag | sift | none)
- preferences-validation.ts: validate the new field
- bootstrap/system-context.ts, commands-codebase.ts: dispatch on backend
- tests/code-intelligence.test.ts: sift detection/resolution/status tests
  (19 pass, 0 fail)

project-rag path unchanged and continues to work.
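The new preference field can be sketched as a closed union plus a validator. The three values come from this commit. The defaulting rule (unset means project-rag, matching "project-rag path unchanged") and the result shape are assumptions, and validateCodebaseIndexer is a hypothetical name.

```typescript
// Hypothetical sketch: mirrors the codebaseIndexer values named in this commit.
const CODEBASE_INDEXERS = ["project-rag", "sift", "none"] as const;
export type CodebaseIndexer = (typeof CODEBASE_INDEXERS)[number];

type ValidationResult =
  | { ok: true; value: CodebaseIndexer }
  | { ok: false; error: string };

// Assumed default: an unset field keeps the existing project-rag path.
export function validateCodebaseIndexer(raw: unknown): ValidationResult {
  if (raw === undefined) return { ok: true, value: "project-rag" };
  if (typeof raw === "string" && (CODEBASE_INDEXERS as readonly string[]).includes(raw)) {
    return { ok: true, value: raw as CodebaseIndexer };
  }
  return {
    ok: false,
    error: `codebaseIndexer must be one of ${CODEBASE_INDEXERS.join(", ")}`,
  };
}
```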

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:05:26 +02:00
Mikael Hugo
0606983d97 feat(subagent): add background job manager and tests
SubagentBackgroundJobManager tracks long-running subagent jobs with
status, abort support, and TTL-based eviction of completed results.
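A job manager with those three features can be sketched as below. BackgroundJobManager and its methods are hypothetical and do not reproduce SubagentBackgroundJobManager's API. Runners receive an AbortSignal, the first terminal state wins, and finished jobs older than the TTL are evicted lazily on lookup.

```typescript
// Hypothetical sketch: not the real SubagentBackgroundJobManager API.
export type JobStatus = "running" | "completed" | "failed" | "aborted";

export interface Job<T> {
  id: string;
  status: JobStatus;
  result?: T;
  error?: unknown;
  finishedAt?: number;
  controller: AbortController;
}

export class BackgroundJobManager<T> {
  private readonly jobs = new Map<string, Job<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;
  private nextId = 0;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Start a job; the runner receives an AbortSignal it should honour.
  start(run: (signal: AbortSignal) => Promise<T>): string {
    const id = `job-${++this.nextId}`;
    const job: Job<T> = { id, status: "running", controller: new AbortController() };
    this.jobs.set(id, job);
    run(job.controller.signal).then(
      (result) => this.finish(job, "completed", { result }),
      (error) => this.finish(job, "failed", { error }),
    );
    return id;
  }

  // Abort a running job; returns false if it is unknown or already finished.
  abort(id: string): boolean {
    const job = this.jobs.get(id);
    if (!job || job.status !== "running") return false;
    this.finish(job, "aborted", {});
    job.controller.abort();
    return true;
  }

  // Look up a job, evicting finished jobs older than the TTL first.
  get(id: string): Job<T> | undefined {
    this.evictExpired();
    return this.jobs.get(id);
  }

  private finish(job: Job<T>, status: JobStatus, extra: { result?: T; error?: unknown }): void {
    if (job.status !== "running") return; // first terminal state wins
    job.status = status;
    job.result = extra.result;
    job.error = extra.error;
    job.finishedAt = this.now();
  }

  private evictExpired(): void {
    const cutoff = this.now() - this.ttlMs;
    for (const [id, job] of this.jobs) {
      if (job.finishedAt !== undefined && job.finishedAt <= cutoff) this.jobs.delete(id);
    }
  }
}
```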

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 04:18:17 +02:00
Mikael Hugo
efd5e14e0a feat: add FEATURES.md capability map and generator
Human-oriented documentation of SF capabilities, with a script that
keeps it in sync with workflow-tools.ts and extension manifests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 04:18:12 +02:00
Mikael Hugo
25797129e2 sf snapshot: pre-dispatch, uncommitted changes after 38m inactivity 2026-04-28 00:21:39 +02:00
Mikael Hugo
0d286b991b sf snapshot: pre-dispatch, uncommitted changes after 2902m inactivity 2026-04-27 23:42:51 +02:00
Mikael Hugo
260d50a823 docs: warn against Python for managed-resources hash; causes resync hang
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 23:20:15 +02:00
Mikael Hugo
f0da5b6d21 fix: bind getProviderAuthMode to registry instance to avoid undefined 'this'
Extracting a class method as a bare reference loses its 'this' context,
causing 'Cannot read properties of undefined' when minimax (or any
provider) triggers the flat-rate auth-mode lookup.
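A minimal reproduction of the bug and the fix. ProviderRegistry, authModes, and the "metered" mode are hypothetical stand-ins; only the method name getProviderAuthMode comes from this commit.

```typescript
// Hypothetical stand-in for the model registry.
type AuthMode = "flat-rate" | "metered";

class ProviderRegistry {
  private readonly authModes = new Map<string, AuthMode>([["minimax", "flat-rate"]]);

  getProviderAuthMode(provider: string): AuthMode | undefined {
    return this.authModes.get(provider); // relies on `this` being the registry
  }
}

const registry = new ProviderRegistry();

// Bug: a bare method reference drops `this`; class bodies are strict mode,
// so calling it reads `this.authModes` on undefined and throws a TypeError.
const unbound: (provider: string) => AuthMode | undefined = registry.getProviderAuthMode;

// Fix: bind the method to its instance (an arrow wrapper works equally well).
const bound = registry.getProviderAuthMode.bind(registry);
```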

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 19:22:39 +02:00