singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	6aa631c17a	Apply shared safe ID validation	2026-04-30 07:56:13 +02:00
Mikael Hugo	1a0c458ac4	Harden SF safe path validation	2026-04-30 07:55:07 +02:00
Mikael Hugo	cd69e85608	Harden SF model routing and harness contracts	2026-04-30 07:41:24 +02:00
Mikael Hugo	37c5db3dd3	test: Add verification gate integration tests for failure catching, cle… - src/resources/extensions/sf/tests/verification-gate.test.ts SF-Task: S03/T02	2026-04-30 06:40:54 +02:00
Mikael Hugo	a45f873124	chore: snapshot WIP before resuming M004/S03 auto 84 files spanning provider capabilities, model routing, headless runtime, sf auto subsystems, gitbook docs, and test coverage. Snapshotted so headless auto can resume M004 (Production Readiness) S03 (Verification Gate Validation) on a clean tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 06:31:19 +02:00
Mikael Hugo	3d3a8e26e3	fix(sf): tighten mimo and openrouter model policy	2026-04-29 21:49:49 +02:00
Mikael Hugo	9c4bf9b3e6	fix(sf): use live ollama k2.6 routes	2026-04-29 21:38:51 +02:00
Mikael Hugo	f78c3fb2b8	fix(sf): keep kimi versions exact	2026-04-29 21:17:00 +02:00
Mikael Hugo	ab57548f2b	fix: keep skipped tasks out of slice verification	2026-04-29 20:37:56 +02:00
Mikael Hugo	d6fc1211b7	fix: auto-skip stale instruction-conflict tasks	2026-04-29 20:33:06 +02:00
Mikael Hugo	46174c1183	fix: block stale staging task dispatch	2026-04-29 20:25:39 +02:00
Mikael Hugo	120d7deda8	fix: keep headless alive for provider auto-resume	2026-04-29 20:16:23 +02:00
Mikael Hugo	db41f92812	fix: stage declared untracked task files	2026-04-29 20:15:35 +02:00
Mikael Hugo	9398c7000d	fix: route bare model families canonically	2026-04-29 20:15:28 +02:00
Mikael Hugo	aa70e1db56	fix: make auto recovery evidence-driven	2026-04-29 19:45:43 +02:00
Mikael Hugo	2ed1638153	fix: add headless heartbeat output	2026-04-29 19:29:43 +02:00
Mikael Hugo	93c1bbcb9a	docs: plan judge calibration service	2026-04-29 18:28:45 +02:00
Mikael Hugo	0d6eca9cdd	fix: preserve subagent debate mode details	2026-04-29 17:50:26 +02:00
Mikael Hugo	b32fe7acd1	docs: clarify SF harness rollout boundaries	2026-04-29 17:47:51 +02:00
Mikael Hugo	d78c5ac198	feat: add SF skills and subagent debate mode	2026-04-29 17:44:30 +02:00
Mikael Hugo	d02d33aa70	feat: add repo harness profiler	2026-04-29 17:39:52 +02:00
Mikael Hugo	a611db9032	docs: specify repo-native harness evolution	2026-04-29 17:23:39 +02:00
Mikael Hugo	ffa216d6ad	docs: log caveman input-compression follow-ups in BUILD_PLAN Caveman skill (output compression) installed at ~/.claude/skills/caveman/ and activated for dr-repo. Two follow-ups for INPUT-side compression remain — sf's own prompts are verbose (execute-task alone has 10-step instructions, runtime context, multiple inlined plans), and that's paid on every dispatch: - Tier 2 (1-2 days): Manually rewrite heaviest prompt sections in caveman style. Preserve intent + nuance, drop fluff. Compare against current to confirm no quality regression. - Tier 3 (3-4 days): Runtime input preprocessor — pipe rendered prompt through caveman-compress (sub-skill, ~46% reduction) before dispatch. Behind a terse_prompts: true flag. Adds drift risk vs authored intent; needs comparison harness. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 15:46:32 +02:00
Mikael Hugo	fb4885b757	prompt(execute-task): add parallel-tool-call rule Adds step 0a: when independent reads/greps are needed, batch them in a single assistant turn instead of one-at-a-time. The existing step 0 already pushed for terse narration, but didn't address the bigger waste — sequential tool calls when parallel would work. Common case: reading handler + test + schema to triangulate a bug — three reads in one turn, not three turns. Also nudges away from "talking-then-doing": if the next action is unambiguous, just take it. Describing intent before every call is the dead weight that adds up to 30-50% extra round-trips. Behavior fix only (prompt-level). Model can still narrate inside its thinking channel since that's a model property; this targets the chat/tool-use channel where the user pays per turn. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 15:42:22 +02:00
Mikael Hugo	c5df4b46a6	fix(headless): await auto loop in headless mode	2026-04-29 15:37:17 +02:00
Mikael Hugo	df614a3e47	fix(headless): split idle-timeout role from deadlock-backstop role The single IDLE_TIMEOUT_MS constant was conflating two different jobs: "are we done?" vs "is the agent stuck?". For multi-turn commands (auto, next, discuss, plan), the first question is wrong — those signal completion explicitly via "auto-mode stopped" terminal notifications, and child-process exit catches crashes. The 120s I'd just bumped multi-turn to was still in idle-detection mindset; that's not what we need from this timer. New semantics: - IDLE_TIMEOUT_MS = 15s — quick commands (status, queue, …); idle really does mean done. - NEW_MILESTONE_IDLE_TIMEOUT_MS = 120s — bounded creative task with pauses for thinking between bootstrap steps. - MULTI_TURN_DEADLOCK_BACKSTOP_MS = 30 minutes — auto/next/discuss/plan. Not a "done" detector; a deadlock recovery bound. Long enough to never bother slow LLM reasoning or chained tool calls; short enough to recover from a true hang within a reasonable window. Real completion comes from terminal notifications + child-process exit, both already wired. Code reads cleaner too: effectiveIdleTimeout selection now mirrors the three-way conceptual split. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 15:18:58 +02:00
Mikael Hugo	c239ad6c9d	fix(headless): use long idle timeout for auto/next/discuss/plan The 15s IDLE_TIMEOUT_MS was killing auto-mode prematurely. Symptom: sf headless auto would dispatch a task, the LLM would make 1-2 tool calls, pause to reason about the next step, exceed 15s of "no events", and headless would declare "Status: complete" — exiting at ~35s with the task barely started (123 events but only 2 tool calls). The 120s NEW_MILESTONE_IDLE_TIMEOUT_MS already exists for the same reason ("LLM may pause between tool calls e.g. after mkdir, before writing files"). The same applies to auto/next/discuss/plan — all multi-turn commands where the LLM thinks longer between actions, especially on non-trivial tasks. isMultiTurnCommand was already defined for related logic; this just wires it into the idle-timeout decision. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 15:13:43 +02:00
Mikael Hugo	6e342a8875	fix(sf-from-source): switch from bun to node — clean from-source path bun was the wrong runtime for our environment, two ways: 1. bun doesn't ship node:sqlite. sf-db.ts falls back through node:sqlite → better-sqlite3 → null. Result: 'No SQLite provider available' and degraded-mode filesystem-state derivation, even though sqlite is actually available (node:sqlite under node, bun:sqlite under bun — both valid, but our code only knows the node names). 2. bun's loader doesn't inherit the system library search path under Nix. libz.so.1 isn't found for forge_engine.node, so the native addon falls through to JS implementations (slower). Both warnings ("Native addon not available", "DB unavailable — degraded mode") were the symptom of "we're running under bun". Fix: use node + the existing src/resources/extensions/sf/tests/ resolve-ts.mjs loader hook (which already handles .js → .ts import-specifier remapping for runtime resolution) + --experimental-strip-types (node 22+, native in 24). Result: from-source via node loads cleanly. No native warning. No sqlite warning. No degraded mode. Exec: `./bin/sf-from-source --print "..."` returns the model output and nothing else. Drops the LD_LIBRARY_PATH zlib-injection hack that was added in `4912f6ea8` — that was working around the bun native-loader issue that doesn't exist under node. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 15:07:24 +02:00
Mikael Hugo	2afe2ac6f1	feat(prefs): self-aligning template upgrades — sf keeps its own files synced Companion to the earlier schema-versioning framework. Where that handles data-shape evolution via forward migrations, this handles file-template evolution via silent self-rewrite. The user shouldn't have to know: - ensurePreferences() now stamps `last_synced_with_sf: <semver>` in the frontmatter when seeding a new project's PREFERENCES.md, recording the sf version that wrote the template. - New module preferences-template-upgrade.ts: - detectTemplateDrift(prefs) — pure check, returns { fromVersion, toVersion, needsUpgrade }. - upgradePreferencesFileIfDrifted(path, prefs) — silently re-renders the file's frontmatter when fromVersion ≠ toVersion. Body (anything after the closing `---`) is preserved verbatim, so user notes stay. - Wired into loadPreferencesFile() — every read self-aligns. No human warnings, no opt-in flow; sf keeps its own house in order. - last_synced_with_sf added to SFPreferences + KNOWN_PREFERENCE_KEYS so it round-trips through validatePreferences without "unknown key" warnings. Failure modes are non-fatal: missing file, malformed frontmatter, or read-only filesystem all leave the file alone and return the in-memory prefs unchanged. SF_VERSION env var (set by loader.ts) is the source of truth for "current sf"; "0.0.0" sentinel skips upgrade so atypical entry points don't stamp incorrect values. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 15:05:37 +02:00
Mikael Hugo	4912f6ea80	fix(sf-from-source): inject Nix-store zlib into LD_LIBRARY_PATH bun's loader doesn't inherit the same library search path as node under Nix, so require('forge_engine.linux-x64.node') fails with 'libz.so.1: cannot open shared object file' even when the native addon exists at the expected path. Result: sf-from-source ran in JS-fallback mode, and we'd been working around it by switching to node dist/loader.js — which forces a manual `npm run copy-resources` after every src/ change to keep dist in sync. This wraps sf-from-source to find a Nix-store zlib at startup and prepend it to LD_LIBRARY_PATH before exec'ing bun. The native addon loads cleanly; from-source becomes the reliable default again; no more dist drift to worry about. Find pattern: /nix/store/*-zlib-*/lib/libz.so.1 at maxdepth 4 (maxdepth 2 was too shallow — the hash dir is depth 1, lib is depth 2, the .so.1 file is depth 3, plus we want the parent dir for LD_LIBRARY_PATH so '%h' on a depth-3 match gives the lib dir). Outside Nix (no /nix/store), this is a no-op and falls through to the existing exec. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 15:01:55 +02:00
Mikael Hugo	a2b709f669	fix(gitignore): write sf runtime patterns to .git/info/exclude, not .gitignore ensureGitignore was re-adding `.sf`, `.sf-id`, `.bg-shell/` to the project's .gitignore on every sf run, causing two issues: 1. Working-tree churn — every invocation dirtied .gitignore, forcing a commit just to silence "uncommitted changes" warnings. Pattern flagged by user: "is this the right way with its own every run". 2. False-positive duplicate-add — the literal-string check (`existingLines.has(".sf")`) didn't recognize user-equivalent patterns like `/.sf` (root-only) or `.sf/` (with trailing slash), so an explicit user entry got duplicated by the auto-add on next run. Fix: move sf-specific runtime patterns to `.git/info/exclude` via new `ensureGitInfoExclude()`. That file is per-clone (not committed), so re-writing is invisible to git status. The project's `.gitignore` stays human-curated and sf doesn't opinionate on it. `ensureGitignore()` now calls `ensureGitInfoExclude()` first so callers don't need to update — backwards compatible. Generic OS/IDE/lang patterns (.DS_Store, node_modules/, target/, etc.) stay in BASELINE_PATTERNS for .gitignore since those genuinely belong in version control. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 14:58:14 +02:00
Mikael Hugo	6031106d93	docs: add UPSTREAM_PORT_GUIDE.md — translation rules for gsd-2 → sf ports We sync from two upstreams (pi-mono via cherry-pick, gsd-2 via manual port) and the gsd-2 syncs hit naming/path translation every time. This guide makes the translation rules explicit and persistent so future ports (by humans or by sf) don't have to rediscover them. Covers: - The naming translations table: gsd_* → sf_*, .gsd/ → .sf/, extensions/gsd/ → extensions/sf/, @sf-run/* → @singularity-forge/*, GSD_HOME → SF_HOME, etc. - Default rule: translate naming, keep substance. Includes the cautionary tale of my own self-heal rejection (`1bbd20bf7`) where I wrongly skipped a fix because of the path string. - When a port REALLY doesn't apply (architectural divergence vs naming divergence) — three categories with examples. - Mechanics for pi-mono (cherry-pick) vs gsd-2 (manual) ports. - Skip-list documentation: when you reject, document why in BUILD_PLAN with the upstream SHA and reason. - Prompt-edit handling: gsd_<verb> → sf_<verb>, register tools before porting prompt edits that call them. Future automation hint at the bottom for a port-translation script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:51:19 +02:00
Mikael Hugo	1bbd20bf78	docs: correct gsd-2 self-heal port — substance applies after path translation Earlier I (and sf parroting BUILD_PLAN.md) dismissed gsd-2's symlinked .gsd self-heal fix (9340f1e9b / #4423) as 'doesn't apply because we use .sf instead'. That was a superficial read. The fix is about detecting and recovering from a broken/redirected staging-dir symlink to prevent silent data loss. The .gsd/ vs .sf/ is a one-line path translation, not a design difference. The symlink-resilience logic is exactly what we need for our staging. Path-translate .gsd/ → .sf/ in the port. The substance ports. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:49:05 +02:00
Mikael Hugo	9a7d6b7d98	chore(test): drop systemd-run wrapper from test:sf-light The wrapper imposed CPUQuota=200% / MemoryMax=4G via a transient scope unit, which requires polkit interactive auth and silently failed on non-TTY hosts (the script then exit-0'd without running tests). The limits were a guard against the heavy test:coverage runner's worker saturation, but test:sf-light already runs in-process with --max-old-space-size=2048 and --test-timeout=30000 — the systemd governor was overkill for this lighter target and incompatible with headless / non-laptop environments. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 14:47:50 +02:00
Mikael Hugo	9b718f8e36	fix(headless): repair missing sf project symlink	2026-04-29 14:43:30 +02:00
Mikael Hugo	3b6cbcd79f	feat(prefs): schema versioning with forward-migration registry Adds the framework for evolving the prefs schema without silently breaking projects pinned to older versions. Each PREFERENCES.md declares `version: N`; sf declares CURRENT_PREFERENCES_SCHEMA_VERSION in code. On load: - prefs.version === current → no-op - prefs.version < current → run registered migrations in chain (forward only, pure functions). Missing migration in the chain throws — bumping the schema version requires a matching Migration entry, by construction. - prefs.version > current → warn "prefs from a newer sf, fields may be ignored", preserve the value so a later upgrade reads correctly. - prefs.version undefined → assume v1 (legacy file pre-versioning) and warn so the user adds an explicit pin. Migration registry is empty for now (current schema version stays at 1) — the framework is in place so the first real schema bump is a one-line addition, not a refactor. Drift detection (`checkPreferencesDrift`) is also the natural surface for future deprecated-key / missing-required-field checks when CLAUDE.md / template comparisons are added. Wired into validatePreferences() so every load path gets the new behavior automatically — no caller changes needed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 14:38:43 +02:00
Mikael Hugo	dea4c2dbc1	docs: update Tier 0 with port status; flag SSE parser refactor as bigger work 5 of 9 Tier 0 items landed: - #1 HTML export escape (security) `701ec8fb8` + `92c6d933c` - #2 Empty tools array fix `58b1d7c60` - #4 undici 5min timeout `d0907b6d8` - #5 Bedrock inference profile `7c487bb60` Deferred: - #3 Anthropic SSE proxy event tolerance — fix applies to pi-mono's custom SSE parser, but we still use @anthropic-ai/sdk directly. To get protection we'd need to port the full "own Anthropic SSE parsing" refactor (3 commits, ~200 LOC). Added as a separate Tier 0 item. Remaining TODO from Tier 0: items #6-#9 (symlinked dedup, setWorkingVisible extension API, Cloudflare provider, Azure Cognitive Services). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:35:55 +02:00
Mikael Hugo	d0907b6d87	port(pi-mono): disable undici body/headers idle timeouts on global dispatcher (refs ea90a6783) Pi-mono Tier 0 #4 — manual port (sf went off-task; ported directly). undici's default 300s bodyTimeout aborts long local-LLM SSE streams (e.g. vLLM buffering a large tool call) with UND_ERR_BODY_TIMEOUT. retry.provider.timeoutMs cannot lift this cap — it controls the provider SDK's AbortController, not undici's per-socket idle timer. Pass {bodyTimeout: 0, headersTimeout: 0} to EnvHttpProxyAgent. Provider SDKs continue to enforce their own deadlines. Type-check passes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:35:08 +02:00
Mikael Hugo	92c6d933ce	chore(pkg/dist): sync template.js with source after HTML escape port (refs `701ec8fb8`) pkg/dist/core/export-html/template.js is a tracked dist mirror that needs the same HTML escape fix as packages/pi-coding-agent/src/core/ export-html/template.js (committed in `701ec8fb8`). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:28:33 +02:00
Mikael Hugo	6248e79a7a	feat(init): auto-seed PREFERENCES.md with detected verification_commands Without this, every fresh project inherits sf's user-level dogfooding defaults (npm run typecheck:extensions, test:sf-light) — which run sf's own dev scripts against unrelated repos and produce universal false negatives. Hit in dr-repo (Go): T01-VERIFY.json showed all_fail because those npm scripts don't exist there, even though T01's actual work passed verification per its SUMMARY. - ensurePreferences() now calls detectProjectSignals() and embeds the auto-detected commands in the YAML frontmatter on first init. Detection failure is non-fatal — falls back to the bare template. - detectVerificationCommands() Go branch now handles multi-module repos (no root go.mod, only nested ones — common pattern for repos like dr-repo/{dr-agent,portal,gateway,installer,cmd/installer}). Generates a per-module loop instead of running go vet/test from the repo root, which would fail since each subdir is its own Go module. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 14:26:49 +02:00
Mikael Hugo	58b1d7c601	port(pi-mono): omit tools field instead of sending empty array (refs 3e0ee69b5) Pi-mono Tier 0 #2 — sf-driven port of PR #3650. Some LLM providers reject API calls when `tools: []` is sent (an empty array), but accept the call when the tools field is omitted entirely. This guards each provider's request-body builder to omit `tools` when the tool list is empty, instead of serialising the empty array. Files (5 provider builders): - packages/pi-ai/src/providers/openai-completions.ts - packages/pi-ai/src/providers/openai-responses.ts - packages/pi-ai/src/providers/openai-codex-responses.ts - packages/pi-ai/src/providers/azure-openai-responses.ts - packages/pi-ai/src/providers/anthropic-shared.ts (covers anthropic and anthropic-vertex which both import buildParams from it) Pattern: `if (context.tools)` → `if (context.tools && context.tools.length > 0)`. Preserved: the `else if (hasToolHistory(context.messages))` branch in openai-completions.ts that intentionally emits `tools: []` for LiteLLM/Anthropic-proxy compatibility is unchanged. Type-check passes. Co-Authored-By: sf v2.75.1 (session 38ed0a48) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:22:31 +02:00
Mikael Hugo	701ec8fb88	port(pi-mono): escape session metadata + image data in HTML export (refs 7617c1ad9, 57787b655) Pi-mono Tier 0 #1 (security) — sf-driven port. Two upstream security fixes (pi-mono PR #3819, #3883) that escape user-controlled session content before embedding in HTML exports. Crafted session content (image mime types, image data, model IDs, tool names, entry IDs) could otherwise inject markup at the export boundary. What sf changed in packages/pi-coding-agent/src/core/export-html/template.js: - Image tags: escape `mimeType` and `data` attributes for both tool-result and user-message image renders (PR #3819). - Session metadata: escape `msg.toolName`, `msg.role`, `entry.modelId`, `entry.thinkingLevel`, `entry.type`, `entry.id`, and `globalStats.models` (PR #3883). - DOM id construction: renamed `entryId` → `entryDomId` and escape `entry.id` to prevent attribute-breakout from a crafted id. The existing `escapeHtml()` helper was used at every site; no new helper introduced. Type-check passes. Co-Authored-By: sf v2.75.1 (session 150fe2c1) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:20:23 +02:00
Mikael Hugo	7c487bb60e	port(pi-mono): normalize Bedrock model names for inference profiles (refs ed4bc7308) Pi-mono Tier 0 #5 — first sf-driven port. sf-from-source dispatched the task in print mode and produced this fix autonomously. Adds getModelMatchCandidates(modelId, modelName?) helper that normalizes both inputs to lowercase and dash-separated form (s.replace(/[\s_.:]+/g, "-")). Inference profile ARNs don't embed the model name; the helper lets capability checks match against either the inference profile ARN or the underlying model name. Updated: - supportsAdaptiveThinking — uses the helper; consolidates the opus-4.6/opus-4-6 dot-vs-dash variants. - mapThinkingLevelToEffort — same pattern. - supportsPromptCaching — same pattern (also from pi-mono PR #3527). - streamSimpleBedrock and buildAdditionalModelRequestFields — pass model.name through to capability checks. Type-check passes (cd packages/pi-ai && npx tsc --noEmit). Co-Authored-By: sf v2.75.1 (session 911dd2de) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:14:17 +02:00
Mikael Hugo	a3c487c918	docs: add Tier 0 (pi-mono ports) and Tier 0.5 (gsd-2 manual ports) — sf does these first Tier 0 (pi-mono — should land cleanly via cherry-pick, no namespace divergence): 9 items ranked security → bug-fixes → infra → features. Critical: 1. HTML export escape (security) 2. Empty tools array fix (provider compatibility) 3. Anthropic SSE proxy event tolerance 4. Long local-LLM SSE 5min timeout fix Infrastructure: 5. Bedrock inference profile normalization 6. Symlinked packages dedup 7. ctx.ui.setWorkingVisible() extension API Features: 8. Cloudflare Workers AI provider 9. Azure Cognitive Services endpoint Tier 0.5 (gsd-2 — must be MANUALLY ported; cherry-pick fails on namespace): Critical fixes (11): 1-6. bash race, security hardening, web_search injection narrowing, symlinked staging self-heal, KNOWLEDGE budget, mcp-server deadlock 7-10. agent_end transition fixes (4 commits) 11. claude-code-cli Always-Allow persistence Normal-value features (6): 12. /gsd eval-review slim port (prompt + tool + template) 13. Workflow state machine hardening (5 commits as unit) 14. Proactive rate limiting (min_request_interval_ms) 15. Per-call token telemetry (opt-in pi-coding-agent hooks) 16. Worktree TUI commands 17. Doctor check for orphan milestone directories Skipped from each upstream is documented. All in BUILD_PLAN.md so sf can work the list systematically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:04:31 +02:00
Mikael Hugo	bb1c68b7ab	docs: drop OpenRouter-removal follow-up OpenRouter is already neutered via the provider_model_allow allowlist (see `d38e5ea09` fix(schema): auto-coerce string → [string] for sf_* list fields + provider_model_allow tests). The 248 model entries in models.generated.ts are inert — no dispatch path reaches them. Removing the data entries would be aesthetic cleanup with zero behavioral effect. Not worth a Tier-1 follow-up. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:58:33 +02:00
Mikael Hugo	310ce963ea	docs: add session follow-ups to BUILD_PLAN Six items surfaced during 2026-04-29 ports/refactors that didn't get tracked anywhere: - Tier 1: Remove OpenRouter (~248 model entries; user confirmed unused) - Tier 1: Minimax search tests (deferred from initial port) - Tier 2: Search provider registry refactor (rid of 9-file-per-provider) - Tier 2: Product-audit phase machine wire-up (slim port shipped tool; phase dispatch not yet wired) - Tier 2: Headless assistant-text preview (bunker pattern, deferred from headless UX commit) - Tier 3: Pi-mono SDK sync cadence Each entry has rationale + effort estimate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:56:55 +02:00
Mikael Hugo	a8cf2cd941	feat(workflow): add product-audit (slim port) Milestone-end workflow that compares declared product intent (VISION.md, RUNBOOKS.md, etc.) against actual code/test/deploy/docs evidence and emits structured gaps with severity. Soft gates — adds follow-up slices but doesn't hard-block merge. Slim port (4 new files + 1 registration) — extracts only the audit feature itself, not bunker's parallel rewrite of dispatch/prompts/ benchmark-selector that came with it in commit 2aa785475. Created: - prompts/product-audit.md — prompt verbatim, gsd_→sf_ and .gsd→.sf - tools/product-audit-tool.ts — slim file-write implementation, atomicWriteAsync to .sf/active/{mid}/ PRODUCT-AUDIT.{json,md}; no DB deps - bootstrap/product-audit-tool.ts — pi-coding-agent tool registration, TypeBox schema for sf_product_audit - workflow-templates/product-audit.md — workflow template Modified: - bootstrap/register-extension.ts — 2 lines: import + add to nonCriticalRegistrations - workflow-templates/registry.json — registry entry - package.json — version 2.75.0 → 2.75.1 Verdict logic (no-gaps | gaps-found | contract-underspecified) is the load-bearing innovation: contract-underspecified forces the auditor to flag unverifiable docs as a real gap rather than rubber-stamping no-gaps when the product contract is silent. Out of scope: phase enum changes, dispatch hookup. Wire-up to the phase machine is a follow-up; the prompt + tool + template stand alone. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:55:23 +02:00
Mikael Hugo	2eebeccb93	feat(search): add MiniMax web search provider New search backend alongside tavily/brave/serper/exa/ollama. API key resolution: MINIMAX_CODE_PLAN_KEY → MINIMAX_CODING_API_KEY → MINIMAX_API_KEY (fallback order matches MiniMax's documented aliases). Wired through every existing seam: - type union: SearchProvider = 'tavily' | 'minimax' | 'brave' | 'ollama' - VALID_PREFERENCES set + selection logic in provider.ts - native-search routing (Anthropic native web_search delegates correctly) - /search-provider CLI command (tab completion, select UI, parser) - tool-search.ts: search execution path - tool-llm-context.ts: prefetch / context-builder path - preferences-types + preferences-validation - configuration.md user docs - extension-manifest description Tests not added in this commit — the bunker reference tests don't match our preferences/provider export shape (we have serper/exa/combosearch that bunker doesn't). Tests for getMiniMaxSearchApiKey priority order, resolveSearchProvider returning "minimax", /search-provider minimax CLI behavior, no-key error messages, and executeMiniMaxSearch request shape are TODO. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:55:04 +02:00
Mikael Hugo	ae0bbe32fc	feat(providers): add xiaomi direct API (token-plan-{ams,sgp,cn}) — additive Adds direct xiaomi token-plan API access alongside the existing OpenRouter-routed xiaomi entries. ADDITIVE only — OpenRouter cleanup is a separate follow-up. Three new region providers: - xiaomi-token-plan-ams (Amsterdam, default for plain `xiaomi`) - xiaomi-token-plan-sgp (Singapore) - xiaomi-token-plan-cn (China) All use Anthropic Messages API. Env-var resolution: XIAOMI_API_KEY → XIAOMI_TOKEN_PLAN_API_KEY → MIMO_API_KEY (in that fallback order). Three xiaomi MiMo models registered under each direct provider: - mimo-v2-flash (256k ctx, 64k output, text-only, reasoning) - mimo-v2-omni (256k ctx, 128k output, text+image, reasoning) - mimo-v2-pro (1M ctx, 128k output, text-only, reasoning) Same model literals × 4 provider keys, different baseUrls per region. Test count assertion bumped 22 → 26 providers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:54:43 +02:00
Mikael Hugo	dff0df5fdc	fix(headless): suppress notification spam, categorize messages, distinguish phase vs status Three small UX fixes for headless / autopilot logs: 1. Add `zz-notifications` to TUI_FOOTER_STATUS_KEYS — these are sticky notification dots from the interactive TUI footer; they have no meaning in headless and were spamming the log. 2. Categorize notification messages by prefix so headless output is scannable: [mcp] for MCP-client-ready, [search] for web search status, [parallel] for slice-parallel/subagent dispatch. Falls through to the existing important/non-important formatting for everything else. 3. Distinguish phase transitions from generic status updates: phase:/ milestone:/slice:/task: prefixed keys get [phase]; everything else gets [status]. Previously both used [phase], which was misleading. Patterns based on bunker commits 14ec4d97f / c15afb45f (which were the research source) but written fresh against our existing TUI_FOOTER_STATUS_KEYS structure rather than cherry-picked. The assistant-text-preview commit (cf0274c63) is a separate, larger refactor in headless.ts and is deferred to v3.1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:43:40 +02:00
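
The forward-migration behaviour described in commit 3b6cbcd79f above (equal version is a no-op, older versions run a migration chain, a gap in the chain throws, newer versions warn and are preserved, a missing version is treated as v1) could be sketched roughly as follows. This is an illustration only, not the code in the repository: `Prefs`, `Migration`, `MigrationResult`, and `migratePrefs` are hypothetical names, and stamping the assumed version into the returned object is an assumption of this sketch.

```typescript
// Hypothetical types and names; the real sf implementation may differ.
type Prefs = { version?: number; [key: string]: unknown };

// A migration is a pure function that upgrades prefs from `from` to `from + 1`.
type Migration = { from: number; apply: (prefs: Prefs) => Prefs };

type MigrationResult = { prefs: Prefs; warnings: string[] };

function migratePrefs(
  input: Prefs,
  currentVersion: number,
  registry: Migration[],
): MigrationResult {
  const warnings: string[] = [];

  // Legacy file written before versioning: assume v1 and warn.
  if (input.version === undefined) {
    warnings.push("prefs have no version; assuming 1");
  }
  let version: number = input.version ?? 1;

  // Newer file than this build understands: keep the value, only warn.
  if (version > currentVersion) {
    warnings.push("prefs from a newer sf, fields may be ignored");
    return { prefs: { ...input, version }, warnings };
  }

  // Walk the chain one step at a time; a gap in the registry is an error.
  let prefs: Prefs = { ...input, version };
  while (version < currentVersion) {
    const from = version;
    const step = registry.find((m) => m.from === from);
    if (!step) {
      throw new Error(`missing migration from v${from}`);
    }
    prefs = { ...step.apply(prefs), version: from + 1 };
    version = from + 1;
  }
  return { prefs, warnings };
}
```

Under these assumptions, bumping the schema version without registering the matching step fails loudly on the first load of an older file, which is the "by construction" guarantee the commit message describes.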


3671 commits