2289 commits

Author SHA1 Message Date
Mikael Hugo
2111da8e60 sf snapshot: pre-dispatch, uncommitted changes after 53m inactivity 2026-04-30 19:10:38 +02:00
Mikael Hugo
40e0835d5e test: Add unit tests for triage routing and edge cases in commands-todo…
- src/resources/extensions/sf/tests/commands-todo.test.ts

SF-Task: S01/T02
2026-04-30 18:16:43 +02:00
Mikael Hugo
e90298f2e0 sf snapshot: pre-dispatch, uncommitted changes after 120m inactivity 2026-04-30 17:44:03 +02:00
Mikael Hugo
d8a9d63c87 feat: Replaced bare error writes in cli.ts, headless.ts, and startup-mo…
- src/cli.ts
- src/headless.ts
- src/startup-model-validation.ts

SF-Task: S04/T03
2026-04-30 15:43:29 +02:00
Mikael Hugo
8677e73046 sf snapshot: pre-dispatch, uncommitted changes after 97m inactivity 2026-04-30 15:11:45 +02:00
Mikael Hugo
b26dca40ec fix: Stop milestone completion git archaeology 2026-04-30 13:34:24 +02:00
Mikael Hugo
0f27ffe865 fix: Let safe smoke tasks use LLM approval 2026-04-30 13:11:26 +02:00
Mikael Hugo
6a33357df5 fix: Add production mutation approval gate 2026-04-30 12:17:35 +02:00
Mikael Hugo
08ea92b072 fix: Harden auto recovery and production guards 2026-04-30 11:35:16 +02:00
Mikael Hugo
62d430ab23 Add provider smoke benchmark and headless updates 2026-04-30 10:19:18 +02:00
Mikael Hugo
1dbd30c713 Fix Kimi Code K2.6 routing and pricing 2026-04-30 10:03:06 +02:00
Mikael Hugo
6ccce42c62 Add headless bootstrap and TODO triage tests 2026-04-30 09:21:24 +02:00
Mikael Hugo
e62b3854cb Prevent auto-commit after cancelled units 2026-04-30 09:07:44 +02:00
Mikael Hugo
8487507d1b Add TODO triage and validation recheck flow 2026-04-30 08:41:49 +02:00
Mikael Hugo
ed19fa1864 Complete SF safe ID remediation sweep 2026-04-30 08:08:10 +02:00
Mikael Hugo
f76504a038 Add runaway recovery handoff artifacts 2026-04-30 08:07:44 +02:00
Mikael Hugo
6aa631c17a Apply shared safe ID validation 2026-04-30 07:56:13 +02:00
Mikael Hugo
1a0c458ac4 Harden SF safe path validation 2026-04-30 07:55:07 +02:00
Mikael Hugo
cd69e85608 Harden SF model routing and harness contracts 2026-04-30 07:41:24 +02:00
Mikael Hugo
37c5db3dd3 test: Add verification gate integration tests for failure catching, cle…
- src/resources/extensions/sf/tests/verification-gate.test.ts

SF-Task: S03/T02
2026-04-30 06:40:54 +02:00
Mikael Hugo
a45f873124 chore: snapshot WIP before resuming M004/S03 auto
84 files spanning provider capabilities, model routing, headless
runtime, sf auto subsystems, gitbook docs, and test coverage. Snapshotted
so headless auto can resume M004 (Production Readiness) S03
(Verification Gate Validation) on a clean tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:31:19 +02:00
Mikael Hugo
3d3a8e26e3 fix(sf): tighten mimo and openrouter model policy 2026-04-29 21:49:49 +02:00
Mikael Hugo
9c4bf9b3e6 fix(sf): use live ollama k2.6 routes 2026-04-29 21:38:51 +02:00
Mikael Hugo
f78c3fb2b8 fix(sf): keep kimi versions exact 2026-04-29 21:17:00 +02:00
Mikael Hugo
ab57548f2b fix: keep skipped tasks out of slice verification 2026-04-29 20:37:56 +02:00
Mikael Hugo
d6fc1211b7 fix: auto-skip stale instruction-conflict tasks 2026-04-29 20:33:06 +02:00
Mikael Hugo
46174c1183 fix: block stale staging task dispatch 2026-04-29 20:25:39 +02:00
Mikael Hugo
db41f92812 fix: stage declared untracked task files 2026-04-29 20:15:35 +02:00
Mikael Hugo
9398c7000d fix: route bare model families canonically 2026-04-29 20:15:28 +02:00
Mikael Hugo
aa70e1db56 fix: make auto recovery evidence-driven 2026-04-29 19:45:43 +02:00
Mikael Hugo
0d6eca9cdd fix: preserve subagent debate mode details 2026-04-29 17:50:26 +02:00
Mikael Hugo
d78c5ac198 feat: add SF skills and subagent debate mode 2026-04-29 17:44:30 +02:00
Mikael Hugo
d02d33aa70 feat: add repo harness profiler 2026-04-29 17:39:52 +02:00
Mikael Hugo
fb4885b757 prompt(execute-task): add parallel-tool-call rule
Adds step 0a: when independent reads/greps are needed, batch them in a
single assistant turn instead of one-at-a-time. The existing step 0
already pushed for terse narration, but didn't address the bigger waste
— sequential tool calls when parallel would work. Common case: reading
handler + test + schema to triangulate a bug — three reads in one turn,
not three turns.

Also nudges away from "talking-then-doing": if the next action is
unambiguous, just take it. Describing intent before every call is the
dead weight that adds up to 30-50% extra round-trips.

Behavior fix only (prompt-level). Model can still narrate inside its
thinking channel since that's a model property; this targets the
chat/tool-use channel where the user pays per turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:42:22 +02:00
Mikael Hugo
c5df4b46a6 fix(headless): await auto loop in headless mode 2026-04-29 15:37:17 +02:00
Mikael Hugo
2afe2ac6f1 feat(prefs): self-aligning template upgrades — sf keeps its own files synced
Companion to the earlier schema-versioning framework. Where that handles
data-shape evolution via forward migrations, this handles file-template
evolution via silent self-rewrite. The user shouldn't have to know:

- ensurePreferences() now stamps `last_synced_with_sf: <semver>` in the
  frontmatter when seeding a new project's PREFERENCES.md, recording the
  sf version that wrote the template.
- New module preferences-template-upgrade.ts:
  - detectTemplateDrift(prefs) — pure check, returns
    { fromVersion, toVersion, needsUpgrade }.
  - upgradePreferencesFileIfDrifted(path, prefs) — silently re-renders
    the file's frontmatter when fromVersion ≠ toVersion. Body (anything
    after the closing `---`) is preserved verbatim, so user notes stay.
- Wired into loadPreferencesFile() — every read self-aligns. No human
  warnings, no opt-in flow; sf keeps its own house in order.
- last_synced_with_sf added to SFPreferences + KNOWN_PREFERENCE_KEYS so
  it round-trips through validatePreferences without "unknown key"
  warnings.

Failure modes are non-fatal: missing file, malformed frontmatter, or
read-only filesystem all leave the file alone and return the in-memory
prefs unchanged. SF_VERSION env var (set by loader.ts) is the source of
truth for "current sf"; "0.0.0" sentinel skips upgrade so atypical entry
points don't stamp incorrect values.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:05:37 +02:00
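The drift check described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the contents of preferences-template-upgrade.ts: the `PrefsLike` shape is hypothetical, the current version is passed as a parameter instead of being read from the SF_VERSION env var, and treating a missing stamp as drift is an assumption.

```typescript
// Hypothetical shape; the real SFPreferences type has many more fields.
interface PrefsLike {
  last_synced_with_sf?: string;
}

interface TemplateDrift {
  fromVersion: string | undefined;
  toVersion: string;
  needsUpgrade: boolean;
}

// Sentinel named in the commit message: atypical entry points report "0.0.0".
const UNKNOWN_SF_VERSION = "0.0.0";

// Pure check: compares the version stamped in the file with the running one.
// Assumption: a file without a stamp counts as drifted.
function detectTemplateDrift(
  prefs: PrefsLike,
  currentVersion: string,
): TemplateDrift {
  const fromVersion = prefs.last_synced_with_sf;
  const needsUpgrade =
    currentVersion !== UNKNOWN_SF_VERSION && fromVersion !== currentVersion;
  return { fromVersion, toVersion: currentVersion, needsUpgrade };
}
```

Keeping the check pure means the file rewrite can be tested separately from the decision to rewrite.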
Mikael Hugo
a2b709f669 fix(gitignore): write sf runtime patterns to .git/info/exclude, not .gitignore
ensureGitignore was re-adding `.sf`, `.sf-id`, `.bg-shell/` to the project's
.gitignore on every sf run, causing two issues:

1. Working-tree churn — every invocation dirtied .gitignore, forcing a
   commit just to silence "uncommitted changes" warnings. Pattern flagged
   by user: "is this the right way with its own every run".

2. False-positive duplicate-add — the literal-string check
   (`existingLines.has(".sf")`) didn't recognize user-equivalent patterns
   like `/.sf` (root-only) or `.sf/` (with trailing slash), so an explicit
   user entry got duplicated by the auto-add on next run.

Fix: move sf-specific runtime patterns to `.git/info/exclude` via new
`ensureGitInfoExclude()`. That file is per-clone (not committed), so
re-writing is invisible to git status. The project's `.gitignore` stays
human-curated and sf no longer adds its own runtime entries to it.

`ensureGitignore()` now calls `ensureGitInfoExclude()` first so callers
don't need to update — backwards compatible. Generic OS/IDE/lang patterns
(.DS_Store, node_modules/, target/, etc.) stay in BASELINE_PATTERNS for
.gitignore since those ignore rules genuinely belong in version control.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:58:14 +02:00
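One way to make the exclude-file write idempotent, and to avoid the false-positive duplicate-add described in the commit above, is to compare patterns after light normalization. The sketch below is an illustration only: `normalizePattern` and `mergeExcludePatterns` are hypothetical names, and treating `.sf`, `/.sf` and `.sf/` as equivalent follows the commit's wording rather than git's exact matching rules.

```typescript
// Strip a leading "/" (root anchor) and a trailing "/" (directory marker)
// so that ".sf", "/.sf" and ".sf/" compare as the same entry.
function normalizePattern(pattern: string): string {
  return pattern.trim().replace(/^\//, "").replace(/\/$/, "");
}

// Pure function: takes the current file content and the wanted patterns,
// returns the new content. Returns the input unchanged when nothing is
// missing, so a caller can skip the write entirely.
function mergeExcludePatterns(existing: string, wanted: string[]): string {
  const seen = new Set(
    existing
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line !== "" && !line.startsWith("#"))
      .map(normalizePattern),
  );
  const missing: string[] = [];
  for (const pattern of wanted) {
    const key = normalizePattern(pattern);
    if (!seen.has(key)) {
      seen.add(key);
      missing.push(pattern);
    }
  }
  if (missing.length === 0) return existing;
  const separator = existing === "" || existing.endsWith("\n") ? "" : "\n";
  return existing + separator + missing.join("\n") + "\n";
}
```

A caller would read `.git/info/exclude`, pass its content through `mergeExcludePatterns`, and write the file only when the returned string differs from what was read.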
Mikael Hugo
9b718f8e36 fix(headless): repair missing sf project symlink 2026-04-29 14:43:30 +02:00
Mikael Hugo
3b6cbcd79f feat(prefs): schema versioning with forward-migration registry
Adds the framework for evolving the prefs schema without silently breaking
projects pinned to older versions. Each PREFERENCES.md declares `version: N`;
sf declares CURRENT_PREFERENCES_SCHEMA_VERSION in code. On load:

- prefs.version === current → no-op
- prefs.version < current → run registered migrations in chain (forward only,
  pure functions). Missing migration in the chain throws — bumping the
  schema version requires a matching Migration entry, by construction.
- prefs.version > current → warn "prefs from a newer sf, fields may be
  ignored", preserve the value so a later upgrade reads correctly.
- prefs.version undefined → assume v1 (legacy file pre-versioning) and
  warn so the user adds an explicit pin.

Migration registry is empty for now (current schema version stays at 1) —
the framework is in place so the first real schema bump is a one-line
addition, not a refactor. Drift detection (`checkPreferencesDrift`) is also
the natural surface for future deprecated-key / missing-required-field
checks when CLAUDE.md / template comparisons are added.

Wired into validatePreferences() so every load path gets the new behavior
automatically — no caller changes needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:38:43 +02:00
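The four load-time cases in the commit above can be sketched as a forward-only migration chain. This is a minimal illustration under stated assumptions: `migratePreferences`, `Prefs` and the registry shape are hypothetical, each registered migration is assumed to upgrade by exactly one version, and the current version is a parameter here rather than the CURRENT_PREFERENCES_SCHEMA_VERSION constant.

```typescript
interface Prefs {
  version?: number;
  [key: string]: unknown;
}

// Assumption: the registry is keyed by the version a migration upgrades FROM,
// and every migration moves the data forward by exactly one version.
type Migration = (prefs: Prefs) => Prefs;

interface MigrationResult {
  prefs: Prefs;
  warnings: string[];
}

function migratePreferences(
  prefs: Prefs,
  currentVersion: number,
  registry: ReadonlyMap<number, Migration>,
): MigrationResult {
  const warnings: string[] = [];
  let version = prefs.version;

  // Legacy file written before versioning existed: assume v1 and warn.
  if (version === undefined) {
    warnings.push("no version declared; assuming v1, add an explicit pin");
    version = 1;
  }

  // Newer than this build understands: warn and preserve the value as-is.
  if (version > currentVersion) {
    warnings.push("prefs from a newer sf, fields may be ignored");
    return { prefs, warnings };
  }

  // Older: run the chain one step at a time; a gap in the chain throws.
  let migrated: Prefs = { ...prefs, version };
  while (version < currentVersion) {
    const step = registry.get(version);
    if (step === undefined) {
      throw new Error(`missing migration from v${version} to v${version + 1}`);
    }
    version += 1;
    migrated = { ...step(migrated), version };
  }
  return { prefs: migrated, warnings };
}
```

With an empty registry and a current version of 1, as in the commit, every versioned file takes the no-op path and only unversioned or newer files produce a warning.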
Mikael Hugo
6248e79a7a feat(init): auto-seed PREFERENCES.md with detected verification_commands
Without this, every fresh project inherits sf's user-level dogfooding
defaults (npm run typecheck:extensions, test:sf-light) — which run sf's
own dev scripts against unrelated repos and produce universal false
negatives. Hit in dr-repo (Go): T01-VERIFY.json showed all_fail because
those npm scripts don't exist there, even though T01's actual work passed
verification per its SUMMARY.

- ensurePreferences() now calls detectProjectSignals() and embeds the
  auto-detected commands in the YAML frontmatter on first init. Detection
  failure is non-fatal — falls back to the bare template.
- detectVerificationCommands() Go branch now handles multi-module repos
  (no root go.mod, only nested ones — common pattern for repos like
  dr-repo/{dr-agent,portal,gateway,installer,cmd/installer}). Generates
  a per-module loop instead of running go vet/test from the repo root,
  which would fail since each subdir is its own Go module.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 14:26:49 +02:00
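The multi-module Go branch described in the commit above could look roughly like this. It is a sketch only: `goVerificationCommands` and `shellQuote` are hypothetical names, the input is assumed to be a list of repo-relative `go.mod` paths using `/` separators, and the exact shell loop that sf generates is not shown in the commit, so the command string here is an assumption.

```typescript
// Wrap a value in single quotes for POSIX shells, escaping embedded quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Given repo-relative go.mod paths, return verification commands.
// A root go.mod keeps the plain commands; nested-only layouts get one loop
// that enters each module directory, since each subdir is its own module.
function goVerificationCommands(goModPaths: string[]): string[] {
  const dirs = goModPaths.map((p) => {
    const slash = p.lastIndexOf("/");
    return slash === -1 ? "." : p.slice(0, slash);
  });
  if (dirs.length === 0) return [];
  if (dirs.includes(".")) return ["go vet ./...", "go test ./..."];
  const list = dirs.map(shellQuote).join(" ");
  return [
    `for d in ${list}; do (cd "$d" && go vet ./... && go test ./...) || exit 1; done`,
  ];
}
```

The subshell around `cd` keeps each iteration starting from the repo root, and `|| exit 1` stops at the first failing module.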
Mikael Hugo
a8cf2cd941 feat(workflow): add product-audit (slim port)
Milestone-end workflow that compares declared product intent (VISION.md,
RUNBOOKS.md, etc.) against actual code/test/deploy/docs evidence and
emits structured gaps with severity. Soft gates — adds follow-up slices
but doesn't hard-block merge.

Slim port (4 new files + 1 registration) — extracts only the audit
feature itself, not bunker's parallel rewrite of dispatch/prompts/
benchmark-selector that came with it in commit 2aa785475.

Created:
- prompts/product-audit.md         — prompt verbatim, gsd_*→sf_* and .gsd→.sf
- tools/product-audit-tool.ts      — slim file-write implementation,
                                     atomicWriteAsync to .sf/active/{mid}/
                                     PRODUCT-AUDIT.{json,md}; no DB deps
- bootstrap/product-audit-tool.ts  — pi-coding-agent tool registration,
                                     TypeBox schema for sf_product_audit
- workflow-templates/product-audit.md — workflow template

Modified:
- bootstrap/register-extension.ts  — 2 lines: import + add to nonCriticalRegistrations
- workflow-templates/registry.json — registry entry
- package.json — version 2.75.0 → 2.75.1

Verdict logic (no-gaps | gaps-found | contract-underspecified) is the
load-bearing innovation: contract-underspecified forces the auditor to
flag unverifiable docs as a real gap rather than rubber-stamping
no-gaps when the product contract is silent.

Out of scope: phase enum changes, dispatch hookup. Wire-up to the phase
machine is a follow-up; the prompt + tool + template stand alone.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:55:23 +02:00
Mikael Hugo
2eebeccb93 feat(search): add MiniMax web search provider
New search backend alongside tavily/brave/serper/exa/ollama. API key
resolution: MINIMAX_CODE_PLAN_KEY → MINIMAX_CODING_API_KEY →
MINIMAX_API_KEY (fallback order matches MiniMax's documented aliases).

Wired through every existing seam:
- type union: SearchProvider = 'tavily' | 'minimax' | 'brave' | 'ollama'
- VALID_PREFERENCES set + selection logic in provider.ts
- native-search routing (Anthropic native web_search delegates correctly)
- /search-provider CLI command (tab completion, select UI, parser)
- tool-search.ts: search execution path
- tool-llm-context.ts: prefetch / context-builder path
- preferences-types + preferences-validation
- configuration.md user docs
- extension-manifest description

Tests not added in this commit — the bunker reference tests don't match
our preferences/provider export shape (we have serper/exa/combosearch
that bunker doesn't). Tests for getMiniMaxSearchApiKey priority order,
resolveSearchProvider returning "minimax", /search-provider minimax CLI
behavior, no-key error messages, and executeMiniMaxSearch request shape
are TODO.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:55:04 +02:00
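The key fallback order in the commit above amounts to a first-match lookup. A minimal sketch, assuming a plain environment map as input: `resolveMiniMaxKey` is a hypothetical name (the commit refers to `getMiniMaxSearchApiKey`, whose real signature is not shown), and treating blank values as unset is an assumption.

```typescript
// Fallback order as listed in the commit message.
const MINIMAX_KEY_VARS = [
  "MINIMAX_CODE_PLAN_KEY",
  "MINIMAX_CODING_API_KEY",
  "MINIMAX_API_KEY",
] as const;

// Return the first non-blank key in priority order, or undefined if none is set.
function resolveMiniMaxKey(
  env: Record<string, string | undefined>,
): string | undefined {
  for (const name of MINIMAX_KEY_VARS) {
    const value = env[name]?.trim();
    if (value) return value;
  }
  return undefined;
}
```

In a Node process this would typically be called as `resolveMiniMaxKey(process.env)`.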
Mikael Hugo
c41912ff55 fix(prompts): tell agents about Serena (repo-intelligence MCP) for code exploration
We have .serena/ configured (cache, memories, project.local.yml) but no
prompt mentioned Serena anywhere. Agents weren't using it for symbol
lookup or cross-file architecture mapping; they fell straight to rg/find.

Added a one-sentence Serena hint to the code-exploration step in:
- research-slice.md
- research-milestone.md
- plan-slice.md
- plan-milestone.md
- guided-research-slice.md

Phrased generically ("If a repo-intelligence MCP (e.g. Serena) is
configured...") so it degrades cleanly when Serena isn't set up.

Pattern based on bunker commit 4ba746888 but written fresh against our
post-rename prompt structure rather than cherry-picked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:41:33 +02:00
Mikael Hugo
b24f426f2b batch: snapshot of in-flight v2 work
This commit captures uncommitted modifications that accumulated in the
working tree across multiple in-progress workstreams. It is a snapshot
to clear the deck before sf v3 work begins; individual workstreams
should land separately on top of this.

Notable additions:
- trace-collector.ts, traces.ts, src/tests/trace-export.test.ts —
  trace export plumbing
- biome.json — Biome linter configuration
- .gitignore — exclude native/npm/**/*.node compiled binaries

The bulk of the diff is across src/resources/extensions/sf/ (301 files)
and src/resources/extensions/sf/tests/ (277 files), reflecting the
ongoing sf extension work. Specific feature commits should follow this
snapshot rather than being archaeology'd out of it.

The 76MB native/npm/linux-x64-gnu/forge_engine.node compiled binary
was left out of the commit — it's now gitignored and built locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:42:31 +02:00
Mikael Hugo
6eaf5926ad sf snapshot: uncommitted changes after 248m inactivity 2026-04-28 21:10:17 +02:00
Mikael Hugo
d30d91bf2f sf snapshot: uncommitted changes after 41m inactivity 2026-04-28 17:01:26 +02:00
Mikael Hugo
5d3c204006 fix(git-merge): no auto-flip from approved to declined; cached approval is sticky
Codex-rescue output (a299c461 / bnr88iy59). Fixes the bug we hit on M002
complete-milestone, where 'Git merge approved once' was followed seconds
later by 'Git merge declined by user': same gate, same agent run, opposite verdicts.

Single source of truth for the merge-gate state in guardrails/index.ts.
Approval is now sticky — re-asks return the cached approval until consumed
or explicitly revoked, never auto-flip to decline. Timeout converts to
pause+log instead of decline. Adds tests/safe-git-merge-gate.test.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 16:20:08 +02:00
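The sticky-approval behavior described in the commit above can be pictured as a small state holder. This is a hypothetical sketch, not the code in guardrails/index.ts: the `MergeGate` class, its method names and the state names are all assumptions, as is the choice to ignore a timeout or a later decline while an approval is cached.

```typescript
type GateState = "unasked" | "approved" | "declined" | "paused";

class MergeGate {
  private state: GateState = "unasked";
  readonly log: string[] = [];

  // Record an answer. Assumption: a cached approval is only cleared by
  // consume() or revoke(), so a later decline does not overwrite it.
  recordDecision(approved: boolean): void {
    if (this.state === "approved") return;
    this.state = approved ? "approved" : "declined";
  }

  // Re-asking returns the cached state instead of prompting again.
  ask(): GateState {
    return this.state;
  }

  // A timeout pauses and logs; it never turns into a decline.
  timeout(): void {
    if (this.state === "approved") return;
    this.state = "paused";
    this.log.push("merge gate timed out; pausing instead of declining");
  }

  // Use up the approval for exactly one merge.
  consume(): boolean {
    if (this.state !== "approved") return false;
    this.state = "unasked";
    return true;
  }

  // Explicitly drop any cached verdict.
  revoke(): void {
    this.state = "unasked";
  }
}
```

Keeping the state in one object mirrors the commit's single source of truth: every caller reads the same verdict, so two checks in the same run cannot disagree.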
Mikael Hugo
d38e5ea092 fix(schema): auto-coerce string → [string] for sf_* list fields + provider_model_allow tests
Two codex-rescue tasks landed together:

1. Auto-coerce JSON-schema validator: when a tool field declares
   {type:"array", items:{type:"string"}} and the model sends a single
   string, wrap it in [string] before validation instead of hard-rejecting.
   Fixes the recurring "keyDecisions: must be array" rejection on
   sf_complete_task that wasted retries.

2. Provider_model_allow filter (proper implementation with helpers):
   - resolveProviderModelAllowList / isProviderModelAllowed /
     filterModelsByProviderModelAllow helpers in preferences-models
   - Wired into model-registry and auto-model-selection
   - New tests/provider-model-allow.test.ts

Tools coerced: sf_complete_task, sf_complete_milestone, sf_plan_milestone,
sf_plan_slice, sf_replan_slice, sf_reassess_roadmap (key list fields).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 12:30:55 +02:00
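The auto-coercion in item 1 of the commit above comes down to a small pre-validation step. A minimal sketch, assuming a simplified schema shape: `FieldSchema` and `coerceStringList` are hypothetical names and do not reflect the validator's real types.

```typescript
// Simplified view of a JSON-schema field; real schemas carry more keywords.
interface FieldSchema {
  type: string;
  items?: { type: string };
}

// If the field expects an array of strings and the model sent a bare string,
// wrap it in a one-element array. Every other value passes through untouched
// so the validator can still reject real mismatches.
function coerceStringList(schema: FieldSchema, value: unknown): unknown {
  const wantsStringArray =
    schema.type === "array" && schema.items?.type === "string";
  return wantsStringArray && typeof value === "string" ? [value] : value;
}
```

For example, a `keyDecisions` field sent as `"use sqlite"` would reach validation as `["use sqlite"]`, while a number in the same field would still be rejected.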
Mikael Hugo
f98a1e360e batch: codex-rescue session output (multiple in-flight tasks)
Combined output of multiple parallel codex-rescue runs that produced
working-tree edits but didn't commit. Tasks contributing:

- prefs: per-provider model allow-list (provider_model_allow) — manual
- TUI scroll + unresponsive (a7884d1a / bt3fpn4y2)
- planningMeeting required (aa09e904 / br127l763)
- Logs UX 4-pack (a5c65314 / btcplhu7f)
- Gate auto-resolve + completion nudge (ae4c8b64 / bw1w1fjkp)
- sf_task_complete atomic + retry (a7a079b4 / b20cy5owv)
- Multi-model meeting + minimax M2.7 + draft promotion (a756faac / task-moifjknd-lwjc98)
- Per-role slice prompts (a94c3e1a)
- Per-role vision-meeting prompts (afd165a0 / task-moifple5-lcwtjl)
- Schema sweep (ac994b1e / task-moifq7pu-83coqz)
- Flow audit (ad26ecfd / bttj4vrqm)

Typecheck passes. Tests not run as a full suite — spot-check after merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 11:52:42 +02:00
Mikael Hugo
66ff949c11 cherry-pick(security): harden project-controlled surfaces (PR #4755 partial)
Cherry-pick of gsd-build/gsd-2 65ca5aa2e — applies the security hardening
hunks that conflicted minimally:

- mcp-server/env-writer: validate writes against a strict allowlist
- web/api/files: enforce path containment via web/lib/secure-path
- vscode-extension: read binaryPath/autoStart only from trusted
  global/default scopes (resolveTrustedSfStartupConfig), avoiding
  workspace-controlled override (renamed Gsd → Sf for sf naming)
- New regression tests: mcp-client-security, vscode-startup-security,
  web-files-symlink

Skipped hunks (drifted): mcp-server/server.ts, mcp-client/index.ts,
mcp-server/README.md.

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:37:07 +02:00