Commit graph

3629 commits

Author SHA1 Message Date
Mikael Hugo
7c487bb60e port(pi-mono): normalize Bedrock model names for inference profiles (refs ed4bc7308)
Pi-mono Tier 0 #5 — first sf-driven port. sf-from-source dispatched the
task in print mode and produced this fix autonomously.

Adds getModelMatchCandidates(modelId, modelName?) helper that normalizes
both inputs to lowercase and dash-separated form
(s.replace(/[\s_.:]+/g, "-")). Inference profile ARNs don't embed the
model name; the helper lets capability checks match against either the
inference profile ARN or the underlying model name.

Updated:
- supportsAdaptiveThinking — uses the helper; consolidates the
  opus-4.6/opus-4-6 dot-vs-dash variants.
- mapThinkingLevelToEffort — same pattern.
- supportsPromptCaching — same pattern (also from pi-mono PR #3527).
- streamSimpleBedrock and buildAdditionalModelRequestFields — pass
  model.name through to capability checks.

Type-check passes (cd packages/pi-ai && npx tsc --noEmit).

Co-Authored-By: sf v2.75.1 (session 911dd2de)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:14:17 +02:00
Mikael Hugo
a3c487c918 docs: add Tier 0 (pi-mono ports) and Tier 0.5 (gsd-2 manual ports) — sf does these first
Tier 0 (pi-mono — should land cleanly via cherry-pick, no namespace divergence):
9 items ranked security → bug-fixes → infra → features.

  Critical:
    1. HTML export escape (security)
    2. Empty tools array fix (provider compatibility)
    3. Anthropic SSE proxy event tolerance
    4. Long local-LLM SSE 5min timeout fix

  Infrastructure:
    5. Bedrock inference profile normalization
    6. Symlinked packages dedup
    7. ctx.ui.setWorkingVisible() extension API

  Features:
    8. Cloudflare Workers AI provider
    9. Azure Cognitive Services endpoint

Tier 0.5 (gsd-2 — must be MANUALLY ported; cherry-pick fails on namespace):

  Critical fixes (11):
    1-6.  bash race, security hardening, web_search injection narrowing,
          symlinked staging self-heal, KNOWLEDGE budget, mcp-server deadlock
    7-10. agent_end transition fixes (4 commits)
    11.   claude-code-cli Always-Allow persistence

  Normal-value features (6):
    12. /gsd eval-review slim port (prompt + tool + template)
    13. Workflow state machine hardening (5 commits as unit)
    14. Proactive rate limiting (min_request_interval_ms)
    15. Per-call token telemetry (opt-in pi-coding-agent hooks)
    16. Worktree TUI commands
    17. Doctor check for orphan milestone directories

Skipped from each upstream is documented. All in BUILD_PLAN.md so sf
can work the list systematically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:04:31 +02:00
Mikael Hugo
bb1c68b7ab docs: drop OpenRouter-removal follow-up
OpenRouter is already neutered via the provider_model_allow allowlist
(see d38e5ea09 fix(schema): auto-coerce string → [string] for sf_* list
fields + provider_model_allow tests). The 248 model entries in
models.generated.ts are inert — no dispatch path reaches them.

Removing the data entries would be aesthetic cleanup with zero
behavioral effect. Not worth a Tier-1 follow-up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:58:33 +02:00
Mikael Hugo
310ce963ea docs: add session follow-ups to BUILD_PLAN
Six items surfaced during 2026-04-29 ports/refactors that didn't get
tracked anywhere:

- Tier 1: Remove OpenRouter (~248 model entries; user confirmed unused)
- Tier 1: Minimax search tests (deferred from initial port)
- Tier 2: Search provider registry refactor (rid of 9-file-per-provider)
- Tier 2: Product-audit phase machine wire-up (slim port shipped tool;
  phase dispatch not yet wired)
- Tier 2: Headless assistant-text preview (bunker pattern, deferred from
  headless UX commit)
- Tier 3: Pi-mono SDK sync cadence

Each entry has rationale + effort estimate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:56:55 +02:00
Mikael Hugo
a8cf2cd941 feat(workflow): add product-audit (slim port)
Milestone-end workflow that compares declared product intent (VISION.md,
RUNBOOKS.md, etc.) against actual code/test/deploy/docs evidence and
emits structured gaps with severity. Soft gates — adds follow-up slices
but doesn't hard-block merge.

Slim port (4 new files + 1 registration) — extracts only the audit
feature itself, not bunker's parallel rewrite of dispatch/prompts/
benchmark-selector that came with it in commit 2aa785475.

Created:
- prompts/product-audit.md         — prompt verbatim, gsd_*→sf_* and .gsd→.sf
- tools/product-audit-tool.ts      — slim file-write implementation,
                                     atomicWriteAsync to .sf/active/{mid}/
                                     PRODUCT-AUDIT.{json,md}; no DB deps
- bootstrap/product-audit-tool.ts  — pi-coding-agent tool registration,
                                     TypeBox schema for sf_product_audit
- workflow-templates/product-audit.md — workflow template

Modified:
- bootstrap/register-extension.ts  — 2 lines: import + add to nonCriticalRegistrations
- workflow-templates/registry.json — registry entry
- package.json — version 2.75.0 → 2.75.1

Verdict logic (no-gaps | gaps-found | contract-underspecified) is the
load-bearing innovation: contract-underspecified forces the auditor to
flag unverifiable docs as a real gap rather than rubber-stamping
no-gaps when the product contract is silent.

Out of scope: phase enum changes, dispatch hookup. Wire-up to the phase
machine is a follow-up; the prompt + tool + template stand alone.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:55:23 +02:00
Mikael Hugo
2eebeccb93 feat(search): add MiniMax web search provider
New search backend alongside tavily/brave/serper/exa/ollama. API key
resolution: MINIMAX_CODE_PLAN_KEY → MINIMAX_CODING_API_KEY →
MINIMAX_API_KEY (fallback order matches MiniMax's documented aliases).

Wired through every existing seam:
- type union: SearchProvider = 'tavily' | 'minimax' | 'brave' | 'ollama'
- VALID_PREFERENCES set + selection logic in provider.ts
- native-search routing (Anthropic native web_search delegates correctly)
- /search-provider CLI command (tab completion, select UI, parser)
- tool-search.ts: search execution path
- tool-llm-context.ts: prefetch / context-builder path
- preferences-types + preferences-validation
- configuration.md user docs
- extension-manifest description

Tests not added in this commit — the bunker reference tests don't match
our preferences/provider export shape (we have serper/exa/combosearch
that bunker doesn't). Tests for getMiniMaxSearchApiKey priority order,
resolveSearchProvider returning "minimax", /search-provider minimax CLI
behavior, no-key error messages, and executeMiniMaxSearch request shape
are TODO.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:55:04 +02:00
Mikael Hugo
ae0bbe32fc feat(providers): add xiaomi direct API (token-plan-{ams,sgp,cn}) — additive
Adds direct xiaomi token-plan API access alongside the existing
OpenRouter-routed xiaomi entries. ADDITIVE only — OpenRouter cleanup is
a separate follow-up.

Three new region providers:
- xiaomi-token-plan-ams (Amsterdam, default for plain `xiaomi`)
- xiaomi-token-plan-sgp (Singapore)
- xiaomi-token-plan-cn (China)

All use Anthropic Messages API. Env-var resolution: XIAOMI_API_KEY →
XIAOMI_TOKEN_PLAN_API_KEY → MIMO_API_KEY (in that fallback order).

Three xiaomi MiMo models registered under each direct provider:
- mimo-v2-flash (256k ctx, 64k output, text-only, reasoning)
- mimo-v2-omni (256k ctx, 128k output, text+image, reasoning)
- mimo-v2-pro (1M ctx, 128k output, text-only, reasoning)

Same model literals × 4 provider keys, different baseUrls per region.
Test count assertion bumped 22 → 26 providers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:54:43 +02:00
Mikael Hugo
dff0df5fdc fix(headless): suppress notification spam, categorize messages, distinguish phase vs status
Three small UX fixes for headless / autopilot logs:

1. Add `zz-notifications` to TUI_FOOTER_STATUS_KEYS — these are sticky
   notification dots from the interactive TUI footer; they have no
   meaning in headless and were spamming the log.

2. Categorize notification messages by prefix so headless output is
   scannable: [mcp] for MCP-client-ready, [search] for web search status,
   [parallel] for slice-parallel/subagent dispatch. Falls through to
   the existing important/non-important formatting for everything else.

3. Distinguish phase transitions from generic status updates: phase:/
   milestone:/slice:/task: prefixed keys get [phase]; everything else
   gets [status]. Previously both used [phase], which was misleading.

Patterns based on bunker commits 14ec4d97f / c15afb45f (which were the
research source) but written fresh against our existing
TUI_FOOTER_STATUS_KEYS structure rather than cherry-picked.

The assistant-text-preview commit (cf0274c63) is a separate, larger
refactor in headless.ts and is deferred to v3.1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:43:40 +02:00
Mikael Hugo
c41912ff55 fix(prompts): tell agents about Serena (repo-intelligence MCP) for code exploration
We have .serena/ configured (cache, memories, project.local.yml) but no
prompt mentioned Serena anywhere. Agents weren't using it for symbol
lookup or cross-file architecture mapping; they fell straight to rg/find.

Added a one-sentence Serena hint to the code-exploration step in:
- research-slice.md
- research-milestone.md
- plan-slice.md
- plan-milestone.md
- guided-research-slice.md

Phrased generically ("If a repo-intelligence MCP (e.g. Serena) is
configured...") so it degrades cleanly when Serena isn't set up.

Pattern based on bunker commit 4ba746888 but written fresh against our
post-rename prompt structure rather than cherry-picked.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:41:33 +02:00
Mikael Hugo
7a6169705a docs: lock in fork stance, reframe cherry-pick list as reference-only
After attempting cluster B (4 surgical agent-session fixes), even the
first commit conflicted because of structural namespace divergence
(gsd_*→sf_* rename, @sf-run/*→@singularity-forge/* rename, prior
pi-mono direct cherry-picks). The conflicts are real semantic
divergence, not noise.

Conclusion: sf is a fork; we do not periodically sync from
gsd-build/gsd-2. Pretending we still track upstream means weeks of
merge work for diminishing return.

BUILD_PLAN.md adds an explicit "Upstream stance" section documenting
the fork posture and the rationale for the three irreversible naming
choices.

UPSTREAM_CHERRY_PICK_CANDIDATES.md is reframed as a reference list,
not an action plan. The clusters and SHAs remain useful as an
intelligence source — port specific fixes by hand when one bites us;
do not run automated cherry-picks against the list.

Pi-mono SDK syncs continue separately — that path doesn't have the
same divergence problem.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:57:44 +02:00
Mikael Hugo
a80beb83b5 docs: enumerate high-value upstream cherry-pick candidates
The origin↔upstream divergence is 4,589 commits. This file picks the
high-leverage subset (~70 commits across 16 topical clusters) worth
considering for cherry-pick. Recommended order at the bottom.

Each cluster lists candidate SHAs with one-line context and effort
estimates. Total estimated work if all clusters A-N are taken: ~10-15
hours plus conflict resolution. Cluster O (UnitContextManifest /
Composer rewrite, ~15 commits) is deferred — likely conflicts heavily
with our work and should be revisited during v3 schema reconciliation.

Cluster P (memories table cutover, 1 commit) is flagged as READ FIRST
because it's upstream's answer to what BUILD_PLAN calls Singularity
Memory integration; reading it may change the recommended integration
path.

This is a candidate list for human decision, not an action plan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:53:46 +02:00
Mikael Hugo
b24f426f2b batch: snapshot of in-flight v2 work
This commit captures uncommitted modifications that accumulated in the
working tree across multiple in-progress workstreams. It is a snapshot
to clear the deck before sf v3 work begins; individual workstreams
should land separately on top of this.

Notable additions:
- trace-collector.ts, traces.ts, src/tests/trace-export.test.ts —
  trace export plumbing
- biome.json — Biome linter configuration
- .gitignore — exclude native/npm/**/*.node compiled binaries

The bulk of the diff is across src/resources/extensions/sf/ (301 files)
and src/resources/extensions/sf/tests/ (277 files), reflecting the
ongoing sf extension work. Specific feature commits should follow this
snapshot rather than being archaeology'd out of it.

The 76MB native/npm/linux-x64-gnu/forge_engine.node compiled binary
was left out of the commit — it's now gitignored and built locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:42:31 +02:00
Mikael Hugo
31842885ea docs: add BUILD_PLAN.md — tiered cut of v3 NEW items
Of the 56 NEW items in SPEC.md, not all are worth building for v3.
This plan groups them by tier:

- Tier 1 ESSENTIAL (~5 weeks): Vault resolver, sm integration decision,
  schema reconciliation, config alignment.
- Tier 2 STRONG (~3-4 weeks): doc-sync, intent chapters, PhaseReview
  3-pass, turn_status marker, last_error cap, cost_micro_usd.
- Tier 3 NICE (v3.1+): persistent agents, inter-agent messaging,
  workflow content pinning, runs table, pending_retain.
- Tier 4 DEFER: SSH workers, HTTP API auth, trace_index, PhaseUAT —
  build when a deployment demands it.
- Tier 5 DROP: items from late adversarial-review iterations that
  don't earn their keep (workflow_pins separate table, snap_ columns,
  agent_capabilities separate index).

Includes a recommended ~6-8 week v3.0 schedule and four decision
points that should be settled before starting work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:33:07 +02:00
Mikael Hugo
57a1bc6505 docs: import sf v3 spec from singularity-crush, annotated for status
Imports SPEC.md (v1.0-draft) from singularity-ng/crush#docs/spec — the
forward-looking contract for sf v3. Annotated section-by-section and
item-by-item with implementation status against current sf:

- EXISTS — already implemented in sf, matches the spec
- PARTIAL — implemented but diverges from spec; needs alignment work
- NEW — not yet implemented

Conformance breakdown (123 items total):
- 37 EXISTS
- 30 PARTIAL
- 56 NEW

The NEW items concentrate in: persistent-agent inbox model (§17/§18),
Singularity Memory integration (§16/§24), SSH worker extension (§22),
several supervisor refinements (§9), and policy/operations details
(audit fields, trace metadata, version pinning) introduced during the
v0.x adversarial review iterations.

The PARTIAL items concentrate in: schema reconciliation (sf has 3
tables — milestones/slices/tasks — vs spec's single units table),
config schema alignment, runs-table unification with audit_events,
and several worker-attempt lifecycle details that exist in different
shapes today.

This is an informational import. Implementing v3 against this spec
is its own work; the next step is deciding which NEW items are
actually wanted vs deferred, and whether to migrate the 3-table
planning schema to the single-units shape or keep what sf has and
update the spec.

Spec source: https://github.com/singularity-ng/crush/blob/docs/spec/SPEC.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:15:02 +02:00
Mikael Hugo
6eaf5926ad sf snapshot: uncommitted changes after 248m inactivity 2026-04-28 21:10:17 +02:00
Mikael Hugo
d30d91bf2f sf snapshot: uncommitted changes after 41m inactivity 2026-04-28 17:01:26 +02:00
Mikael Hugo
5d3c204006 fix(git-merge): no auto-flip from approved to declined; cached approval is sticky
Codex-rescue output (a299c461 / bnr88iy59) — the 'Git merge approved once'
followed seconds later by 'Git merge declined by user' bug we hit on
M002 complete-milestone. Same gate, same agent run, opposite verdicts.

Single source of truth for the merge-gate state in guardrails/index.ts.
Approval is now sticky — re-asks return the cached approval until consumed
or explicitly revoked, never auto-flip to decline. Timeout converts to
pause+log instead of decline. Adds tests/safe-git-merge-gate.test.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 16:20:08 +02:00
Mikael Hugo
d38e5ea092 fix(schema): auto-coerce string → [string] for sf_* list fields + provider_model_allow tests
Two codex-rescue tasks landed together:

1. Auto-coerce JSON-schema validator: when a tool field declares
   {type:"array", items:{type:"string"}} and the model sends a single
   string, wrap it in [string] before validation instead of hard-rejecting.
   Fixes the recurring "keyDecisions: must be array" rejection on
   sf_complete_task that wasted retries.

2. Provider_model_allow filter (proper implementation with helpers):
   - resolveProviderModelAllowList / isProviderModelAllowed /
     filterModelsByProviderModelAllow helpers in preferences-models
   - Wired into model-registry and auto-model-selection
   - New tests/provider-model-allow.test.ts

Tools coerced: sf_complete_task, sf_complete_milestone, sf_plan_milestone,
sf_plan_slice, sf_replan_slice, sf_reassess_roadmap (key list fields).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 12:30:55 +02:00
Mikael Hugo
f98a1e360e batch: codex-rescue session output (multiple in-flight tasks)
Combined output of multiple parallel codex-rescue runs that produced
working-tree edits but didn't commit. Tasks contributing:

- prefs: per-provider model allow-list (provider_model_allow) — manual
- TUI scroll + unresponsive (a7884d1a / bt3fpn4y2)
- planningMeeting required (aa09e904 / br127l763)
- Logs UX 4-pack (a5c65314 / btcplhu7f)
- Gate auto-resolve + completion nudge (ae4c8b64 / bw1w1fjkp)
- sf_task_complete atomic + retry (a7a079b4 / b20cy5owv)
- Multi-model meeting + minimax M2.7 + draft promotion (a756faac / task-moifjknd-lwjc98)
- Per-role slice prompts (a94c3e1a)
- Per-role vision-meeting prompts (afd165a0 / task-moifple5-lcwtjl)
- Schema sweep (ac994b1e / task-moifq7pu-83coqz)
- Flow audit (ad26ecfd / bttj4vrqm)

Typecheck passes. Tests not run as a full suite — spot-check after merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: OpenAI Codex <noreply@openai.com>
2026-04-28 11:52:42 +02:00
Mikael Hugo
66ff949c11 cherry-pick(security): harden project-controlled surfaces (PR #4755 partial)
Cherry-pick of gsd-build/gsd-2 65ca5aa2e — applies the security hardening
hunks that conflicted minimally:

- mcp-server/env-writer: validate writes against a strict allowlist
- web/api/files: enforce path containment via web/lib/secure-path
- vscode-extension: read binaryPath/autoStart only from trusted
  global/default scopes (resolveTrustedSfStartupConfig), avoiding
  workspace-controlled override (renamed Gsd → Sf for sf naming)
- New regression tests: mcp-client-security, vscode-startup-security,
  web-files-symlink

Skipped hunks (drifted): mcp-server/server.ts, mcp-client/index.ts,
mcp-server/README.md.

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:37:07 +02:00
Mikael Hugo
bf727173e7 cherry-pick(file-lock): make file-lock actually lock and throw on contention
Cherry-pick of gsd-build/gsd-2 a09e01640 — withFileLockSync now actually
acquires a proper-lockfile (was previously a no-op when proper-lockfile
wasn't required) and throws on ELOCKED contention by default. Adds
onLocked: 'skip' option for best-effort callers that tolerate dropped
entries (audit, journal). Modernizes import style (createRequire/join
from imports rather than ad-hoc require). Path-renames preserved
(gsd-pi → sf-run).

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:28:36 +02:00
Mikael Hugo
22d4579690 cherry-pick(state): lock-wrapped appends for journal, audit, workflow-logger
Cherry-pick of gsd-build/gsd-2 53babec29 — lock-wrapped append half.
Wraps appends to .sf/journal/, .sf/audit/events.jsonl, and the
workflow-logger error log in withFileLockSync (onLocked: skip),
preserving best-effort semantics while preventing torn writes
under contention.

Companion to the atomic-write half landed in 3df56cb94. Path-renames
(gsdRoot→sfRoot, gsd-db→sf-db) preserved during conflict resolution.

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:27:44 +02:00
Mikael Hugo
f1f4b840e1 cherry-pick(doctor): self-heal symlinked .sf staging to prevent silent data loss
Cherry-pick of gsd-build/gsd-2 9340f1e9b (#4423) — doctor self-heal
detection for symlinked staging directories that can cause silent
data loss. Skips native-git-bridge.ts and git-service test (drifted).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:25:56 +02:00
Mikael Hugo
7fd4672e55 cherry-pick(auto): handle worktree context fallback + sanitize paused session paths
Cherry-pick of gsd-build/gsd-2 a4f78731f — handles worktree context fallback
and sanitizes paths in paused session resumption. Skips uok-plan-v2-wiring
test hunk (drifted in sf).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:25:40 +02:00
Mikael Hugo
93402643f4 cherry-pick(sf-db): tolerate corrupt task arrays in milestone rows
Cherry-pick of gsd-build/gsd-2 851507913 (#4056) — defensive parsing
so a corrupt or non-array tasks blob in a milestone row doesn't crash
sf-db reads. Test hunk skipped (sf-db.test.ts has drifted).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:25:21 +02:00
Mikael Hugo
3df56cb94f cherry-pick(state): atomic-writes for guided-flow-queue and reports
Cherry-pick of gsd-build/gsd-2 53babec29 (Jeremy <jeremy@fluxlabs.net>)
— atomic-write half only. Eliminates torn-write risk on PROJECT.md
queue sync and reports.json/HTML index regeneration by switching
writeFileSync → atomicWriteSync (tmp+rename).

The companion lock-wrapped-append changes (journal.ts, uok/audit.ts,
workflow-logger.ts) are deferred — they need proper-lockfile +
withFileLockSync helper introduced first.

Co-Authored-By: Jeremy <jeremy@fluxlabs.net>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:16:39 +02:00
Mikael Hugo
8e827147c9 feat(code-intelligence): add sift indexer backend alongside project-rag
Generalize the code-intelligence hook to support multiple indexer
backends, with sift (rupurt/sift) as a new option next to the existing
project-rag MCP server. Backend is selected via CodebaseMapPreferences.

- code-intelligence.ts: new abstraction + sift backend (detect, resolve,
  status, context-block contribution)
- preferences-types.ts: codebaseIndexer field (project-rag | sift | none)
- preferences-validation.ts: validate the new field
- bootstrap/system-context.ts, commands-codebase.ts: dispatch on backend
- tests/code-intelligence.test.ts: sift detection/resolution/status tests
  (19 pass, 0 fail)

project-rag path unchanged and continues to work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 05:05:26 +02:00
Mikael Hugo
0606983d97 feat(subagent): add background job manager and tests
SubagentBackgroundJobManager tracks long-running subagent jobs with
status, abort support, and TTL-based eviction of completed results.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 04:18:17 +02:00
Mikael Hugo
efd5e14e0a feat: add FEATURES.md capability map and generator
Human-oriented documentation of SF capabilities, with a script that
keeps it in sync with workflow-tools.ts and extension manifests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 04:18:12 +02:00
Mikael Hugo
25797129e2 sf snapshot: pre-dispatch, uncommitted changes after 38m inactivity 2026-04-28 00:21:39 +02:00
Mikael Hugo
0d286b991b sf snapshot: pre-dispatch, uncommitted changes after 2902m inactivity 2026-04-27 23:42:51 +02:00
Mikael Hugo
260d50a823 docs: warn against Python for managed-resources hash; causes resync hang
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 23:20:15 +02:00
Mikael Hugo
f0da5b6d21 fix: bind getProviderAuthMode to registry instance to avoid undefined 'this'
Extracting a class method as a bare reference loses its 'this' context,
causing 'Cannot read properties of undefined' when minimax (or any
provider) triggers the flat-rate auth-mode lookup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 19:22:39 +02:00
Mikael Hugo
7be540480e docs: add CLAUDE.md with dev guide for build pipeline and test runner
Documents the dist-vs-source distinction that caused the memoriesSection
fix to not take effect, the c8 coverage runner process leak, and the
template variable maintenance contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 18:56:03 +02:00
Mikael Hugo
7289933909 fix: populate memoriesSection in execute-task prompt and fix stale dist
buildExecuteTaskPrompt was not passing memoriesSection to loadPrompt,
causing headless auto to fail with a template variable error. Also
updated plan-slice-prompt.test.ts to supply the four template variables
(memoriesSection, runtimeContext, phaseAnchorSection, gatesToClose) that
were missing from the test fixture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 18:46:55 +02:00
Mikael Hugo
a30a7692e3 fix: dist-redirect.mjs incorrectly rewrites .js→.ts for node_modules paths containing /src/
The resolver guarded on context.parentURL.includes('/src/') to identify
in-repo source files, but @google/gemini-cli-core installs to
node_modules/@google/gemini-cli-core/dist/src/ which also contains '/src/'.
Relative imports from that dist package (e.g. './config/config.js') were
incorrectly rewritten to './config/config.ts', causing ERR_MODULE_NOT_FOUND
on every test that transitively imports the google-gemini provider.

Fix: add !context.parentURL.includes('/node_modules/') guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 18:04:23 +02:00
Mikael Hugo
2e32c96fa0 Port gsd2 functional parity: turn-epoch, abandon-detect, reapplyThinking, exec chain, memory chain, onboarding-state
- auto/turn-epoch.ts: AsyncLocalStorage-backed stale-write dropping for timeout recovery
- journal.ts: isStaleWrite() guard drops superseded turn writes
- auto/run-unit.ts: wrap agent_end Promise.race in runWithTurnGeneration
- auto/session.ts: ThinkingLevelSnapshot type + autoModeStartThinkingLevel/originalThinkingLevel fields
- auto-model-selection.ts: reapplyThinkingLevel() called after every successful setModel()
- auto/phases.ts: pass autoModeStartThinkingLevel to selectAndApplyModel + hook override restore
- abandon-detect.ts: two-signal milestone abandon detection in rewrite-docs overrides
- auto-post-unit.ts: use detectAbandonMilestone + parkMilestone in rewrite-docs handler
- preferences-types.ts: ContextModeConfig + isContextModeEnabled
- exec-sandbox.ts: sandboxed bash/node/python subprocess with .sf/exec/ persistence
- exec-history.ts: read-side scan of .sf/exec/*.meta.json
- compaction-snapshot.ts: ≤2 KB markdown digest written before context compaction
- tools/exec-tool.ts: sf_exec MCP tool executor
- tools/exec-search-tool.ts: sf_exec_search MCP tool executor
- tools/resume-tool.ts: sf_resume MCP tool executor
- bootstrap/exec-tools.ts: registers sf_exec/sf_exec_search/sf_resume
- memory-relations.ts: knowledge-graph edges between memories (traverseGraph)
- tools/memory-tools.ts: capture_thought/memory_query/sf_graph executors
- bootstrap/memory-tools.ts: registers capture_thought/memory_query/sf_graph
- bootstrap/register-extension.ts: wire exec-tools + memory-tools into registration
- onboarding-state.ts: onboarding completion record at ~/.sf/agent/onboarding.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:58:39 +02:00
Mikael Hugo
5887ea3fd1 port gsd2: blocked-models gate, milestone-summary classifier, unsupported-model recovery
blocked-models.ts (new):
  Persistent per-project blocklist at .sf/runtime/blocked-models.json.
  loadBlockedModels / isModelBlocked / blockModel (file-lock-safe write).

milestone-summary-classifier.ts (new):
  classifyMilestoneSummaryContent → "success" | "failure" | "unknown".
  isTerminalMilestoneSummaryContent: failure summaries are NOT terminal —
  lets auto-mode re-enter a milestone after a failed recovery summary.

state.ts:
  Phase 1 (completeMilestoneIds) and Phase 2 (registry) now check
  isTerminalMilestoneSummaryContent before treating a SUMMARY as complete.
  A failure SUMMARY no longer prematurely parks a milestone.

error-classifier.ts:
  Add "unsupported-model" ErrorClass kind with regex detection
  (model + not-supported/unavailable/no-access + account/plan/tier).
  Checked before "permanent" so /account/i in PERMANENT_RE doesn't swallow it.

auto-model-selection.ts:
  Wire isModelBlocked() gate in selectAndApplyModel candidate loop:
  skips provider-rejected models and continues to fallbacks.

bootstrap/agent-end-recovery.ts:
  Handle cls.kind === "unsupported-model": blockModel(), try fallback chain
  skipping already-blocked models, pause if no usable fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:13:27 +02:00
Mikael Hugo
6cb6de4fd2 perf: parallelize I/O, add runtime cache, extend nix devenv
- unit-context-composer: resolve artifact keys in parallel (Promise.all)
- unit-runtime: add in-memory cache to avoid repeated disk reads per dispatch
- auto-timers: share 15s idle watchdog tick with context-pressure check
- auto-prompts: 1s TTL budget cache to coalesce repeated loadEffectiveSFPreferences calls
- native-git-bridge: extend nativeHasChanges TTL 10s→30s
- auto-dashboard: remove pulsing dot animation (CPU churn, no UX value)
- flake.nix: add nodePackages.typescript to dev shell

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:12:32 +02:00
Mikael Hugo
12aabd863e port gsd2 #4769: worktree telemetry, slice-cadence, canonical-root fix + /sf scan
Ports commit 7fb35ca58 from gsd2 (PR #4769) covering four issues:

#4761 — resolveCanonicalMilestoneRoot in worktree-manager.ts routes
validate-milestone through the live worktree path instead of stale
project-root state when a milestone worktree is active.

#4762 — auditOrphanedMilestoneBranches in auto-start.ts now surfaces
in-progress milestone branches with unmerged commits ahead of main
(previously only complete milestones were audited). Gated on
isClosedStatus so parked/other closed statuses are unaffected.

#4764 — worktree-telemetry.ts: typed emit helpers (emitWorktreeCreated,
emitWorktreeMerged, emitWorktreeOrphaned, emitAutoExit, emitWorktreeSync,
emitCanonicalRootRedirect, emitSliceMerged, emitMilestoneResquash) plus
summarizeWorktreeTelemetry aggregator and nearest-rank percentile().
Wired in: worktree-resolver.ts (create/merge events), auto-start.ts
(orphan telemetry), auto.ts stopAuto (auto-exit with normalized reason),
worktree-manager.ts (canonical-root-redirect). Surfaced in forensics.ts
via detectWorktreeOrphans and Worktree Telemetry sections.

#4765 — slice-cadence.ts: mergeSliceToMain squash-merges each slice's
commits onto main as soon as the slice passes validation (opt-in via
git.collapse_cadence: "slice"). resquashMilestoneOnMain collapses N
per-slice commits into one milestone commit at completion. Wired in
auto-post-unit.ts (slice merge after complete-slice with stopAuto on
conflict/error) and worktree-resolver.ts (resquash at mergeAndExit).
AutoSession.milestoneStartShas tracks the pre-first-slice SHA.
GitPreferences and preferences-validation.ts extended with
collapse_cadence and milestone_resquash fields.

Also ports /sf scan command: commands-scan.ts with parseScanArgs,
resolveScanDocuments, buildScanOutputPaths, and handleScan dispatching
a focused codebase assessment prompt to .sf/codebase/.

journal.ts: 9 new JournalEventType values for the telemetry events.
All changes are additive; default behavior (cadence="milestone") unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 09:03:56 +02:00
Mikael Hugo
2911d3b93d port gsd2: reassess-roadmap opt-in (ADR-003 §4) + prefer toolDefinition.label
reassess-roadmap: flip default from true → false. Most reassess units
conclude "roadmap is fine" burning a session for no change; the
plan-slice prompt now carries a JIT preamble at zero cost. (#4778)

tool-execution: always prefer toolDefinition.label when non-empty,
even when label === name — allows tools to display their canonical
name explicitly. (#4758)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 08:33:50 +02:00
Mikael Hugo
d4cdcb582d port gsd2 #3338: ecosystem plugin loader for .sf/extensions/
Adds support for project-local SF extension plugins dropped in
.sf/extensions/. Trust-gated (requires pi trust), symlink-escape safe.

- ecosystem/sf-extension-api.ts: SFExtensionAPI wrapper exposing
  getPhase() and getActiveUnit() to third-party handlers; updateSnapshot
  refreshes state before_agent_start so handlers see current phase/unit
- ecosystem/loader.ts: discovers .sf/extensions/*.js, loads them via
  dynamic import, dispatches factory(api) for each
- register-extension.ts: initializes ecosystemHandlers array, wires loader
- register-hooks.ts: before_agent_start refreshes snapshot then dispatches
  ecosystem handlers before returning SF system prompt
- types.ts: SFActiveUnit interface (milestoneId/sliceId/taskId + titles)
- workflow-logger.ts: "ecosystem" added to LogComponent union

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 08:27:55 +02:00
Mikael Hugo
6c36d62f35 port gsd2 #4961: stop using active-tool snapshot as model-policy gate
Fixes a bug where per-unit tool narrowing poisoned the policy gate for
subsequent units, causing "Model policy denied dispatch before prompt send"
errors on complete-slice and discuss-milestone (100% Win repro).

Four-part port from gsd2@817031b2a:
- ModelPolicyDispatchBlockedError class with per-model deny reasons
- TOOL_BASELINE WeakMap + clearToolBaseline/restoreToolBaseline lifecycle
- auto-model-selection: use getRequiredWorkflowToolsForAutoUnit as requiredTools
- auto/loop: catch ModelPolicyDispatchBlockedError as non-retryable (pause)
- auto.ts: wire clearToolBaseline at startAuto (fresh only) and stopAuto

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 08:15:04 +02:00
Mikael Hugo
4fdd8700a3 port gsd2 upstream features: scope classifier, composer v2, GPT-5.5, test timeout
- milestone-scope-classifier: add getMilestonePipelineVariant + milestoneRowToScopeInput
  wired into auto-dispatch trivial-skip for research/validation phases (#4781)
- auto-prompts: rename GSD→SF identifiers, add isSummaryCleanForSkip, prefs param
  on checkNeedsReassessment, buildExtractionStepsBlock from commands-extract-learnings
- unit-context-manifest + unit-context-composer: port v2 typed computed artifacts (#4924)
- skill-manifest: per-unit-type skill filter resolver (#4788, #4792)
- escalation: stub for ADR-011 mid-execution escalation (full port deferred)
- auto-start: extract decideSurvivorAction for testability (#4832)
- models: add gpt-5.5 + gpt-5.4-mini to cost table, router, and models.generated.ts
- types: EscalationArtifact, context_window_override, skip_clean_reassess,
  mid_execution_escalation, sketch_scope on SliceRow
- tool-execution: add visibleWidth import (was undefined)
- package.json: add --test-timeout=30000 to prevent parallel tests from freezing machine

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 08:08:11 +02:00
Mikael Hugo
e2147c0694 sf snapshot: pre-dispatch, uncommitted changes after 43m inactivity 2026-04-25 06:34:49 +02:00
Mikael Hugo
7b6c9dd099 sf snapshot: pre-dispatch, uncommitted changes after 4703m inactivity 2026-04-25 05:51:29 +02:00
ace-pm
e625d20a59
fix: add self to flake outputs 2026-04-21 23:27:40 +02:00
ace-pm
c744bdf6c1
fix: atomic writes, parse radix, lossy json, silent worker spawn
8 fixes from 3rd-pass scan:

1. web/components/sf/tempCodeRunnerFile.tsx: remove orphan VS Code
   'Code Runner' artifact (850+ lines duplicated from shell-terminal.tsx).
   Unreferenced but compiled into tsc project.

2. sf/phase-anchor.ts: writePhaseAnchor used plain writeFileSync — a crash
   mid-write would corrupt the handoff checkpoint that readPhaseAnchor then
   silently returns null for, losing cross-phase context. Switched to
   atomicWriteSync (already used by sibling files).

3. sf/forensics.ts: same non-atomic writeFileSync on active-forensics.json
   marker. Race with a concurrent reader produces an empty object and the
   forensics session is lost. Switched to atomicWriteSync.

4. web/auto-dashboard-service.ts: paused-session.json existence was the
   intended signal but a corrupt body silently dropped the paused flag so
   the UI showed active. Now reports paused on file existence regardless
   of body integrity, and warns on corruption.

5. sf/visualizer-data.ts: doctor-history.jsonl parser did .map(JSON.parse)
   inside an outer catch. One corrupt line discarded 19 valid entries.
   Per-line try/catch preserves the valid rows.

6. sf/files.ts: three parseInt calls without radix (step, total_steps,
   totalSteps) — also missing || 0 fallback for NaN.

7. cli.ts: parseInt(process.versions.node) without radix. Split on '.' and
   use radix 10 explicitly.

8. sf/slice-parallel-orchestrator.ts: silent 'catch {}' around spawn()
   masked worker-spawn failures as 'no workers available'. Matches sibling
   parallel-orchestrator.ts pattern — now logs via logWarning.

Skipped from the scan (need a real lock mechanism, not safe as a one-line
fix):
- sf/auto-dispatch.ts:164 (UAT counter race)
- sf/captures.ts:107 (CAPTURES.md append race)

Deferred (low-value):
- preferences-models.ts, key-manager.ts, auto-timers.ts silent catches
- dead variable in visualizer-data.ts
- google-gemini-cli.ts maxTokens clamp interaction

tsc --noEmit green at root.
2026-04-21 02:13:10 +02:00
ace-pm
51b65fd490
fix: symlink extensions + silent catches masking real errors
Real bugs from 2nd-pass scan:

1. extension-registry.ts: discoverAllManifests skipped symlinked extension
   dirs because Dirent.isDirectory() returns false for symlinks. Dev-workflow
   symlinks under ~/.sf/agent/extensions/ were invisible to list/enable/
   disable/info. Matches the regression documented in
   symlink-extension-discovery.test.ts — the test inlines the correct logic,
   but this callsite still had the buggy form. Now accepts isDirectory() ||
   isSymbolicLink().

2. headless.ts SIGINT handler: client.stop() failures were double-silenced
   (inner .catch(()=>{}), outer try{}catch{}). Interactive mode logs stop
   errors to stderr. Restored head/headless parity — still fire-and-forget
   (exit code is forced via process.exit) but failures are observable.

3. openai-codex-responses.ts SSE parser: malformed data frames were silently
   dropped so broken streams looked identical to clean ones. Now debug-logs
   the parse error with the chunk context so broken streams are
   distinguishable in logs. Stream continues on bad chunk (one bad frame
   shouldn't kill the whole generation).

4. web/cleanup-service.ts generated script: bare 'catch {}' around four native
   git calls (nativeBranchList, nativeDetectMainBranch, nativeBranchListMerged,
   nativeForEachRef). A failed main-branch detection silently left mainBranch
   undefined-shaped, then the next native call operated on garbage. Now emits
   console.warn so failures surface in the subprocess log.

5. web/undo-service.ts generated script: git revert failure was silenced;
   when --no-commit failed, user saw commitsReverted=0 with no reason. Now
   logs the revert error before attempting --abort (abort itself remains
   best-effort silent).

False positives from the same scan (investigated and dismissed):
- auto-worktree.ts #2505: code uses ':(exclude).sf/milestones' pathspec +
  shelter-and-restore, which is a better fix than the 'drop --include-untracked'
  approach the test comment describes. Test comment is stale; source is correct.
- Lifecycle handler unhandled rejections across 5 extensions: extensions/runner.ts
  already try/catches handler invocations and routes to emitError. Wrapping the
  individual handlers would be redundant.
2026-04-21 02:01:41 +02:00
ace-pm
0f94341b43
fix(loader): fall back to src/resources when SF-WORKFLOW.md missing from dist
Build sometimes copies dist/resources/extensions/ without the top-level
markdown files (observed: SF-WORKFLOW.md absent in dist/resources/ while
extensions/ was present). existsSync(distRes) was true either way, so
SF_WORKFLOW_PATH pointed at a non-existent path and /sf failed with ENOENT.

Check for the specific file instead of the directory.
2026-04-21 01:39:18 +02:00