Commit graph

3555 commits

Author SHA1 Message Date
Mikael Hugo
0b8a1c246f auto-benchmark model selection: pick best-scoring per unit type
New module src/resources/extensions/sf/benchmark-selector.ts implements
benchmark-driven model selection. When models.<unit> is not pinned,
preferences-models.ts falls through to pick the highest-scoring
candidate from allowed_providers × pi-ai's model catalog, ranked
against a per-unit-type weight profile.

Weight profiles per unit type:
  plan-milestone / plan-slice  → agent-planning (swe_bench .25, lcb
                                  .20, hle .15, gpqa .15, mmlu_pro .15,
                                  aime .10)
  research-*                    → mixed (mmlu_pro, hle, human_eval,
                                  browse_comp, simple_qa, gpqa)
  execute-task                  → coding (swe_bench .35, swe_bench_v
                                  .25, lcb .20, human_eval .15)
  execution_simple / complete-* → fast+correct (human_eval .40,
                                  instruction_following .35, ruler .25)
  gate-evaluate                 → review (swe_bench .30, hle .25,
                                  gpqa .25, ifeval .20)
  validate-milestone            → validation (hle .30, gpqa .25,
                                  mmlu_pro .25, swe_bench .20)

Key design decisions:
  - Missing dimensions are dropped (normalised by populated weight),
    so a model with 2 strong populated scores isn't crushed by a peer
    with 5 mediocre ones.
  - swe_bench ↔ swe_bench_verified are fungible — some vendors publish
    one, some the other; treat as equivalent.
  - Provider diversification in fallbacks so one provider going 429
    doesn't kill the whole chain.
  - Score ties broken by coverage, then lexical — deterministic.
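
The design decisions above can be sketched roughly as follows. This is a minimal illustration, not the contents of benchmark-selector.ts: the Candidate shape, the 0..100 score scale, and the function names are assumptions made for the example.

```typescript
// Hypothetical shapes; the real benchmark-selector.ts may differ.
type Scores = Record<string, number | undefined>;
type Weights = Record<string, number>;

interface Candidate {
  id: string; // e.g. "provider/model"
  scores: Scores; // benchmark name -> score (0..100 scale assumed)
}

// Treat swe_bench and swe_bench_verified as interchangeable.
function lookup(scores: Scores, dim: string): number | undefined {
  if (scores[dim] !== undefined) return scores[dim];
  if (dim === "swe_bench") return scores["swe_bench_verified"];
  if (dim === "swe_bench_verified") return scores["swe_bench"];
  return undefined;
}

// Weighted mean over populated dimensions only, plus a coverage count.
function scoreCandidate(c: Candidate, weights: Weights): { score: number; coverage: number } {
  let weighted = 0;
  let populatedWeight = 0;
  let coverage = 0;
  for (const [dim, w] of Object.entries(weights)) {
    const v = lookup(c.scores, dim);
    if (v === undefined) continue; // missing dimension is dropped
    weighted += v * w;
    populatedWeight += w;
    coverage += 1;
  }
  return { score: populatedWeight > 0 ? weighted / populatedWeight : 0, coverage };
}

// Rank by score (desc), then coverage (desc), then id (lexical) so the
// result is deterministic.
function rank(cands: Candidate[], weights: Weights): string[] {
  return cands
    .map((c) => ({ id: c.id, ...scoreCandidate(c, weights) }))
    .sort(
      (a, b) =>
        b.score - a.score ||
        b.coverage - a.coverage ||
        (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
    )
    .map((r) => r.id);
}
```

Dividing by the populated weight rather than the full weight is what keeps a sparsely benchmarked model from being penalised for dimensions it simply has no published number for.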

Also updates MiniMax-M2/M2.5/M2.7 benchmarks with real numbers from
the M2 official README (DeepWiki sourced) and MiniMax-M2.5 card
(minimax.io): swe_bench_verified 69.4→80.2, LCB 83, HLE 31.8 (w/
tools — more representative for agent work than no-tools 12.5),
AIME25 78, GPQA-D 78, MMLU-Pro 82. Context windows bumped to
weights-level: M2 400K, M2.5/M2.7 1M (endpoints may cap lower).

Verified end-to-end: with dr-repo's allow-list
(kimi-coding/minimax/zai/opencode-go/mistral) and models.* absent,
resolveModelWithFallbacksForUnit() returns:
  plan-milestone     → opencode-go/glm-5.1 (+3 fallbacks)
  research-slice     → mistral/codestral-latest
  execute-task       → mistral/mistral-large-latest
  execution_simple   → kimi-coding/k2p5
  gate-evaluate      → opencode-go/glm-5.1
  validate-milestone → mistral/magistral-medium-latest
  subagent           → mistral/mistral-large-latest

Users can still pin individual units (existing models.* behaviour
unchanged) or rely fully on auto-selection by omitting them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:43:26 +02:00
Mikael Hugo
6450b37025 core + search + benchmarks: auth-error recovery, multi-provider search, M2.7-highspeed entry
Three related improvements that landed in the working tree after the
auto-hardening merge but hadn't been committed:

1. auth_error as a distinct error type (auth-storage + retry-handler).
   Previously an invalid or expired API key was retried against the same
   failing credential until the retry budget was exhausted. Now:
     - classifyErrorType() recognizes 401s, "invalid api key",
       "authentication error", "unauthorized", etc. as "auth_error"
     - RetryHandler triggers cross-provider fallback on auth_error just
       like it does for rate_limit and quota_exhausted — switch
       providers rather than burning retries on a broken key
   Outcome: a stale OPENCODE_API_KEY in sops now fails over to kimi or
   minimax immediately instead of stalling the unit.

2. Multi-provider search-key detection (native-search.ts).
   The "Web search: Set BRAVE_API_KEY" warning fired whenever a
   non-Anthropic model lacked BRAVE_API_KEY, even when the user had
   TAVILY_API_KEY or OLLAMA_API_KEY available. Now: the warning is
   suppressed if any of the BRAVE/TAVILY/OLLAMA keys is present, and the
   warning text lists all three options. Matches the preferences-
   validation allow-list for search_provider.

3. MiniMax-M2.7-highspeed benchmark entry (model-benchmarks.json).
   Routes the fast-tier variant of M2.7 through the Bayesian blender
   with inherited RULER scores. Lets dynamic routing consider the
   highspeed model when speed matters more than peak quality.
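
The auth_error handling in item 1 could look something like the sketch below. The patterns and function names are illustrative assumptions; the actual lists in auth-storage and retry-handler are not shown in this message.

```typescript
type ErrorType = "auth_error" | "rate_limit" | "other";

// Illustrative patterns only (assumed, not the real classifier's lists).
const AUTH_RE = /\b401\b|invalid api key|authentication error|unauthorized/i;
const RATE_RE = /\b429\b|rate limit/i;

function classifyErrorTypeSketch(message: string): ErrorType {
  if (AUTH_RE.test(message)) return "auth_error";
  if (RATE_RE.test(message)) return "rate_limit";
  return "other";
}

// Error types for which retrying the same credential cannot help, so the
// caller should move on to the next provider instead.
function shouldSwitchProvider(t: ErrorType): boolean {
  return t === "auth_error" || t === "rate_limit";
}
```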

No regressions: the 41 pre-existing test failures in pi-coding-agent
(FallbackResolver chain-membership + LSP integration) are unchanged
relative to the prior commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:24:54 +02:00
Mikael Hugo
a4428ba1ff key-manager: surface opencode-go in LLM provider list for onboarding
opencode-go is already a first-class provider in pi-ai (models.generated.js
registers 7 models under the opencode-go namespace: glm-5, glm-5.1,
kimi-k2.5, mimo-v2-{omni,pro}, minimax-m2.{5,7}) and runs against
https://opencode.ai/zen/go/v1 with OPENCODE_API_KEY auth.

It was missing from key-manager's LLM provider registry, so the /sf
config wizard and onboarding flows didn't prompt users to supply
OPENCODE_API_KEY. Adding it here gives users a discoverable path to
subscribe and surfaces the 7 opencode-go models in list-models.

Research confirmed (DeepWiki sst/opencode + curl probes):
  - /zen/go/v1/chat/completions is the OpenAI-compatible endpoint
  - OPENCODE_API_KEY is the correct env var
  - No /models listing endpoint — hardcoding is correct (already done
    by the generate-models.ts pipeline)
  - Sister /zen/go/v1/messages serves Anthropic-compat minimax variants

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:22:48 +02:00
Mikael Hugo
58543fdae4 preferences: add allowed_providers hard allowlist + plug 6 merge gaps
New feature: allowed_providers — hard allowlist of providers that
auto-mode can dispatch to. When set, models from any other provider
are invisible to selection BEFORE models.* resolution and dynamic
routing run. This prevents routing from silently picking providers
the user doesn't have keys for — the root cause of repeated
"400 The requested model is not supported" pauses observed in
dr-repo when routing picked gpt-5.2-codex despite no GPT being
configured.

Implementation is a single filter at the top of selectAndApplyModel:
  availableModels = rawAvailable.filter(m => allowed.includes(m.provider.toLowerCase()))
If the allowlist rejects everything, throw with a clear message
pointing at the pref (fail-closed — don't dispatch to whatever's
left).
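
A fail-closed version of that filter might look like this sketch. The ModelInfo shape, the function name, and the error text are assumptions; unlike the one-liner above, it also lowercases the allowlist entries, which is a choice made for the example.

```typescript
interface ModelInfo {
  id: string;
  provider: string;
}

function filterByAllowedProviders(raw: ModelInfo[], allowed?: string[]): ModelInfo[] {
  // Preference unset: no filtering (assumed behaviour).
  if (!allowed || allowed.length === 0) return raw;
  const allow = new Set(allowed.map((p) => p.toLowerCase()));
  const kept = raw.filter((m) => allow.has(m.provider.toLowerCase()));
  if (kept.length === 0) {
    // Fail closed: never dispatch to a provider outside the allowlist.
    throw new Error(
      "allowed_providers rejected every available model; check allowed_providers in PREFERENCES.md",
    );
  }
  return kept;
}
```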

While wiring this I found mergePreferences was silently dropping
six more validated fields — same latent-bug class as service_tier:

  - allowed_providers (new)       - flat_rate_providers
  - stale_commit_threshold_minutes - widget_mode
  - modelOverrides                - safety_harness

All added to the merge function. Now: if you set it in PREFERENCES,
consumers see it.

Verified end-to-end: loadEffectiveSFPreferences() reads
allowed_providers from dr-repo's .sf/PREFERENCES.md correctly, and
auto-mode model selection honors the filter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:12:33 +02:00
Mikael Hugo
9f0723a7be preferences + mcp-client: resolve from main worktree and add global MCP config
Two related fixes surfaced from a real sf headless auto run in dr-repo.

1. Project preferences now resolve from the MAIN worktree, not the
   current linked worktree. SF's auto-mode creates a git worktree per
   milestone (`.sf/worktrees/M003/`). The old code called
   `projectPreferencesPath()` which used `process.cwd()` — the
   milestone worktree — so a pref change on main (service_tier,
   dynamic_routing, model config) never reached an in-flight milestone
   until the branch merged main. Observed concretely when disabling
   dynamic_routing had no effect until we merged main into the
   milestone branch.

   New `projectPrefsRoot()` detects a linked worktree by reading
   `.git` (a FILE in worktrees, pointing to
   `/main/.git/worktrees/NAME`), follows the `commondir` pointer back
   to the main `.git` dir, and walks up one level. Falls back to cwd
   silently for non-worktree setups.

2. MCP server config now also loads from global paths
   (`~/.sf/mcp.json`, `~/.sf/agent/mcp.json`) in addition to the
   existing project-level (`.mcp.json`, `.sf/mcp.json`). First-hit
   wins, so project configs can still shadow or augment a globally-
   registered server by name. This lets the user register unauth'd
   servers like the DeepWiki remote MCP once and have every SF
   project pick it up without per-project `.mcp.json`.
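
The worktree detection in item 1 can be sketched as a pure function. This is an approximation under stated assumptions: the file reader is injected (so the example has no filesystem dependency), paths are absolute POSIX paths, and the helper names are hypothetical.

```typescript
// Returns the file's text, or null if the path is missing or is a directory.
type ReadFile = (path: string) => string | null;

// Collapse "." and ".." segments in an absolute POSIX path.
function normalize(path: string): string {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

function projectPrefsRootSketch(cwd: string, readFile: ReadFile): string {
  // In a linked worktree, .git is a file: "gitdir: /main/.git/worktrees/NAME".
  const dotGit = readFile(`${cwd}/.git`);
  const match = dotGit?.match(/^gitdir:\s*(.+)$/m);
  if (!match) return cwd; // regular checkout or no repo: fall back to cwd

  const gitdir = match[1].trim();
  // commondir points (usually relatively, e.g. "../..") at the main .git dir.
  const commondir = readFile(`${gitdir}/commondir`);
  if (commondir === null) return cwd;

  const rel = commondir.trim();
  const mainGit = rel.startsWith("/") ? normalize(rel) : normalize(`${gitdir}/${rel}`);
  return normalize(`${mainGit}/..`); // one level above the main .git dir
}
```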

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:53:27 +02:00
Mikael Hugo
879dc63a70 prompts: add DeepWiki as the preferred docs-lookup path
All 10 research/planning/discuss prompts updated to put DeepWiki
first in the docs-lookup order. Context7 becomes the fallback for
package-registry-only libraries.

Rationale: Context7 free tier is capped at 1000 req/month — a
research-heavy auto loop can burn through that in a single session.
DeepWiki has no cap and covers any GitHub-hosted library with
AI-indexed answers, so it's strictly better as the default for the
typical SF research path.

Prompts touched:
  system.md, discuss.md, discuss-headless.md, plan-milestone.md,
  queue.md, research-milestone.md, research-slice.md,
  guided-discuss-milestone.md, guided-discuss-slice.md,
  guided-research-slice.md

Each references the three DeepWiki tools — ask_question,
read_wiki_structure, read_wiki_contents — and explicitly mentions the
Context7 1000-req/month cap so models don't spend it wastefully.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:47:57 +02:00
Mikael Hugo
3bea082f20 auto-dispatch: silence expected registry fallback on non-auto commands
sf headless query and sf headless status call resolveDispatch() without
going through auto-mode startup, so the rule-registry singleton is
never initialized. The previous code caught getRegistry()'s init error
and logged a warning on every call — noise on the normal path:

  [sf:dispatch] WARN: registry dispatch failed, falling back to inline
  rules: RuleRegistry not initialized — call initRegistry() or
  setRegistry() first.

Now: hasRegistry() probe first. When unset, skip straight to the inline
rule loop without warning (it's the intended behavior outside auto).
When the registry IS set and evaluateDispatch() genuinely throws, log
the warning so real bugs still surface.

Adds hasRegistry() as a public helper for any other hot-path caller
that wants to branch on init without try/catch overhead.
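
The probe-then-dispatch shape described above might be sketched as follows. The Registry interface, the inline-rule stand-in, and the warnings array are hypothetical simplifications; only the hasRegistry()/setRegistry() names come from this message.

```typescript
interface Registry {
  evaluateDispatch(unit: string): string;
}

let registry: Registry | null = null;

function setRegistry(r: Registry | null): void {
  registry = r;
}

// Cheap probe so callers can branch without try/catch.
function hasRegistry(): boolean {
  return registry !== null;
}

const warnings: string[] = []; // stand-in for the real logger

function inlineRules(unit: string): string {
  return `inline:${unit}`; // stand-in for the inline rule loop
}

function resolveDispatchSketch(unit: string): string {
  // Outside auto-mode the registry is never initialised: that is expected,
  // so take the inline path without warning.
  if (!hasRegistry()) return inlineRules(unit);
  try {
    return registry!.evaluateDispatch(unit);
  } catch (err) {
    // The registry IS set and still failed: surface it.
    warnings.push(`registry dispatch failed, falling back to inline rules: ${String(err)}`);
    return inlineRules(unit);
  }
}
```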

Verified end-to-end: sf headless query and sf headless status in
dr-repo now run clean, no false warning. All 25 rule-registry tests
pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:33:29 +02:00
Mikael Hugo
56130c2e39 preferences: wire 6 more latent pref fields through validation
Same class of bug as the service_tier fix: preference fields declared
in SFPreferences type and consumed by feature code, but never copied
into the validated output, so they silently become undefined when set
in PREFERENCES.md.

Found by diffing validated.<field> vs the interface declarations:

- forensics_dedup (boolean) — /sf forensics issue de-dup opt-in
- stale_commit_threshold_minutes (number) — doctor safety-commit cadence
- widget_mode ("full"|"small"|"min"|"off") — dashboard widget sizing
- slice_parallel ({ enabled?, max_workers? }) — slice-level parallelism
- modelOverrides (Record) — per-model capability patches
- safety_harness ({ enabled?, evidence_collection?, ... }) — LLM safety

Validation is kind-appropriate: primitives get type + range checks,
nested objects get object-shape guards with pass-through for now.
Consumer sites already treat missing fields as optional, so landing
shallow validation first is safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:25:59 +02:00
Mikael Hugo
63e87e8e86 preferences: wire service_tier through validation
validatePreferences() is a strict allow-list — it copies only explicitly
handled fields from input to output. service_tier was in
KNOWN_PREFERENCE_KEYS (no unknown-key warning) but was never copied into
the validated output, so users setting service_tier: priority or flex in
PREFERENCES.md silently got undefined.

This was a latent bug from before today's work: the new "off" value hit
it first because I verified end-to-end, but priority/flex had the same
issue. /sf fast on writes "priority" via writeGlobalServiceTier —
correctly — and then the next read drops it on the floor.

Now: service_tier is validated against {priority, flex, off} and copied
through. Invalid values raise an error rather than being silently lost.
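
As a rough sketch of what "validated and copied through" means for a strict allow-list validator (the function name, return shape, and error text are assumptions, not the real validatePreferences() signature):

```typescript
const SERVICE_TIERS = ["priority", "flex", "off"] as const;
type ServiceTier = (typeof SERVICE_TIERS)[number];

interface Validated {
  service_tier?: ServiceTier;
}

function validateServiceTier(input: Record<string, unknown>): {
  validated: Validated;
  errors: string[];
} {
  const validated: Validated = {};
  const errors: string[] = [];
  const raw = input["service_tier"];
  if (raw !== undefined) {
    if (typeof raw === "string" && (SERVICE_TIERS as readonly string[]).includes(raw)) {
      // The explicit copy is the step that was missing before this fix.
      validated.service_tier = raw as ServiceTier;
    } else {
      errors.push(`service_tier must be one of ${SERVICE_TIERS.join(", ")}`);
    }
  }
  return { validated, errors };
}
```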

Verified: dr-repo's service_tier: "off" in .sf/PREFERENCES.md now loads
correctly via loadEffectiveSFPreferences().preferences.service_tier.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:05:26 +02:00
Mikael Hugo
5957d5c2b6 sf-tui + sf-permissions: gate footer-indicator side-effects on ctx.hasUI
Three TUI-only decorations were running their full session-lifecycle
handlers even in headless mode, where there is no footer to render
into. Most visibly, the emoji extension's AI auto-assign path made a
real LLM call to pick a 🚀/🎯 that nothing would ever see.

- sf-tui/emoji.ts: session_start and agent_start handlers early-return
  when !ctx.hasUI. Commands stay registered so /emoji still works if
  someone runs it, but the lifecycle work (state loading, AI emoji
  selection, setStatus emission) is skipped.

- sf-tui/color-band.ts: session_start and session_switch handlers
  early-return when !ctx.hasUI. Avoids unnecessary state-file writes
  and resize-listener attachment in headless runs.

- sf-permissions/index.ts:setLevel: guards the setStatus("authority",
  …) call behind ctx.hasUI. The existing session_start path was
  already gated — this closes the permission-change code path.

Headless stderr was already filtering these keys, so the user-visible
output is unchanged. This eliminates the underlying RPC traffic and
— more importantly — stops spending LLM tokens on decorative emoji
selection in headless runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 07:59:36 +02:00
Mikael Hugo
e1461f45b8 service-tier: add "off" preference value to fully disable feature
Adds an explicit disable state (service_tier: "off" in PREFERENCES.md)
that short-circuits every service-tier surface:

- No setStatus("sf-fast", …) footer events — RPC traffic stops, not
  just the stderr filter masking it.
- No service_tier field ever injected into before_provider_request
  payloads, regardless of model.
- /sf fast on and /sf fast flex refuse to write a tier while "off" is
  set, instructing the user to clear the preference first.
- /sf fast status shows "(service_tier: \"off\" in preferences)" so
  the explicit disable is visible at a glance.

Rationale: setups that never run gpt-5.4 (Claude / Kimi / MiniMax /
GLM / Gemini-only shops) have no use for the feature. "off" lets them
fully turn it off rather than relying on model-support gates to
silence it.

6 regression tests added in service-tier.test.ts covering the new
isServiceTierDisabled export, hook short-circuit ordering, and the
/sf fast command refusal. 52 / 52 service-tier tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 07:31:14 +02:00
Mikael Hugo
867f6558dc ollama: make extension opt-in via OLLAMA_HOST
Previously the bundled Ollama extension probed http://localhost:11434
on every session_start, which was wasted work for users who never run
Ollama locally. It also registered slash commands, loaded the
ollama_manage tool, and (in interactive mode) set a "[phase] ollama"
status indicator that leaked into headless stderr.

Now the default export short-circuits immediately when OLLAMA_HOST is
not set — no probe, no command registration, no tool loading, no
status indicator. probeAndRegister also double-checks so any direct
caller stays consistent.

ollama-cloud is unaffected: set OLLAMA_HOST=https://ollama.com and
OLLAMA_API_KEY=<key> and everything runs as before. Self-hosted local
Ollama is likewise unaffected — set OLLAMA_HOST=http://localhost:11434
explicitly to re-enable the old behavior.

3 new regression tests cover the opt-in guard. All 138 ollama tests
pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 05:53:45 +02:00
Mikael Hugo
941eb4c830 headless: clean up sf headless auto stderr output
Three fixes to make the headless progress stream readable at a glance:

1. Filter TUI footer widget keys from setStatus — 0-emoji, 0-color-band,
   authority, ollama, sf-fast, and sf-auto are sticky indicators for the
   interactive TUI footer, not workflow phases. They no longer leak
   through as [phase] ollama / [phase] sf-fast noise.

2. Unify tag prefix column width at 11 chars via a new tag() helper in
   headless-ui.ts. All of [tool], [agent], [forge], [phase], [thinking],
   [cost], [text] now align on the same column, matching the existing
   [headless] and [thinking] widths.

3. Dedupe consecutive identical progress lines in headless.ts so a
   widget that re-emits the same setStatus on every LLM call prints
   once instead of flooding stderr. Two different lines still both show;
   only adjacent duplicates collapse.
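
Fixes 2 and 3 can be illustrated with a small sketch. The 11-character width comes from the description above; the makeDeduper name and its callback shape are assumptions for the example.

```typescript
const TAG_WIDTH = 11;

// Pad "[name]" to a fixed column so the text after every tag lines up.
function tag(name: string): string {
  return `[${name}]`.padEnd(TAG_WIDTH);
}

// Wrap a writer so that only adjacent duplicate lines are collapsed.
function makeDeduper(write: (line: string) => void): (line: string) => void {
  let last: string | null = null;
  return (line) => {
    if (line === last) return; // same as the previous line: skip
    last = line;
    write(line);
  };
}
```

Because only the previous line is remembered, a line that reappears after a different one is printed again, matching the "two different lines still both show" behaviour.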

Also tightens parsePhaseLabel so an unknown bare statusKey with no
message returns null rather than leaking the raw key — a defense in
depth if the footer-widget allowlist drifts behind a new extension.

Tests: 4 new cases in headless-progress.test.ts covering footer-key
suppression, bare-key suppression, workflow-phase passthrough, and
tag-alignment. 88/88 pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 05:47:02 +02:00
Mikael Hugo
55ee2cb5c7 subagent: add per-call model override (Phase 1 of skill dispatch)
Adds an optional model param to SubagentParams, TaskItem, and ChainItem so
callers can override the agent's default model at dispatch time. This is
the primitive that ace-coder's Task() tool exposes via its `model` arg —
SF's subagent tool previously ignored model at the tool level, picking it
up only from the named agent's .md frontmatter.

- SubagentParams.model applies to single mode, or as a batch-level default
  for tasks/chain steps that don't set their own.
- TaskItem.model and ChainItem.model override per-task / per-step.
- runSingleAgent and runSingleAgentInCmuxSplit gain a trailing
  modelOverride parameter that flows into buildSubagentProcessArgs.
- buildSubagentProcessArgs uses modelOverride ?? agent.model when picking
  the --model arg for the child process.
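
The precedence described in these bullets could be sketched like this. The AgentDef and TaskItemSketch shapes and both function names are simplified assumptions, not the actual subagent types.

```typescript
interface AgentDef {
  name: string;
  model?: string; // default from the agent's .md frontmatter
}

interface TaskItemSketch {
  agent: string;
  task: string;
  model?: string; // per-task override
}

// Per-task model wins; otherwise fall back to the batch-level default.
function resolveModelOverride(task: TaskItemSketch, batchModel?: string): string | undefined {
  return task.model ?? batchModel;
}

// The override, if any, beats the agent's own default when building args.
function buildModelArgs(agent: AgentDef, modelOverride?: string): string[] {
  const model = modelOverride ?? agent.model;
  return model ? ["--model", model] : [];
}
```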

Side benefit: retroactively fixes the latent bug where
reactive_execution.subagent_model was threaded into prompt instructions
but ignored by the actual tool.

9 regression tests added in subagent/tests/model-override.test.ts.
All 53 subagent-related tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 05:22:07 +02:00
Mikael Hugo
254fba36c0 Add 6 SF skills: pm-planning, codebase-analysis, architecture-planning, feature-gap-analysis, code-review, advisory-partner
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 04:51:43 +02:00
Mikael Hugo
6fc286a888 Add skills system (feature-gap-analysis, code-review, advisory-partner, pm-planning, codebase-analysis, architecture-planning) and fix dispatch_model revert
- Add 6 new skills under src/resources/extensions/sf/skills/
- Revert broken dispatch_model extension from auto-prompts.ts — the subagent
  tool has no model-override param; skills stay as pure text injection
- Fix discuss-headless.md: advisory-partner section now correctly describes
  that independent review runs via gate-evaluate/validate-milestone (Q3/Q4,
  MV01-MV04) with the validation model, not inline self-review
- Include pm-planning, codebase-analysis, architecture-planning, and
  feature-gap-analysis skill activations in discuss-headless Active Skills

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 04:51:29 +02:00
Mikael Hugo
9724cb437a Merge auto-hardening: 10 structural fixes for reliable multi-day auto operation
Merges the auto-hardening branch which implements all audit-identified structural
holes in the SF auto-mode loop, memory, verification, health, and parallel systems.

See individual commits for detailed change descriptions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 16:48:38 +02:00
Mikael Hugo
9a04fef925 Fix Codex review issues: phase-timeout mutation race and missing backfill
P1 (phase-timeout mutation race): withPhaseTimeout now stores the still-running
phase promise in _danglingPhasePromise when a timeout fires. Each loop iteration
drains that promise (with try/catch) before starting new work, preventing the
timed-out phase from mutating state concurrently with the next iteration.

P2 (verification_status backfill): Schema migration v17 now runs a backfill UPDATE
after adding the new column, deriving verification_status from existing
verification_evidence rows. Projects upgraded mid-slice will have correct
all_pass/partial/all_fail values immediately rather than empty strings that
bypass the prior-task guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 16:48:00 +02:00
Mikael Hugo
ca5890df2e Auto-hardening: 10 structural fixes for reliable multi-day autonomous operation
Implements all fixes from the auto-hardening audit plan:

P1-A: Per-phase timeout watchdog — withPhaseTimeout() wraps preDispatch/dispatch/finalize;
      on timeout emits warning, increments consecutiveFinalizeTimeouts, continues loop.
      Configurable via preferences.auto_supervisor.phase_timeout_minutes (default: 10).

P1-B: Verified already wired (MAX_COOLDOWN_RETRIES → stopAuto+break). No change needed.

P1-C: Worker timeout in parallel orchestrator — kills workers running beyond
      parallel.worker_timeout_minutes (default: 120 min) in refreshWorkerStatuses().

P2-A: Memory injection into dispatch prompts — buildMemoriesBlock() appended to
      plan-milestone inlined[] context and added as memoriesSection in execute-task.

P2-B: Memory extraction retry — one 2s-delayed retry in the catch block of
      extractMemoriesFromUnit(); second failure is silently swallowed (non-fatal).

P3-A: Partial verification state in DB — verificationStatus ("all_pass"/"partial"/"all_fail")
      derived from verificationEvidence.exitCode array and stored in new tasks column.
      New dispatch rule blocks next task when prior task has all_fail status.

P3-B: Gate omission rationale enforcement — minOmissionWords added to GateDefinition
      (Q3=20, Q5=15, Q6=10, Q7=15). Short rationale upgrades verdict "omitted" → "flag".

P4-A: Doctor issues → reassess escalation — pre-dispatch health check in loop.ts detects
      issues referencing slice IDs and queues reassess-roadmap sidecar instead of pausing.

P4-B: File overlap preemption — analyzeParallelEligibility() sets eligible:false when
      the overlapping milestone is currently running (not just eligible/queued).

P5-A: Deferred requirement tracking — parseDeferredRequirements() added to files.ts;
      completing-milestone rule warns (via logWarning) when deferred reqs targeting
      the milestone were not validated before completion.
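
For P3-A, deriving the status from the exit-code array might look like the sketch below. The function names are hypothetical, and returning null for an empty evidence list is an assumption; the message does not say how that case is handled.

```typescript
type VerificationStatus = "all_pass" | "partial" | "all_fail";

function deriveVerificationStatus(exitCodes: number[]): VerificationStatus | null {
  if (exitCodes.length === 0) return null; // no evidence recorded (assumed handling)
  const passed = exitCodes.filter((c) => c === 0).length;
  if (passed === exitCodes.length) return "all_pass";
  if (passed === 0) return "all_fail";
  return "partial";
}

// Dispatch guard: only a fully failed prior task blocks the next one.
function blocksNextTask(prior: VerificationStatus | null): boolean {
  return prior === "all_fail";
}
```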

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 16:26:25 +02:00
Mikael Hugo
4ee188e43e Merge process-lifecycle-fixes: clean shutdown and orphan process prevention
2026-04-18 14:38:58 +02:00
Mikael Hugo
3bb93b1612 Cherry-pick process lifecycle fixes for multi-day autonomous operation
- shell: add trackDetachedChildPid / untrackDetachedChildPid /
  killTrackedDetachedChildren (#9b7948c)
- bash: track/untrack detached child PIDs so they are killed on shutdown
- interactive-mode: register SIGTERM/SIGHUP handlers for clean shutdown
  (#5d440b0); kill tracked bash children on shutdown
- rpc-mode: register SIGTERM/SIGHUP handlers, refactor to forceShutdown()
  that deduplicates shutdown path (#5d440b0); kill tracked bash children
- print-mode: register SIGTERM/SIGHUP handlers for graceful exit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 14:38:55 +02:00
Mikael Hugo
54e1ba3804 Merge recovery-fixes: 4 critical upstream fixes for autonomous operation
2026-04-18 14:28:18 +02:00
Mikael Hugo
aff49e52aa Cherry-pick 4 critical recovery fixes from pi-mono upstream
- agent-loop: wrap afterToolCall in try/catch so hook throws don't crash
  parallel tool batches (#3084)
- retry-handler: add "connection lost" to retryable error patterns (#3317)
- rpc-mode: redirect console.log to stderr to protect JSON stdout (#2388)
- openai-completions: ignore null/non-object chunks in stream (#2466)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 14:28:15 +02:00
Mikael Hugo
28f0c91120 Merge tool bug fixes from pi-mono upstream
- Repeated compaction dropping kept messages (compaction.ts)
- Edit tool multi-edit support via edits[] (edit.ts / edit-diff.ts)
- Bash output persistence on line-count truncation (bash.ts / bash-executor.ts)
- Grep lineText extraction to avoid per-match file reads (grep.ts)
- afterToolCall isError override forwarding (agent-session.ts)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 14:18:57 +02:00
Mikael Hugo
f153521c24 Cherry-pick tool bug fixes from pi-mono upstream
- compaction: fix repeated compaction dropping kept messages (#2608)
  Re-summarize from previous compaction's firstKeptEntryId instead of
  prevCompactionIndex+1; use buildSessionContext for accurate tokensBefore

- edit: add multi-edit support via edits[] array
  Single call can update multiple disjoint regions in one file;
  applyEditsToNormalizedContent matches all edits against original content
  and applies in reverse order for stable offsets

- bash: persist full output when line-count truncation occurs (#2852)
  ensureTempFile now called on any truncation, not only byte overflow;
  prevents data loss when output exceeds line limit before byte threshold

- bash-executor: same fix for remote/operations-based execution
  ensureTempFile includes SF cleanup registration (registerTempCleanup,
  bashTempFiles tracking)

- grep: include lineText from rg JSON events to avoid per-match file reads
  Eliminates stall when context=0 on broad searches (#3148)

- agent-session: forward isError override from afterToolCall extension hook
  Allows extensions to change error status of tool results (#3051)
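
The multi-edit idea (match every edit against the original content, then apply in reverse order so earlier offsets stay valid) can be sketched as below. This is a simplified illustration, not applyEditsToNormalizedContent itself: the EditSketch shape and the uniqueness and overlap checks are assumptions.

```typescript
interface EditSketch {
  oldText: string;
  newText: string;
}

function applyEditsSketch(content: string, edits: EditSketch[]): string {
  // Locate every edit against the ORIGINAL content, before any replacement.
  const located = edits.map((e) => {
    const start = content.indexOf(e.oldText);
    if (start === -1) throw new Error(`edit target not found: ${e.oldText}`);
    if (content.indexOf(e.oldText, start + 1) !== -1) {
      throw new Error(`edit target is ambiguous: ${e.oldText}`);
    }
    return { start, end: start + e.oldText.length, newText: e.newText };
  });

  // Sort last-to-first; replacing a later region never shifts an earlier one.
  located.sort((a, b) => b.start - a.start);
  for (let i = 0; i + 1 < located.length; i++) {
    if (located[i + 1].end > located[i].start) throw new Error("edits overlap");
  }

  let result = content;
  for (const { start, end, newText } of located) {
    result = result.slice(0, start) + newText + result.slice(end);
  }
  return result;
}
```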

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 14:18:52 +02:00
Mikael Hugo
1c4b289d89 Merge upstream cherry-picks from gsd-build/gsd-2 and badlogic/pi-mono
- ANTHROPIC_BASE_URL support for custom proxy endpoints (pi-ai anthropic provider)
- Codex unsupported model filtering (openai-codex OAuth provider)
- Bedrock thinking xhigh/max effort mapping for opus-4-7/opus-4-6
- Extended adaptive thinking to haiku-4-5, sonnet-4-6/4-7 (stream-adapter)
- hasLegacyOAuthCredential + removeLegacyOAuthCredential self-heal (auth-storage)
- Rate-limit regex extended with quota/reset patterns (error-classifier)
- setCurrentDispatchedModelId for rate-limit fallback tracking (auto)
- PARALLEL-BLOCKER sentinel in auto-artifact-paths / auto-dispatch / auto-recovery
- active_requirement_missing_owner downgraded to warning (doctor)
- claude-opus-4-7 model entries added to models.generated.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:57:21 +02:00
Mikael Hugo
07867b0ba6 rename: create-gsd-extension-paths.test.ts → create-sf-extension-paths.test.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:57:13 +02:00
Mikael Hugo
830328da95 feat(pi-ai): add claude-opus-4-7 model support (#4348)
Cherry-pick of gsd-build/gsd-2@8f8187e23 adapted for our single-file models.generated.ts:
- Amazon Bedrock: add anthropic.claude-opus-4-7, eu/global/us prefix variants
- Google Antigravity: add claude-opus-4-7-thinking
- OpenRouter: add anthropic/claude-opus-4.7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:46:30 +02:00
Mikael Hugo
50a70b35bd fix(sf): auto-mode stuck loop on research dispatch (#4414)
Cherry-pick of gsd-build/gsd-2@80ae39ccd adapted for sf/ paths:
- auto-artifact-paths: Add PARALLEL-BLOCKER sentinel path for parallel-research unitId
- auto-dispatch: Skip re-dispatching parallel-research if PARALLEL-BLOCKER exists
- auto-recovery: Add parallel-research verification; clear path/parse caches after writeBlockerPlaceholder
- doctor: Downgrade active_requirement_missing_owner to warning (noisy during normal planning)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:44:44 +02:00
Mikael Hugo
9af9c0712d fix(sf): handle auto-mode limit errors with model fallback (#4373)
Cherry-pick of gsd-build/gsd-2@0b7a05491 adapted for sf/ paths:
- Expand RATE_LIMIT_RE to cover quota-window phrasing (hit your limit, usage limit, quota reached)
- Rate-limit errors bypass transient-deferral early return so model fallback executes
- Add setCurrentDispatchedModelId() to keep AUTO dashboard label in sync after fallback switch
- 4 regression tests for classifier coverage and structural guards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:42:46 +02:00
Jeremy
b5e1beff8e fix(auth): self-heal stale Anthropic OAuth credential (#4399)
Anthropic OAuth was removed in v2.74.0 for TOS compliance (#3952). Users
who upgraded through that version still have type:"oauth" entries under
`anthropic` in auth.json which cannot resolve to a valid API key.

Nothing cleaned up the stale entry, so hasAuth("anthropic") kept
reporting true and masked the claude-code fallback path. Users had to
hand-edit auth.json to recover.

Self-heal instead:

- AuthStorage.removeLegacyOAuthCredential(provider) strips only
  type:"oauth" entries and preserves any api_key credentials.
- sdk.ts getApiKey() calls it when the legacy-OAuth branch triggers,
  logs a one-line warning, and throws a message pointing the user at
  the "claude-code" provider when the `claude` binary is in PATH, or
  at ANTHROPIC_API_KEY otherwise.

Closes #4399

(cherry picked from commit b8ef6604617fda239a037cf5d5e6020b168d2e62)
2026-04-18 13:40:02 +02:00
Nils Reeh
3b23ef3d4b fix(pi-ai): wire thinking:{type} field and extend adaptive-thinking model coverage (#4392)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 503e79070d198254661febad35a267ead487b7e1)
2026-04-18 13:38:30 +02:00
Yeon Gil Kang
b73763d944 fix(pi-ai): hide unsupported ChatGPT codex oauth models
ChatGPT-backed Codex sign-in no longer exposes the removed 5.1/5.2 Codex variants. Filter those models from openai-codex OAuth so GSD stops surfacing selections that immediately fail while leaving API-key-backed OpenAI models available.

(cherry picked from commit 1aedba583916826fc5c6129037f61e9802010e46)
2026-04-18 13:35:45 +02:00
Nils Reeh
1914f31342 feat(pi-ai): support ANTHROPIC_BASE_URL env var for custom proxy endpoints (#4140)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit e94cf668e2fdf28537aead642b4062cd3a22a8d3)
2026-04-18 13:34:58 +02:00
Mikael Hugo
f1d483c304 Gitignore .direnv, .envrc, .serena dev-env artifacts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:07:17 +02:00
Mikael Hugo
25e6f0db05 Fix all 26 failing tests (22 rebrand artifacts + 4 RTK seam bugs)
RTK seam tests: SF_RTK_PATH was set then immediately deleted in withFakeRtk
due to copy-paste duplication from the GSD→SF rename — fake RTK binary was
never injected, so all 5 seam tests ran the raw command instead of the
rewritten one.

Remaining 21 fixes from the GSD→SF rebrand:
- initial-gsd-header-filter.test.ts: import renamed filterInitialSfHeader
- dist-redirect.mjs: doubled scope prefix @singularity-forge/@singularity-forge/*
  → @singularity-forge/* (5 specifiers affected)
- forensics-issue-routing.test.ts: regex used sf-build/sf-2, prompt says
  singularity-forge/sf-run — align regex to match the actual prompt
- key-manager.test.ts: GROQ_API_KEY set in dev env made empty-key test
  report configured:true — isolate with save/delete/restore
- create-gsd-extension-paths.test.ts: skill dir doesn't exist in this repo,
  skip both tests gracefully with t.skip()
- sf-usage-bar/index.ts: replace execSync(`which ${cmd}`) with spawnSync to
  fix unescaped shell interpolation static analysis failure
- sf-notify/index.ts: convert enum to const object — strip-only TS mode
  does not support enums

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:07:09 +02:00
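The enum-to-const-object conversion mentioned for sf-notify/index.ts in commit 25e6f0db05 can be sketched as follows. The names `NotifyLevel` and `formatNotification` are hypothetical and are not taken from the extension. The point is only that a `const` object plus a derived union type is erasable syntax, whereas an `enum` needs code generation.

```typescript
// Before (needs code generation, so strip-only TypeScript rejects it):
//   enum NotifyLevel { Info = "info", Warn = "warn", Error = "error" }

// After: a plain object plus a union type derived from its values.
const NotifyLevel = {
  Info: "info",
  Warn: "warn",
  Error: "error",
} as const;

// "info" | "warn" | "error"
type NotifyLevel = (typeof NotifyLevel)[keyof typeof NotifyLevel];

// Hypothetical consumer showing that call sites keep the enum-like spelling.
function formatNotification(level: NotifyLevel, message: string): string {
  return `[${level}] ${message}`;
}
```

Call sites such as `formatNotification(NotifyLevel.Warn, "disk low")` stay unchanged, which is why this conversion is usually a drop-in replacement for string enums.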
Mikael Hugo
a0e469b18a Bump RTK from v0.33.1 to v0.37.0
Key improvements in v0.34–v0.37: binary hook engine with native cmd exec
and streaming (v0.37), subprocess memory leak / zombie process prevention,
POSIX spawn resource exhaustion fix, streaming for long-running commands,
git SSH signing preserved, and permission verdict system.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:53:41 +02:00
Mikael Hugo
78c5c3a78b Add proxy routing tests; fix two build errors
- 15 tests for ModelRegistry.getModelsForProxy covering family priority
  ordering, auth-ready promotion, overrides, and edge cases
- Fix StreamOptions cast in proxy-server.ts (lost during rebase conflict)
- Fix .ts import extension in args.test.ts (pre-existing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:40:03 +02:00
Mikael Hugo
c1c2623707 Add type declaration files for learning extension
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:36:08 +02:00
Mikael Hugo
30730dd25b Fix rebrand artifacts, add family-priority model routing to proxy server
- Update Dockerfile image name and package.json URLs to singularity-ng/singularity-foundry
- Add uv to nix develop shell in flake.nix
- Rename resolveGsdRoot → resolveSFRoot in src/cli.ts
- Add PROXY_FAMILY_PRIORITY routing table + sortByFamilyPriority to proxy-server.ts
- Fix duplicate scope key and simplify link-workspace-packages.cjs
- Remove duplicate conditions in postinstall.js
- Add ES2024 target/lib to tsconfig.extensions.json
- Delete obsolete GSD recovery scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 12:28:27 +02:00
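A rough sketch of family-priority ordering in the spirit of commit 30730dd25b. The `Candidate` shape, the table contents (a subset of the family defaults and global fallback chain listed in this log), and the function name `sortCandidatesByFamily` are assumptions. The real PROXY_FAMILY_PRIORITY table and sortByFamilyPriority in proxy-server.ts may look different.

```typescript
// Hypothetical shapes; the real table in proxy-server.ts may differ.
interface Candidate {
  provider: string;
  modelId: string;
}

// Family -> preferred providers, most preferred first (illustrative subset).
const FAMILY_PRIORITY_SKETCH: Record<string, string[]> = {
  minimax: ["minimax"],
  glm: ["zai"],
  claude: ["anthropic"],
};

// Providers tried after the family-specific ones.
const GLOBAL_FALLBACK_SKETCH = ["opencode", "opencode-go", "openrouter", "ollama-cloud"];

function sortCandidatesByFamily(family: string, candidates: Candidate[]): Candidate[] {
  const order = [...(FAMILY_PRIORITY_SKETCH[family] ?? []), ...GLOBAL_FALLBACK_SKETCH];
  const rank = (provider: string): number => {
    const i = order.indexOf(provider);
    return i === -1 ? order.length : i; // unknown providers sort last
  };
  // Array.prototype.sort is stable, so equal ranks keep their input order.
  return [...candidates].sort((a, b) => rank(a.provider) - rank(b.provider));
}
```

Copying the array before sorting leaves the caller's candidate list untouched, which matters if the same list is reused across requests.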
github-actions[bot]
8f160677b7 release: v2.75.0
https://claude.ai/code/session_013BwmqG3NuwwZY3vsUb4Y9Y
2026-04-17 17:26:59 +00:00
mikkihugo
dc0db3868a Add per-family proxy provider priority system with TUI and 429 fallback
- model-registry: export PROXY_FAMILY_PRIORITY and GLOBAL_PROVIDER_FALLBACK
  constants; add getModelsForProxy() returning candidates ordered by family
  priority then global fallback (opencode → opencode-go → openrouter →
  ollama-cloud); add getModel() convenience wrapper
- proxy-server: add priorityOverrides option; handleChat iterates all
  candidates in priority order and falls through to the next on 429
- settings-manager: add ProxySettings type with providerPriority override
  map; add getProxyProviderPriority() / setProxyFamilyProvider() accessors
- settings-selector: add ProxyPrioritySubmenu — a two-level TUI submenu
  (family → provider) that dynamically generates entries from
  PROXY_FAMILY_PRIORITY; wired in interactive-mode with full callback

Family defaults: MiniMax→minimax, GLM→zai, Kimi→kimi-coding,
MiMo→global-fallback, Gemini/Gemma→google-gemini-cli, Claude→anthropic,
GPT/o-series→openai

https://claude.ai/code/session_013BwmqG3NuwwZY3vsUb4Y9Y

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-17 19:17:50 +02:00
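The 429 fall-through described for handleChat in commit dc0db3868a might look roughly like this. `ChatResult`, `SendChat`, and `chatWithFallback` are hypothetical names, and returning the last 429 when every provider is rate limited is an assumed policy rather than something the commit message states.

```typescript
// Hypothetical shapes for illustration; not the real proxy-server.ts types.
interface ChatResult {
  status: number;
  body: string;
}

type SendChat = (provider: string) => Promise<ChatResult>;

// Try providers in priority order. A 429 moves on to the next provider;
// any other response (success or a different error) is returned as-is.
async function chatWithFallback(
  providers: readonly string[],
  send: SendChat,
): Promise<ChatResult & { provider: string }> {
  if (providers.length === 0) {
    throw new Error("no providers configured");
  }
  let last: (ChatResult & { provider: string }) | undefined;
  for (const provider of providers) {
    const result = await send(provider);
    last = { ...result, provider };
    if (result.status !== 429) return last;
  }
  // Every provider was rate limited: surface the final 429 to the caller.
  return last!;
}
```

Only 429 triggers the fall-through here. Other failures are returned immediately so that a genuine request error is not retried against every provider.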
ace-pm
f92ee8d64c Rename @sf-run/* → @singularity-forge/* package scope
- All 373 source files updated
- Package.json scopes in all workspace packages
- Loader workspace symlink dir updated
- RpcClient import unified from pi-coding-agent (fixes type mismatch)
- Scripts, configs, flake.nix updated
- Workspace symlinks rebuilt
2026-04-15 22:56:33 +02:00
ace-pm
e07204c782 gitignore native Rust build outputs (*.node, target/)
2026-04-15 18:36:37 +02:00
ace-pm
9d739dfa5d Rename GSD→SF: complete rebrand from fork origin
- All gsdDir/gsdRoot/gsdHome → sfDir/sfRootDir/sfHome
- GSDWorkspace* → SFWorkspace* interfaces
- bootstrapGsdProject → bootstrapProject
- runGSDDoctor → runSFDoctor
- GsdClient → SfClient, gsd-client.ts → sf-client.ts
- .gsd/ → .sf/ in all tests, docs, docker, native, vscode
- Auto-migration: headless detects .gsd/ → renames to .sf/
- Deleted gsd-phase-state.ts backward-compat re-export
- Renamed bin/gsd-from-source → bin/sf-from-source
- Updated mintlify docs, github workflows, docker configs
2026-04-15 18:33:47 +02:00
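The ".gsd/ → .sf/" auto-migration bullet in commit 9d739dfa5d could be implemented along these lines. This is a sketch under assumptions: the `DirOps` interface, the function name `migrateLegacyDir`, and the choice to leave an existing .sf/ untouched are all hypothetical and not taken from the headless code.

```typescript
// Minimal filesystem surface so the logic can be exercised without touching disk.
interface DirOps {
  exists(path: string): boolean;
  rename(from: string, to: string): void;
}

type MigrationResult = "migrated" | "already-migrated" | "nothing-to-migrate";

function migrateLegacyDir(root: string, ops: DirOps): MigrationResult {
  const legacy = `${root}/.gsd`;
  const current = `${root}/.sf`;
  // Assumed policy: never overwrite an existing .sf/ directory.
  if (ops.exists(current)) return "already-migrated";
  if (!ops.exists(legacy)) return "nothing-to-migrate";
  ops.rename(legacy, current);
  return "migrated";
}
```

In Node, `exists` and `rename` could be backed by `fs.existsSync` and `fs.renameSync`. Injecting them keeps the decision logic testable with an in-memory fake.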
ace-pm
6e10d93d0d Add input parameter to setVibeForTool function.
Allows tool input to be passed to vibe setter for future context-aware
customization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:18:54 +02:00
ace-pm
42dda2013b Remove old working-vibes and prompt-history extensions.
Consolidated into unified sf-tui extension. Removed old git-footer
extension in favor of sf-tui footer implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:18:29 +02:00
ace-pm
2b27f2e567 Update vibes import to use sf-tui module.
Redirect working-vibes imports from old location to new sf-tui/vibes.js.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:18:22 +02:00
ace-pm
09691fe2e8 Add SF-TUI extension main entry point.
Integrates footer rendering, working vibes, prompt history stash,
and git status tracking into unified TUI enhancement extension.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:18:15 +02:00
ace-pm
f878e9b4a1 Add stash utility for TUI prompt history.
Provides shared stash management functions and overlay UI for browsing
and selecting previous prompts. Supports 1-9 quick picking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:18:07 +02:00
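The "1-9 quick picking" in commit f878e9b4a1 presumably maps a digit key to a stash entry. A minimal sketch, assuming the stash is an ordered array of prompt strings and that key "1" selects the first entry; `pickStashedPrompt` is a hypothetical name, not the extension's API.

```typescript
// Hypothetical helper: resolve a single key press to a stashed prompt.
function pickStashedPrompt(
  stash: readonly string[],
  key: string,
): string | undefined {
  // Only the digits 1-9 are quick-pick keys; "0" and other input are ignored.
  if (!/^[1-9]$/.test(key)) return undefined;
  // Keys are 1-based, array indices are 0-based.
  return stash[Number(key) - 1];
}
```

Returning `undefined` for out-of-range digits lets the overlay ignore the key press instead of throwing.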