singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	c940ebc16f	sf: unify milestone discuss dispatch + todo.md seed injection Replace separate dispatchHeadlessBootstrap with one flow: - dispatchNewMilestoneDiscuss({ auto }) — auto=true uses headless prompt + rootFiles seed, no pendingAutoStartMap; auto=false uses discuss prompt with preparation, sets pendingAutoStartMap - bootstrapNewMilestone() — project setup + ID reservation, called directly from bootstrapAutoSession instead of the old wrapper - injectTodoContext() — reads and deletes todo.md/TODO.md/SPEC.md at project root, injects content as spec into any preamble; called identically in auto and interactive flows Removes dispatchHeadlessBootstrap entirely. auto-start.ts now calls the primitives directly. All three showWorkflowEntry new-milestone sites use dispatchNewMilestoneDiscuss({ auto: false }). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:04:12 +02:00
Mikael Hugo	67d25f95f2	sf: add gemini cli preflight token counting	2026-04-19 13:25:07 +02:00
Mikael Hugo	59806f8cc5	rip out antigravity from SF + pi-coding-agent UI/config layer Antigravity (Google's IDE sandbox product, different from Gemini CLI) is removed from: src/onboarding.ts — drop from LLM_PROVIDER_IDS + OAuth-flow picker src/pi-migration.ts — drop from LLM_PROVIDER_IDS migration list src/web/onboarding-service.ts — drop from web-UI provider list src/tests/integration/web-onboarding-contract.test.ts — update contract src/resources/extensions/sf/doctor-providers.ts — drop from CLI_AUTH_PROVIDERS src/resources/extensions/sf/key-manager.ts — drop UI listing src/resources/extensions/sf-usage-bar/index.ts — delete entire quota fetcher block (~200 lines) packages/pi-coding-agent/src/cli/args.ts — drop PI_AI_ANTIGRAVITY_VERSION doc packages/pi-coding-agent/src/utils/proxy-server.ts — drop from claude provider chain Reason: antigravity has no vendor-published core library we can embed (unlike @google/gemini-cli-core for the Gemini CLI). Continuing to hand-roll OAuth against daily-cloudcode-pa.sandbox.googleapis.com is exactly the pattern Google has started banning for third-party tools. Removing the code removes the ban risk. pi-ai provider code, OAuth util, and models.generated entries for google-antigravity are removed in follow-up commits (separated for reviewability — each layer verified independently). Build passes. Note: this is a breaking change for any user who had google-antigravity configured — they'll need to migrate to google-gemini-cli (OAuth), google (API key), or google-vertex. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:39:36 +02:00
Mikael Hugo	0f0dcbf8c7	benchmarks: add Gemini 2.5/3/3.1 Pro + Flash entries Gemini had zero benchmark entries in model-benchmarks.json despite being served by google-gemini-cli (OAuth provider, SF native), google (API key), google-vertex, google-antigravity, openrouter, etc. Every gemini-* model in the pi-ai catalog scored 0 in the benchmark selector — effectively excluded from auto-selection even when allow-listed. Published numbers from DeepMind model cards + Vellum LLM leaderboard + Vals AI: gemini-3-pro-preview: SWE-Verified 76.2, HLE 37.5, AIME25 95, GPQA-D 91.9, MMLU-Pro 81.0 gemini-3.1-pro-preview: SWE-Verified 78, HLE 41, AIME 97, GPQA-D 93, MMLU-Pro 83 (Feb 2026) gemini-3-flash-preview: estimated from Pro-vs-Flash delta gemini-2.5-pro: SWE-Verified 63.8, HLE 18.8, GPQA-D 84.0, MMLU-Pro 86 gemini-2.5-flash: estimated from Pro-vs-Flash delta Context windows reflect Gemini's 1M-2M token capability. LiveCodeBench Pro Elo (2439 for Gemini 3 Pro) isn't in the 0-100 percent schema — skipped rather than forced. Future: add arena_elo- style LCB Elo dimension to the schema if we start routing on it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:11:45 +02:00
Mikael Hugo	e413cf4a3f	preferences: add provider_preference for benchmark tie-breaking When two models score identically in the benchmark selector — typically the same underlying weights served by different endpoints — the previous alphabetical tiebreaker picked wrong. dr-repo example: zai/glm-5.1 score 84.7 opencode-go/glm-5.1 score 84.7 Both are the exact same GLM-5.1 weights. Alphabetical comparison made opencode-go win ("o" < "z") even though zai is the NATIVE provider. Fix: new `provider_preference` pref, an ordered list of providers. Listed providers rank in order, unlisted fall after alphabetically. Applied as the tie-breaker between score and alphabetical. Global default shipped in ~/.sf/preferences.md: kimi-coding, minimax, zai, mistral, ollama-cloud, opencode-go, opencode Native providers ranked before re-servers. Users can override per project. Verified: after the change, dr-repo picks zai/glm-5.1 as primary for execute-task and gate-evaluate (was opencode-go/glm-5.1), and kimi-coding/k2p5 stays primary for completion phases with its direct provider winning over opencode re-servers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:09:42 +02:00
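The tie-breaking order described in e413cf4a3f (score, then `provider_preference`, then alphabetical) can be sketched as a comparator. This is a minimal illustration, not the selector's actual code: the `ScoredModel` shape and the function names are assumptions, and the coverage tie-breaker mentioned in 0b8a1c246f is left out for brevity.

```typescript
// Hypothetical candidate shape; the real selector's types are not shown in this log.
interface ScoredModel {
  provider: string;
  modelId: string;
  score: number;
}

// Listed providers rank by position; unlisted providers share one rank after all listed ones.
function providerRank(provider: string, preference: string[]): number {
  const index = preference.indexOf(provider);
  return index === -1 ? preference.length : index;
}

function compareModels(a: ScoredModel, b: ScoredModel, preference: string[]): number {
  if (a.score !== b.score) return b.score - a.score; // higher score first
  const rankDiff = providerRank(a.provider, preference) - providerRank(b.provider, preference);
  if (rankDiff !== 0) return rankDiff; // preferred provider first
  const keyA = `${a.provider}/${a.modelId}`;
  const keyB = `${b.provider}/${b.modelId}`;
  return keyA < keyB ? -1 : keyA > keyB ? 1 : 0; // alphabetical as the last resort
}

// Default order quoted in the commit message.
const preference = ["kimi-coding", "minimax", "zai", "mistral", "ollama-cloud", "opencode-go", "opencode"];
const candidates: ScoredModel[] = [
  { provider: "opencode-go", modelId: "glm-5.1", score: 84.7 },
  { provider: "zai", modelId: "glm-5.1", score: 84.7 },
];
const ranked = [...candidates].sort((a, b) => compareModels(a, b, preference));
const rankedWithoutPreference = [...candidates].sort((a, b) => compareModels(a, b, []));
```

With the preference list, `zai/glm-5.1` sorts first; with an empty list the comparator falls back to the old alphabetical result.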
Mikael Hugo	345f9586dd	benchmark-selector: coverage-confidence multiplier + 12 regression tests The original "normalise by populated weight" was too aggressive: a model with 1 strong dimension (delta-fast: human_eval=92) outranked a model with 4 strong dimensions (beta-coder: swe_bench=85, lcb=90, he=95, ifeval=90) because both normalised to their own small average. Fix: multiply normalised score by a confidence factor tied to how much of the unit's profile the model actually populated. Confidence = populated_weight / total_profile_weight, blended 50/50 with a flat floor so sparse-but-strong specialists still rank when no generalist covers the profile: score = (weighted_sum / weight_total) * (0.5 + 0.5 * confidence) Net effect on dr-repo's auto-resolve: Before: After: plan-milestone glm-5.1 plan-milestone MiniMax-M2.5 research-slice codestral research-slice mistral-large-2411 execute-task mistral-large execute-task opencode-go/glm-5.1 validate-m magistral validate-m MiniMax-M2.5 subagent mistral-large subagent kimi-coding/k2p5 MiniMax's broad coverage (8 populated dimensions from the M2 README) now correctly outranks GLM-5.1's higher but narrower scores for reasoning-heavy units. Matches user intuition that "MiniMax is really powerful". Also fixes findBenchmarkKey to try "<modelId>-latest" for date-suffixed model variants — pi-ai catalogs "devstral-medium-2507" but benchmarks only have "devstral-medium-latest"; matcher now bridges that. 
12 regression tests cover: - empty candidate pool - each profile (reasoning/coding/lightweight) picks right champion - swe_bench ↔ swe_bench_verified equivalence - models with all-null benchmarks score 0 but stay in fallbacks - sparse-strong beats dense-weak (confirms confidence multiplier doesn't over-penalise specialists) - provider diversification in fallback chain - deterministic tie-breaking - unknown unit types use default coding profile - date-suffixed model IDs match family-latest keys Audit: 41 of 85 allow-listed models in pi-ai catalog have benchmark data. 44 score 0 (mostly opencode Zen re-served models, ministral small variants, pixtral vision models, legacy open-mistral). Top picks for every dr-repo unit type DO have benchmark data — the gap is in the long tail of fallbacks, which never matter unless the primary and closer fallbacks all fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:58:10 +02:00
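The scoring formula from 345f9586dd can be sketched as follows. It assumes that `weight_total` in the commit message means the populated weight (the "normalise by populated weight" step); the profile below uses equal weights purely for illustration and is not one of the shipped profiles.

```typescript
type Benchmarks = Record<string, number | null | undefined>;
type Profile = Record<string, number>;

// score = (weighted_sum / populated_weight) * (0.5 + 0.5 * confidence)
// where confidence = populated_weight / total_profile_weight.
function scoreModel(benchmarks: Benchmarks, profile: Profile): number {
  let weightedSum = 0;
  let populatedWeight = 0;
  let totalWeight = 0;
  for (const [dimension, weight] of Object.entries(profile)) {
    totalWeight += weight;
    const value = benchmarks[dimension];
    if (typeof value === "number") {
      weightedSum += value * weight;
      populatedWeight += weight;
    }
  }
  // A model with no populated dimensions scores 0.
  if (populatedWeight === 0 || totalWeight === 0) return 0;
  const confidence = populatedWeight / totalWeight;
  return (weightedSum / populatedWeight) * (0.5 + 0.5 * confidence);
}

// Illustrative equal-weight profile (an assumption, not a real unit profile).
const profile: Profile = { swe_bench: 0.25, lcb: 0.25, human_eval: 0.25, ifeval: 0.25 };

// The two models named in the commit message.
const deltaFast = scoreModel({ human_eval: 92 }, profile);
const betaCoder = scoreModel({ swe_bench: 85, lcb: 90, human_eval: 95, ifeval: 90 }, profile);
const empty = scoreModel({ swe_bench: null }, profile);
```

Under these weights the single-dimension model is scaled by 0.625 while the fully populated model keeps its full average, so the generalist ranks first.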
Mikael Hugo	0b8a1c246f	auto-benchmark model selection: pick best-scoring per unit type New module src/resources/extensions/sf/benchmark-selector.ts implements benchmark-driven model selection. When models.<unit> is not pinned, preferences-models.ts falls through to pick the highest-scoring candidate from allowed_providers × pi-ai's model catalog, ranked against a per-unit-type weight profile. Weight profiles per unit type: plan-milestone / plan-slice → agent-planning (swe_bench .25, lcb .20, hle .15, gpqa .15, mmlu_pro .15, aime .10) research-* → mixed (mmlu_pro, hle, human_eval, browse_comp, simple_qa, gpqa) execute-task → coding (swe_bench .35, swe_bench_v .25, lcb .20, human_eval .15) execution_simple / complete-* → fast+correct (human_eval .40, instruction_following .35, ruler .25) gate-evaluate → review (swe_bench .30, hle .25, gpqa .25, ifeval .20) validate-milestone → validation (hle .30, gpqa .25, mmlu_pro .25, swe_bench .20) Key design decisions: - Missing dimensions are dropped (normalised by populated weight), so a model with 2 strong populated scores isn't crushed by a peer with 5 mediocre ones. - swe_bench ↔ swe_bench_verified are fungible — some vendors publish one, some the other; treat as equivalent. - Provider diversification in fallbacks so one provider going 429 doesn't kill the whole chain. - Score ties broken by coverage, then lexical — deterministic. Also updates MiniMax-M2/M2.5/M2.7 benchmarks with real numbers from the M2 official README (DeepWiki sourced) and MiniMax-M2.5 card (minimax.io): swe_bench_verified 69.4→80.2, LCB 83, HLE 31.8 (w/ tools — more representative for agent work than no-tools 12.5), AIME25 78, GPQA-D 78, MMLU-Pro 82. Context windows bumped to weights-level: M2 400K, M2.5/M2.7 1M (endpoints may cap lower). 
Verified end-to-end: with dr-repo's allow-list (kimi-coding/minimax/zai/opencode-go/mistral) and models.* absent, resolveModelWithFallbacksForUnit() returns: plan-milestone → opencode-go/glm-5.1 (+3 fallbacks) research-slice → mistral/codestral-latest execute-task → mistral/mistral-large-latest execution_simple → kimi-coding/k2p5 gate-evaluate → opencode-go/glm-5.1 validate-milestone → mistral/magistral-medium-latest subagent → mistral/mistral-large-latest Users can still pin individual units (existing models.* behaviour unchanged) or rely fully on auto-selection by omitting them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:43:26 +02:00
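The "swe_bench and swe_bench_verified are fungible" decision in 0b8a1c246f could look roughly like the lookup below. The helper name and table shape are hypothetical; only the equivalence itself comes from the commit message.

```typescript
type BenchmarkTable = Record<string, number | null | undefined>;

// Dimensions treated as interchangeable when only one of the pair is published.
const EQUIVALENT: Record<string, string> = {
  swe_bench: "swe_bench_verified",
  swe_bench_verified: "swe_bench",
};

// Returns the score for a dimension, falling back to its equivalent dimension if present.
function readDimension(table: BenchmarkTable, dimension: string): number | undefined {
  const direct = table[dimension];
  if (typeof direct === "number") return direct;
  const twin = EQUIVALENT[dimension];
  const fallback = twin === undefined ? undefined : table[twin];
  return typeof fallback === "number" ? fallback : undefined;
}

const viaTwin = readDimension({ swe_bench_verified: 80.2 }, "swe_bench");
const missing = readDimension({ swe_bench_verified: 80.2 }, "hle");
```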
Mikael Hugo	6450b37025	core + search + benchmarks: auth-error recovery, multi-provider search, M2.7-highspeed entry Four related improvements that landed in the working tree after the auto-hardening merge but hadn't been committed: 1. auth_error as a distinct error type (auth-storage + retry-handler). Previously invalid/expired API keys would retry the same failing credential until the retry budget exhausted. Now: - classifyErrorType() recognizes 401s, "invalid api key", "authentication error", "unauthorized" etc as "auth_error" - RetryHandler triggers cross-provider fallback on auth_error just like it does for rate_limit and quota_exhausted — switch providers rather than burning retries on a broken key Outcome: a stale OPENCODE_API_KEY in sops now fails over to kimi or minimax immediately instead of stalling the unit. 2. Multi-provider search-key detection (native-search.ts). The "Web search: Set BRAVE_API_KEY" warning fired whenever a non-Anthropic model lacked BRAVE_API_KEY, even when the user had TAVILY_API_KEY or OLLAMA_API_KEY available. Now: the warning suppresses if any of BRAVE/TAVILY/OLLAMA keys is present, and the warning text lists all three options. Matches the preferences- validation allow-list for search_provider. 3. MiniMax-M2.7-highspeed benchmark entry (model-benchmarks.json). Routes the fast-tier variant of M2.7 through the Bayesian blender with inherited RULER scores. Lets dynamic routing consider the highspeed model when speed matters more than peak quality. No regressions: the 41 pre-existing test failures in pi-coding-agent (FallbackResolver chain-membership + LSP integration) are unchanged relative to the prior commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:24:54 +02:00
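The auth-error behaviour in item 1 of 6450b37025 can be illustrated with a small classifier. This is a sketch under assumptions: the function names, the signature, and the rate-limit and quota patterns are made up for the example; only the auth phrases and the "switch providers on auth_error, rate_limit, and quota_exhausted" rule come from the commit message.

```typescript
type ErrorKind = "auth_error" | "rate_limit" | "quota_exhausted" | "other";

// Auth phrases listed in the commit message.
const AUTH_PATTERN = /invalid api key|authentication error|unauthorized/i;

function classifyErrorSketch(message: string, status?: number): ErrorKind {
  if (status === 401 || AUTH_PATTERN.test(message)) return "auth_error";
  // The two patterns below are placeholders, not the real classifier's rules.
  if (status === 429 || /rate limit/i.test(message)) return "rate_limit";
  if (/quota/i.test(message)) return "quota_exhausted";
  return "other";
}

// Retrying the same credential cannot fix these kinds, so fail over to another provider.
function shouldSwitchProvider(kind: ErrorKind): boolean {
  return kind === "auth_error" || kind === "rate_limit" || kind === "quota_exhausted";
}

const staleKey = classifyErrorSketch("Invalid API key provided", 401);
const timeout = classifyErrorSketch("socket hang up");
```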
Mikael Hugo	a4428ba1ff	key-manager: surface opencode-go in LLM provider list for onboarding opencode-go is already a first-class provider in pi-ai (models.generated.js registers 7 models under the opencode-go namespace: glm-5, glm-5.1, kimi-k2.5, mimo-v2-{omni,pro}, minimax-m2.{5,7}) and runs against https://opencode.ai/zen/go/v1 with OPENCODE_API_KEY auth. It was missing from key-manager's LLM provider registry, so the /sf config wizard and onboarding flows didn't prompt users to supply OPENCODE_API_KEY. Adding it here gives users a discoverable path to subscribe and surface the 7 opencode-go models in list-models. Research confirmed (DeepWiki sst/opencode + curl probes): - /zen/go/v1/chat/completions is the OpenAI-compatible endpoint - OPENCODE_API_KEY is the correct env var - No /models listing endpoint — hardcoding is correct (already done by the generate-models.ts pipeline) - Sister /zen/go/v1/messages serves Anthropic-compat minimax variants Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:22:48 +02:00
Mikael Hugo	58543fdae4	preferences: add allowed_providers hard allowlist + plug 6 merge gaps New feature: allowed_providers — hard allowlist of providers that auto-mode can dispatch to. When set, models from any other provider are invisible to selection BEFORE models.* resolution and dynamic routing run. This prevents routing from silently picking providers the user doesn't have keys for — the root cause of repeated "400 The requested model is not supported" pauses observed in dr-repo when routing picked gpt-5.2-codex despite no GPT being configured. Implementation is a single filter at the top of selectAndApplyModel: availableModels = rawAvailable.filter(m => allowed.includes(m.provider.toLowerCase())) If the allowlist rejects everything, throw with a clear message pointing at the pref (fail-closed — don't dispatch to whatever's left). While wiring this I found mergePreferences was silently dropping six more validated fields — same latent-bug class as service_tier: - allowed_providers (new) - flat_rate_providers - stale_commit_threshold_minutes - widget_mode - modelOverrides - safety_harness All added to the merge function. Now: if you set it in PREFERENCES, consumers see it. Verified end-to-end: loadEffectiveSFPreferences() reads allowed_providers from dr-repo's .sf/PREFERENCES.md correctly, and auto-mode model selection honors the filter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:12:33 +02:00
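The fail-closed allowlist filter quoted in 58543fdae4 can be sketched as a standalone function. The `ModelEntry` shape, the function name, and the error text are assumptions; the filter expression follows the one in the commit message.

```typescript
interface ModelEntry {
  provider: string;
  id: string;
}

function filterByAllowedProviders(rawAvailable: ModelEntry[], allowedProviders?: string[]): ModelEntry[] {
  // No allowlist configured: every available model stays visible.
  if (!allowedProviders || allowedProviders.length === 0) return rawAvailable;
  const allowed = allowedProviders.map((p) => p.toLowerCase());
  const availableModels = rawAvailable.filter((m) => allowed.includes(m.provider.toLowerCase()));
  if (availableModels.length === 0) {
    // Fail closed: never dispatch to providers outside the allowlist.
    throw new Error("allowed_providers rejected every available model - check the preference");
  }
  return availableModels;
}

const available: ModelEntry[] = [
  { provider: "openai", id: "gpt-5.2-codex" },
  { provider: "ZAI", id: "glm-5.1" },
];
const visible = filterByAllowedProviders(available, ["zai", "minimax"]);
```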
Mikael Hugo	9f0723a7be	preferences + mcp-client: resolve from main worktree and add global MCP config Two related fixes surfaced from a real sf headless auto run in dr-repo. 1. Project preferences now resolve from the MAIN worktree, not the current linked worktree. SF's auto-mode creates a git worktree per milestone (`.sf/worktrees/M003/`). The old code called `projectPreferencesPath()` which used `process.cwd()` — the milestone worktree — so a pref change on main (service_tier, dynamic_routing, model config) never reached an in-flight milestone until the branch merged main. Observed concretely when disabling dynamic_routing had no effect until we merged main into the milestone branch. New `projectPrefsRoot()` detects a linked worktree by reading `.git` (a FILE in worktrees, pointing to `/main/.git/worktrees/NAME`), follows the `commondir` pointer back to the main `.git` dir, and walks up one level. Falls back to cwd silently for non-worktree setups. 2. MCP server config now also loads from global paths (`~/.sf/mcp.json`, `~/.sf/agent/mcp.json`) in addition to the existing project-level (`.mcp.json`, `.sf/mcp.json`). First-hit wins, so project configs can still shadow or augment a globally- registered server by name. This lets the user register unauth'd servers like the DeepWiki remote MCP once and have every SF project pick it up without per-project `.mcp.json`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 08:53:27 +02:00
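The worktree detection described in item 1 of 9f0723a7be (read the `.git` file, follow `commondir`, walk up one level) might look roughly like this. It is a sketch, not the real `projectPrefsRoot()`: the function name is different on purpose, file access is injected as a hypothetical `readFile` callback so the logic can be exercised without a repository, and POSIX paths are assumed.

```typescript
import * as path from "node:path";

// Returns the file's text, or undefined if the path is missing or is a directory.
type ReadFile = (filePath: string) => string | undefined;

function resolveMainWorktreeRoot(cwd: string, readFile: ReadFile): string {
  // In a linked worktree, .git is a FILE containing "gitdir: <main>/.git/worktrees/NAME".
  const dotGit = readFile(path.posix.join(cwd, ".git"));
  const match = dotGit?.match(/^gitdir:\s*(.+)$/m);
  if (!match) return cwd; // main checkout or not a repo: fall back to cwd silently

  const gitDir = path.posix.resolve(cwd, match[1].trim());
  // commondir points (usually relatively) back to the main .git directory.
  const commonDir = readFile(path.posix.join(gitDir, "commondir"));
  if (commonDir === undefined) return cwd;

  const mainGitDir = path.posix.resolve(gitDir, commonDir.trim());
  return path.posix.dirname(mainGitDir); // one level above the main .git
}

// In-memory stand-in for a milestone worktree layout.
const files: Record<string, string> = {
  "/repo/.sf/worktrees/M003/.git": "gitdir: /repo/.git/worktrees/M003\n",
  "/repo/.git/worktrees/M003/commondir": "../..\n",
};
const reader: ReadFile = (p) => files[p];
const fromWorktree = resolveMainWorktreeRoot("/repo/.sf/worktrees/M003", reader);
const fromMain = resolveMainWorktreeRoot("/repo", reader);
```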
Mikael Hugo	879dc63a70	prompts: add DeepWiki as the preferred docs-lookup path All 9 research/planning/discuss prompts updated to put DeepWiki first in the docs-lookup order. Context7 becomes the fallback for package-registry-only libraries. Rationale: Context7 free tier is capped at 1000 req/month — a research-heavy auto loop can burn through that in a single session. DeepWiki has no cap and covers any GitHub-hosted library with AI-indexed answers, so it's strictly better as the default for the typical SF research path. Prompts touched: system.md, discuss.md, discuss-headless.md, plan-milestone.md, queue.md, research-milestone.md, research-slice.md, guided-discuss-milestone.md, guided-discuss-slice.md, guided-research-slice.md Each references the three DeepWiki tools — ask_question, read_wiki_structure, read_wiki_contents — and explicitly mentions the Context7 1000-req/month cap so models don't spend it wastefully. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 08:47:57 +02:00
Mikael Hugo	3bea082f20	auto-dispatch: silence expected registry fallback on non-auto commands sf headless query and sf headless status call resolveDispatch() without going through auto-mode startup, so the rule-registry singleton is never initialized. The previous code caught getRegistry()'s init error and logged a warning on every call — noise on the normal path: [sf:dispatch] WARN: registry dispatch failed, falling back to inline rules: RuleRegistry not initialized — call initRegistry() or setRegistry() first. Now: hasRegistry() probe first. When unset, skip straight to the inline rule loop without warning (it's the intended behavior outside auto). When the registry IS set and evaluateDispatch() genuinely throws, log the warning so real bugs still surface. Adds hasRegistry() as a public helper for any other hot-path caller that wants to branch on init without try/catch overhead. Verified end-to-end: sf headless query and sf headless status in dr-repo now run clean, no false warning. All 25 rule-registry tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 08:33:29 +02:00
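The probe-before-use pattern from 3bea082f20 can be sketched as below. The registry interface, the inline-rule callback, and the `resolveDispatchSketch` name are assumptions; the warning text mirrors the one quoted in the commit message.

```typescript
interface RuleRegistrySketch {
  evaluateDispatch(unit: string): string;
}

let registry: RuleRegistrySketch | undefined;

function setRegistry(next: RuleRegistrySketch | undefined): void {
  registry = next;
}

// Cheap probe so callers can branch on initialisation without try/catch.
function hasRegistry(): boolean {
  return registry !== undefined;
}

function resolveDispatchSketch(
  unit: string,
  inlineRules: (unit: string) => string,
  warn: (message: string) => void,
): string {
  const active = registry;
  if (hasRegistry() && active !== undefined) {
    try {
      return active.evaluateDispatch(unit);
    } catch (err) {
      // The registry exists but failed: this is a real bug, so surface it.
      const reason = err instanceof Error ? err.message : String(err);
      warn(`registry dispatch failed, falling back to inline rules: ${reason}`);
    }
  }
  // No registry outside auto-mode is expected: fall through without a warning.
  return inlineRules(unit);
}

const warnings: string[] = [];
const quiet = resolveDispatchSketch("query", (u) => `inline:${u}`, (m) => warnings.push(m));
const warningsAfterQuiet = warnings.length;

setRegistry({ evaluateDispatch: () => { throw new Error("boom"); } });
const noisy = resolveDispatchSketch("query", (u) => `inline:${u}`, (m) => warnings.push(m));
```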
Mikael Hugo	56130c2e39	preferences: wire 6 more latent pref fields through validation Same class of bug as the service_tier fix: preference fields declared in SFPreferences type and consumed by feature code, but never copied into the validated output, so they silently become undefined when set in PREFERENCES.md. Found by diffing validated.<field> vs the interface declarations: - forensics_dedup (boolean) — /sf forensics issue de-dup opt-in - stale_commit_threshold_minutes (number) — doctor safety-commit cadence - widget_mode ("full"\|"small"\|"min"\|"off") — dashboard widget sizing - slice_parallel ({ enabled?, max_workers? }) — slice-level parallelism - modelOverrides (Record) — per-model capability patches - safety_harness ({ enabled?, evidence_collection?, ... }) — LLM safety Validation is kind-appropriate: primitives get type + range checks, nested objects get object-shape guards with pass-through for now. Consumer sites already treat missing fields as optional, so landing shallow validation first is safe. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 08:25:59 +02:00
Mikael Hugo	63e87e8e86	preferences: wire service_tier through validation validatePreferences() is a strict allow-list — it copies only explicitly handled fields from input to output. service_tier was in KNOWN_PREFERENCE_KEYS (no unknown-key warning) but was never copied into the validated output, so users setting service_tier: priority or flex in PREFERENCES.md silently got undefined. This was a latent bug from before today's work: the new "off" value hit it first because I verified end-to-end, but priority/flex had the same issue. /sf fast on writes "priority" via writeGlobalServiceTier — correctly — and then the next read drops it on the floor. Now: service_tier is validated against {priority, flex, off} and copied through. Invalid values raise an error rather than being silently lost. Verified: dr-repo's service_tier: "off" in .sf/PREFERENCES.md now loads correctly via loadEffectiveSFPreferences().preferences.service_tier. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 08:05:26 +02:00
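The strict allow-list behaviour described in 63e87e8e86 can be illustrated with a single-field validator. This is a simplified sketch: the result shape, the function name, and the error wording are assumptions; only the accepted values (`priority`, `flex`, `off`) and the "copy through or raise an error" rule come from the commit message.

```typescript
const SERVICE_TIERS = ["priority", "flex", "off"] as const;
type ServiceTier = (typeof SERVICE_TIERS)[number];

interface ValidationResult {
  preferences: { service_tier?: ServiceTier };
  errors: string[];
}

function validateServiceTierSketch(input: Record<string, unknown>): ValidationResult {
  const result: ValidationResult = { preferences: {}, errors: [] };
  const raw = input["service_tier"];
  if (raw === undefined) return result; // field not set: nothing to copy

  if (typeof raw === "string" && (SERVICE_TIERS as readonly string[]).includes(raw)) {
    // A strict allow-list validator only outputs what it explicitly copies.
    result.preferences.service_tier = raw as ServiceTier;
  } else {
    result.errors.push(`service_tier must be one of: ${SERVICE_TIERS.join(", ")}`);
  }
  return result;
}

const accepted = validateServiceTierSketch({ service_tier: "off" });
const rejected = validateServiceTierSketch({ service_tier: "turbo" });
```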
Mikael Hugo	5957d5c2b6	sf-tui + sf-permissions: gate footer-indicator side-effects on ctx.hasUI Three TUI-only decorations were running their full session-lifecycle handlers even in headless mode, where there is no footer to render into. Most visibly, the emoji extension's AI auto-assign path made a real LLM call to pick a 🚀/✨/🎯 that nothing would ever see. - sf-tui/emoji.ts: session_start and agent_start handlers early-return when !ctx.hasUI. Commands stay registered so /emoji still works if someone runs it, but the lifecycle work (state loading, AI emoji selection, setStatus emission) is skipped. - sf-tui/color-band.ts: session_start and session_switch handlers early-return when !ctx.hasUI. Avoids unnecessary state-file writes and resize-listener attachment in headless runs. - sf-permissions/index.ts:setLevel: guards the setStatus("authority", …) call behind ctx.hasUI. The existing session_start path was already gated — this closes the permission-change code path. Headless stderr was already filtering these keys, so the user-visible output is unchanged. This eliminates the underlying RPC traffic and — more importantly — stops spending LLM tokens on decorative emoji selection in headless runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 07:59:36 +02:00
Mikael Hugo	e1461f45b8	service-tier: add "off" preference value to fully disable feature Adds an explicit disable state (service_tier: "off" in PREFERENCES.md) that short-circuits every service-tier surface: - No setStatus("sf-fast", …) footer events — RPC traffic stops, not just the stderr filter masking it. - No service_tier field ever injected into before_provider_request payloads, regardless of model. - /sf fast on and /sf fast flex refuse to write a tier while "off" is set, instructing the user to clear the preference first. - /sf fast status shows "(service_tier: \"off\" in preferences)" so the explicit disable is visible at a glance. Rationale: setups that never run gpt-5.4 (Claude / Kimi / MiniMax / GLM / Gemini-only shops) have no use for the feature. "off" lets them fully turn it off rather than relying on model-support gates to silence it. 6 regression tests added in service-tier.test.ts covering the new isServiceTierDisabled export, hook short-circuit ordering, and the /sf fast command refusal. 52 / 52 service-tier tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 07:31:14 +02:00
Mikael Hugo	867f6558dc	ollama: make extension opt-in via OLLAMA_HOST Previously the bundled Ollama extension probed http://localhost:11434 on every session_start, which was wasted work for users who never run Ollama locally. It also registered slash commands, loaded the ollama_manage tool, and (in interactive mode) set a "[phase] ollama" status indicator that leaked into headless stderr. Now the default export short-circuits immediately when OLLAMA_HOST is not set — no probe, no command registration, no tool loading, no status indicator. probeAndRegister also double-checks so any direct caller stays consistent. ollama-cloud is unaffected: set OLLAMA_HOST=https://ollama.com and OLLAMA_API_KEY=<key> and everything runs as before. Self-hosted local Ollama is likewise unaffected — set OLLAMA_HOST=http://localhost:11434 explicitly to re-enable the old behavior. 3 new regression tests cover the opt-in guard. All 138 ollama tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 05:53:45 +02:00
Mikael Hugo	55ee2cb5c7	subagent: add per-call model override (Phase 1 of skill dispatch) Adds an optional model param to SubagentParams, TaskItem, and ChainItem so callers can override the agent's default model at dispatch time. This is the primitive that ace-coder's Task() tool exposes via its `model` arg — SF's subagent tool previously ignored model at the tool level, picking it up only from the named agent's .md frontmatter. - SubagentParams.model applies to single mode, or as a batch-level default for tasks/chain steps that don't set their own. - TaskItem.model and ChainItem.model override per-task / per-step. - runSingleAgent and runSingleAgentInCmuxSplit gain a trailing modelOverride parameter that flows into buildSubagentProcessArgs. - buildSubagentProcessArgs uses modelOverride ?? agent.model when picking the --model arg for the child process. Side benefit: retroactively fixes the latent bug where reactive_execution.subagent_model was threaded into prompt instructions but ignored by the actual tool. 9 regression tests added in subagent/tests/model-override.test.ts. All 53 subagent-related tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 05:22:07 +02:00
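The override precedence described in 55ee2cb5c7 (per-item model, then batch-level model, then the agent's own default) can be sketched with nullish coalescing. The helper names and the `AgentSketch` shape are hypothetical; the `modelOverride ?? agent.model` rule and the `--model` argument come from the commit message.

```typescript
interface AgentSketch {
  name: string;
  model?: string; // default from the agent's .md frontmatter
}

// Per-task or per-step model wins, then the batch-level default, then the agent default.
function pickModel(agent: AgentSketch, batchModel?: string, itemModel?: string): string | undefined {
  const modelOverride = itemModel ?? batchModel;
  return modelOverride ?? agent.model;
}

// Only pass --model to the child process when some model was resolved.
function modelArgs(model: string | undefined): string[] {
  return model === undefined ? [] : ["--model", model];
}

const agent: AgentSketch = { name: "reviewer", model: "agent-default" };
const perItem = pickModel(agent, "batch-model", "item-model");
const perBatch = pickModel(agent, "batch-model");
const fallback = pickModel(agent);
```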
Mikael Hugo	254fba36c0	Add 6 SF skills: pm-planning, codebase-analysis, architecture-planning, feature-gap-analysis, code-review, advisory-partner Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 04:51:43 +02:00
Mikael Hugo	6fc286a888	Add skills system (feature-gap-analysis, code-review, advisory-partner, pm-planning, codebase-analysis, architecture-planning) and fix dispatch_model revert - Add 6 new skills under src/resources/extensions/sf/skills/ - Revert broken dispatch_model extension from auto-prompts.ts — the subagent tool has no model-override param; skills stay as pure text injection - Fix discuss-headless.md: advisory-partner section now correctly describes that independent review runs via gate-evaluate/validate-milestone (Q3/Q4, MV01-MV04) with the validation model, not inline self-review - Include pm-planning, codebase-analysis, architecture-planning, and feature-gap-analysis skill activations in discuss-headless Active Skills Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 04:51:29 +02:00
Mikael Hugo	9a04fef925	Fix Codex review issues: phase-timeout mutation race and missing backfill P1 (phase-timeout mutation race): withPhaseTimeout now stores the still-running phase promise in _danglingPhasePromise when a timeout fires. Each loop iteration drains that promise (with try/catch) before starting new work, preventing the timed-out phase from mutating state concurrently with the next iteration. P2 (verification_status backfill): Schema migration v17 now runs a backfill UPDATE after adding the new column, deriving verification_status from existing verification_evidence rows. Projects upgraded mid-slice will have correct all_pass/partial/all_fail values immediately rather than empty strings that bypass the prior-task guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 16:48:00 +02:00
Mikael Hugo	ca5890df2e	Auto-hardening: 10 structural fixes for reliable multi-day autonomous operation Implements all fixes from the auto-hardening audit plan: P1-A: Per-phase timeout watchdog — withPhaseTimeout() wraps preDispatch/dispatch/finalize; on timeout emits warning, increments consecutiveFinalizeTimeouts, continues loop. Configurable via preferences.auto_supervisor.phase_timeout_minutes (default: 10). P1-B: Verified already wired (MAX_COOLDOWN_RETRIES → stopAuto+break). No change needed. P1-C: Worker timeout in parallel orchestrator — kills workers running beyond parallel.worker_timeout_minutes (default: 120 min) in refreshWorkerStatuses(). P2-A: Memory injection into dispatch prompts — buildMemoriesBlock() appended to plan-milestone inlined[] context and added as memoriesSection in execute-task. P2-B: Memory extraction retry — one 2s-delayed retry in the catch block of extractMemoriesFromUnit(); second failure is silently swallowed (non-fatal). P3-A: Partial verification state in DB — verificationStatus ("all_pass"/"partial"/"all_fail") derived from verificationEvidence.exitCode array and stored in new tasks column. New dispatch rule blocks next task when prior task has all_fail status. P3-B: Gate omission rationale enforcement — minOmissionWords added to GateDefinition (Q3=20, Q5=15, Q6=10, Q7=15). Short rationale upgrades verdict "omitted" → "flag". P4-A: Doctor issues → reassess escalation — pre-dispatch health check in loop.ts detects issues referencing slice IDs and queues reassess-roadmap sidecar instead of pausing. P4-B: File overlap preemption — analyzeParallelEligibility() sets eligible:false when the overlapping milestone is currently running (not just eligible/queued). P5-A: Deferred requirement tracking — parseDeferredRequirements() added to files.ts; completing-milestone rule warns (via logWarning) when deferred reqs targeting the milestone were not validated before completion. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 16:26:25 +02:00
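The P3-A rule in ca5890df2e (derive `verificationStatus` from the exit codes and block the next task on `all_fail`) can be sketched as two small functions. The function names are assumptions, and returning an empty string for tasks with no evidence is a guess based on the backfill note in 9a04fef925.

```typescript
type VerificationStatus = "all_pass" | "partial" | "all_fail" | "";

// Exit code 0 counts as a passing verification command.
function deriveVerificationStatus(exitCodes: number[]): VerificationStatus {
  if (exitCodes.length === 0) return ""; // no evidence recorded (assumed behaviour)
  const passed = exitCodes.filter((code) => code === 0).length;
  if (passed === exitCodes.length) return "all_pass";
  if (passed === 0) return "all_fail";
  return "partial";
}

// Dispatch guard: only a fully failed prior task blocks the next one.
function blocksNextTask(status: VerificationStatus): boolean {
  return status === "all_fail";
}

const mixed = deriveVerificationStatus([0, 1, 0]);
const broken = deriveVerificationStatus([2, 1]);
```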
Mikael Hugo	50a70b35bd	fix(sf): auto-mode stuck loop on research dispatch (#4414) Cherry-pick of gsd-build/gsd-2@80ae39ccd adapted for sf/ paths: - auto-artifact-paths: Add PARALLEL-BLOCKER sentinel path for parallel-research unitId - auto-dispatch: Skip re-dispatching parallel-research if PARALLEL-BLOCKER exists - auto-recovery: Add parallel-research verification; clear path/parse caches after writeBlockerPlaceholder - doctor: Downgrade active_requirement_missing_owner to warning (noisy during normal planning) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 13:44:44 +02:00
Mikael Hugo	9af9c0712d	fix(sf): handle auto-mode limit errors with model fallback (#4373) Cherry-pick of gsd-build/gsd-2@0b7a05491 adapted for sf/ paths: - Expand RATE_LIMIT_RE to cover quota-window phrasing (hit your limit, usage limit, quota reached) - Rate-limit errors bypass transient-deferral early return so model fallback executes - Add setCurrentDispatchedModelId() to keep AUTO dashboard label in sync after fallback switch - 4 regression tests for classifier coverage and structural guards Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 13:42:46 +02:00
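The expanded rate-limit matching from 9af9c0712d can be illustrated with a regular expression covering the three quota-window phrasings named in the commit message. The constant name and the `rate limit` alternative are assumptions; the real `RATE_LIMIT_RE` is not shown in this log and likely covers more cases.

```typescript
// Sketch only: the three quota-window phrases come from the commit message.
const RATE_LIMIT_SKETCH_RE = /rate limit|hit your limit|usage limit|quota reached/i;

function looksRateLimited(message: string): boolean {
  return RATE_LIMIT_SKETCH_RE.test(message);
}

const quotaWindow = looksRateLimited("You have hit your limit for this window");
const unrelated = looksRateLimited("connection reset by peer");
```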
Nils Reeh	3b23ef3d4b	fix(pi-ai): wire thinking:{type} field and extend adaptive-thinking model coverage (#4392) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 503e79070d198254661febad35a267ead487b7e1)	2026-04-18 13:38:30 +02:00
Mikael Hugo	25e6f0db05	Fix all 26 failing tests (22 rebrand artifacts + 4 RTK seam bugs) RTK seam tests: SF_RTK_PATH was set then immediately deleted in withFakeRtk due to copy-paste duplication from the GSD→SF rename — fake RTK binary was never injected, so all 5 seam tests ran the raw command instead of the rewritten one. Remaining 21 fixes from the GSD→SF rebrand: - initial-gsd-header-filter.test.ts: import renamed filterInitialSfHeader - dist-redirect.mjs: doubled scope prefix @singularity-forge/@singularity-forge/* → @singularity-forge/* (5 specifiers affected) - forensics-issue-routing.test.ts: regex used sf-build/sf-2, prompt says singularity-forge/sf-run — align regex to match the actual prompt - key-manager.test.ts: GROQ_API_KEY set in dev env made empty-key test report configured:true — isolate with save/delete/restore - create-gsd-extension-paths.test.ts: skill dir doesn't exist in this repo, skip both tests gracefully with t.skip() - sf-usage-bar/index.ts: replace execSync(`which ${cmd}`) with spawnSync to fix unescaped shell interpolation static analysis failure - sf-notify/index.ts: convert enum to const object — strip-only TS mode does not support enums Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 13:07:09 +02:00
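The enum-to-const-object conversion mentioned for sf-notify in 25e6f0db05 follows a common pattern: a frozen-shape object plus a derived union type, which needs no emitted enum code and so survives type stripping. The member names below are invented for illustration; the actual sf-notify values are not shown in this log.

```typescript
// Hypothetical levels; the object is plain runtime JavaScript, so stripping types leaves it intact.
const NotifyLevel = {
  Info: "info",
  Warning: "warning",
  Error: "error",
} as const;

// Union of the object's values: "info" | "warning" | "error".
type NotifyLevel = (typeof NotifyLevel)[keyof typeof NotifyLevel];

function formatLevel(level: NotifyLevel): string {
  return `[${level}]`;
}

const label = formatLevel(NotifyLevel.Warning);
```

Call sites keep the `NotifyLevel.Warning` spelling they had with an enum, which keeps the migration mostly mechanical.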
Mikael Hugo	c1c2623707	Add type declaration files for learning extension Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-18 12:36:08 +02:00
ace-pm	f92ee8d64c	Rename @sf-run/* → @singularity-forge/* package scope - All 373 source files updated - Package.json scopes in all workspace packages - Loader workspace symlink dir updated - RpcClient import unified from pi-coding-agent (fixes type mismatch) - Scripts, configs, flake.nix updated - Workspace symlinks rebuilt	2026-04-15 22:56:33 +02:00
ace-pm	9d739dfa5d	Rename GSD→SF: complete rebrand from fork origin - All gsdDir/gsdRoot/gsdHome → sfDir/sfRootDir/sfHome - GSDWorkspace* → SFWorkspace* interfaces - bootstrapGsdProject → bootstrapProject - runGSDDoctor → runSFDoctor - GsdClient → SfClient, gsd-client.ts → sf-client.ts - .gsd/ → .sf/ in all tests, docs, docker, native, vscode - Auto-migration: headless detects .gsd/ → renames to .sf/ - Deleted gsd-phase-state.ts backward-compat re-export - Renamed bin/gsd-from-source → bin/sf-from-source - Updated mintlify docs, github workflows, docker configs	2026-04-15 18:33:47 +02:00
ace-pm	6e10d93d0d	Add input parameter to setVibeForTool function. Allows tool input to be passed to vibe setter for future context-aware customization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:18:54 +02:00
ace-pm	42dda2013b	Remove old working-vibes and prompt-history extensions. Consolidated into unified sf-tui extension. Removed old git-footer extension in favor of sf-tui footer implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:18:29 +02:00
ace-pm	2b27f2e567	Update vibes import to use sf-tui module. Redirect working-vibes imports from old location to new sf-tui/vibes.js. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:18:22 +02:00
ace-pm	09691fe2e8	Add SF-TUI extension main entry point. Integrates footer rendering, working vibes, prompt history stash, and git status tracking into unified TUI enhancement extension. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:18:15 +02:00
ace-pm	f878e9b4a1	Add stash utility for TUI prompt history. Provides shared stash management functions and overlay UI for browsing and selecting previous prompts. Supports 1-9 quick picking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:18:07 +02:00
ace-pm	ec39512960	Add footer renderer for TUI status display. Displays git branch status, dirty state, extension statuses, model info, cost, and context usage percentage in the footer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:17:24 +02:00
ace-pm	c18f67d278	Add working message vibes for TUI. Provides context-aware working status messages based on prompt keywords and tool names, with random emoji for visual interest. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:17:04 +02:00
ace-pm	7e03021b25	Add git status utility for TUI. Provides GitStatus interface and refreshGitStatus function for displaying repository branch, dirty state, untracked files, and commit counts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:16:47 +02:00
ace-pm	e501ffeefd	Add sf-tui shared utility functions. Provides rightAlign helper for text alignment in terminal UI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:16:39 +02:00
ace-pm	acab81de11	fix(mcp-project-config): remove duplicate constant export - removed redundant self-assignment export Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-15 16:11:35 +02:00
ace-pm	b5caedf786	fix(paths): remove duplicate SF_ROOT_FILES export - removed redundant self-assignment export that served no purpose Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-15 16:11:25 +02:00
ace-pm	dfbe620c34	feat(bootstrap): add tool_call hook to set vibes for tools - registers hook to trigger vibe state based on tool name and input - enables working-vibes extension to respond to tool invocations Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-15 16:10:49 +02:00
ace-pm	5102fa217e	fix(rtk): remove duplicate constant declarations and logic checks - removed duplicate SF_RTK_DISABLED_ENV, SF_SKIP_RTK_INSTALL_ENV, SF_RTK_PATH_ENV exports - fixed isRtkEnabled() to check SF_RTK_DISABLED_ENV once instead of twice - fixed resolveAppRoot() duplicate env.SF_HOME check - fixed resolveRtkBinaryPath() duplicate SF_RTK_PATH_ENV lookup - fixed ensureRtkAvailable() duplicate env checks and error messages - fixed bootstrapRtk() duplicate process.env assignment Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-15 16:10:25 +02:00
ace-pm	bed1a20cf5	feat(extensions): disable genai-proxy auth, add prompt-history overlay - genai-proxy: disable full proxy implementation due to auth bootstrap limitations at package boundary; throw clear error instead - proxy-command: add try-catch error handling around startProxy - prompt-history: new extension with Ctrl+Alt+H (or Ctrl+Shift+H fallback) to navigate and insert previously-stashed prompts. Stash limited to 20 entries in ~/.sf/agent/prompt-history.json Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-15 16:09:23 +02:00
ace-pm	421fccd898	refactor: rebrand gsd_ tool names and references to sf_ namespace Updates workflow tool names, documentation references, and internal naming conventions across MCP server, CLI, tests, and web components to complete the singularity-forge rebrand from gsd to sf. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 15:51:38 +02:00
ace-pm	6b0ac484ba	refactor: update log prefixes and string values from gsd- to sf- namespace Updates channel prefixes, log messages, comments, and configuration values across daemon, mcp-server, and related packages to complete the rebrand from gsd to sf-run naming. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 15:37:12 +02:00
ace-pm	b29c12d5e5	refactor(native): rename gsd_parser.rs to forge_parser.rs Final rebrand: rename remaining Rust source file to complete the gsd → forge transition. All parser references already use forge_parser after earlier commits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:58:21 +02:00
ace-pm	35dc87ef53	chore: sync workspace state after rebrand - Rebrand commits already in history (gsd → forge) - Sync pre-existing doc, docker, and CI config updates - All rebrand artifacts verified in place: * Native crates: forge-engine, forge-ast, forge-grep * Log prefixes: [forge] across 22+ files * Binary: ~/bin/sf-run * Workspace scopes: @sf-run/, @singularity-forge/ * Nix flake: Rust toolchain ready System ready for: nix develop && bun run build:native Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:54:20 +02:00
ace-pm	d501ca7d6d	fix: clean up git state after directory restoration - Accept deletion of gsd-phase-state.ts (renamed to forge-phase-state.ts earlier) - Accept deletion of create-gsd-extension/ (renamed to create-forge-extension/ earlier) - These renames were part of the rebrand and are preserved in commit history Stabilize git state after restoration operations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:34:53 +02:00
ace-pm	83feadb4e1	wip: rename gsd-parser dir + exports, fix native package.json - packages/native/src/gsd-parser → packages/native/src/forge-parser - Update packages/native/package.json exports: ./gsd-parser → ./forge-parser - Update packages/native/src/index.ts imports: ./gsd-parser → ./forge-parser Build in progress: native tsc output missing submodule dists (fd, text, image, etc). This is a pre-existing issue with the build system, not caused by rebrand. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:22:21 +02:00

2194 commits