singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	22760e03d5	fix(sift): increase timeouts for vector retriever + scope-aware retriever for codebase_search Vector retriever was disabled everywhere because it appeared to hang. It was actually doing a first-time embedding index build for 57K files, which takes ~60-90 min. Re-enable vector by increasing timeouts and letting scope-aware retriever selection decide when vector is safe. Changes: - sift_search: retriever timeout 30s->300s, total 60s->600s - codebase_search: total timeout 120s->600s - warmup: retriever timeout 30s->300s, hard timeout 600s->3600s - codebase_search now uses chooseSiftRetrievers() instead of hardcoded bm25+phrase: repo-root -> bm25+phrase (fast), scoped subdirs -> vector - Comments updated to reflect "slow first build" not "hang" Tests: 178 files / 1845 tests, all pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 10:46:35 +02:00
Mikael Hugo	427324fb93	fix(plan): update existing milestone specs without stale params	2026-05-15 10:45:18 +02:00
Mikael Hugo	6e40b829f2	feat(sift): scope-aware retriever selection: vector for scoped, bm25 for repo-root Commit `1a98d8f9a` hardcoded --retrievers bm25,phrase across all sift calls to work around the full-repo vector inference hang. But vector retrieval works fine on scoped subdirectory queries (empirically: ~30s on src/resources/extensions/sf/uok with real semantic scoring). The hang is the full-repo indexing scope, not the inference path. This commit replaces the universal bm25 restriction with a scope-aware selector chooseSiftRetrievers(scopePath, projectRoot): - scopePath resolves to repo root → bm25+phrase, no rerank (safe) - scopePath resolves to anything else → bm25+phrase+vector, rerank enabled (semantic ranking unlocked) ensureSiftIndexWarmup behavior unchanged (scope is "." → repo-root → bm25+phrase). buildSiftArgs in the codebase_search tool now defaults to vector when the caller passes a scoped path; explicit retriever overrides still win. Unlocks the high-leverage uses described earlier this session (memory ranking, plan/research context pre-fetch) for free, since those always scope to a sub-tree. 
Tests: 9 new in sift-retriever-scope.test.mjs cover the dispatch matrix (repo-root variants get bm25, subdir variants get vector, explicit override wins, regression guard for warmup default). Full suite: 178 files / 1844 tests, no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 10:25:22 +02:00
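The selection rule in 6e40b829f2 can be sketched as a pure function. Only the name chooseSiftRetrievers and the root versus sub-tree rule come from the commit; normalizeScope and resolveRetrievers are hypothetical helpers, and the naive path normalization (no `..` handling) is an assumption made to keep the sketch dependency-free.

```typescript
// Sketch of scope-aware retriever selection (not the actual SF source).
type Retriever = "bm25" | "phrase" | "vector";

interface RetrieverChoice {
  retrievers: Retriever[];
  rerank: boolean;
}

// Hypothetical, naive POSIX-style normalization: maps the project root,
// ".", "./" and "" to "." and strips the root prefix from absolute paths.
function normalizeScope(scopePath: string, projectRoot: string): string {
  const root = projectRoot.replace(/\/+$/, "");
  let p = scopePath.trim().replace(/\/+$/, "");
  if (p === root) return ".";
  if (p.startsWith(root + "/")) p = p.slice(root.length + 1);
  while (p.startsWith("./")) p = p.slice(2);
  return p === "" ? "." : p;
}

function chooseSiftRetrievers(scopePath: string, projectRoot: string): RetrieverChoice {
  if (normalizeScope(scopePath, projectRoot) === ".") {
    // Repo root: lexical retrievers only, no ML rerank.
    return { retrievers: ["bm25", "phrase"], rerank: false };
  }
  // Scoped sub-tree: add vector retrieval and enable reranking.
  return { retrievers: ["bm25", "phrase", "vector"], rerank: true };
}

// Hypothetical: a non-empty explicit retriever list from the caller replaces
// the automatic choice; the rerank flag still follows the scope rule here.
function resolveRetrievers(
  scopePath: string,
  projectRoot: string,
  explicit?: Retriever[],
): RetrieverChoice {
  const auto = chooseSiftRetrievers(scopePath, projectRoot);
  return explicit && explicit.length > 0 ? { ...auto, retrievers: [...explicit] } : auto;
}
```

Keeping the decision in one pure function means warmup (scope ".") and scoped codebase_search calls can share it, and the dispatch matrix is easy to unit-test.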
Mikael Hugo	d90ac1fd69	fix(codebase_search): disable vector retriever to prevent hang The vector retriever in sift hangs indefinitely during embedding model inference, causing all codebase_search calls to time out. Apply the same fix as sift_search: restrict retrievers to bm25+phrase and disable ML reranking. - buildCodebaseSearchArgs: add --retrievers bm25,phrase --reranking none - Update tool description from (BM25 + Vector) to (BM25 + phrase) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 10:13:31 +02:00
Mikael Hugo	1a98d8f9af	fix(sift): disable vector retriever + ML reranking to prevent hang The sentence-transformers/all-MiniLM-L6-v2 embedding model inference hangs indefinitely during sift search, causing: - Warmup to never complete (TTL expired 62+ min ago) - All page-index-hybrid searches to time out - The search cache to become stale Fix: Restrict warmup and search to bm25+phrase retrievers with no ML reranking. This gives fast lexical results while avoiding the hanging embedding inference path. Also expose --retrievers and --reranking params in the sift_search tool so callers can override per-query if needed. Closes #vector-hang-fix Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 09:45:49 +02:00
Mikael Hugo	ec65b4d881	feat(planning-state): DB-first VALIDATION.md migration (proposal MVP) Implements Phase 1 of docs/dev/proposals/db-first-planning-state.md (commit `f3571475d`). VALIDATION.md is now a render target; DB is canonical. Three read sites now read from the DB: - tools/complete-milestone.js: getMilestoneValidationAssessment(id)?.status replaces readFile + extractVerdict (lines 126-137 → 126-140) - workspace-index.js: same swap in the indexWorkspace loop (was resolveMilestoneFile → loadFile → extractVerdict per milestone) - state-shared.js:readMilestoneValidationVerdict was already DB-first (prefers DB, file fallback only when no DB); no change needed Write path regenerates: - tools/validate-milestone.js:renderValidationMarkdown now prepends <!-- generated from .sf/sf.db — do not edit directly; use the validate_milestone tool --> so the file is unambiguously a projection - verdict-parser.js:extractVerdict strips the comment header before frontmatter parsing so legacy readers (reflection.js, auto-prompts.js) still work on generated files Doctor check retired (clean delete): - doctor-engine-checks.js: db_projection_validation_drift detector removed entirely. Drift is structurally impossible once the write path always regenerates from DB. Comment block explains the removal. 
Tests: - New: db-first-validation.test.mjs: 6 tests covering regeneration, three read-site overrides, hand-edit override, doctor non-emission - Updated: doctor-db-projection-drift.test.mjs now asserts the check is NOT emitted (was previously asserting it WAS) Full suite: 469 passed, 0 failed, 3 skipped. No regressions. Closes the same class of problem as the self-feedback DB/JSONL divergence: the M001-6377a4-VALIDATION.md doctor warning that has been firing repeatedly this session is gone by construction. Other planning artifacts (CONTEXT.md, ROADMAP.md, SUMMARY.md) follow in later phases per the proposal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 09:35:28 +02:00
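To illustrate why legacy readers keep working on generated VALIDATION.md files, here is a sketch of stripping a leading generated-file comment before reading frontmatter. The `verdict:` key, the regex-based frontmatter handling (not a real YAML parser), and both function names are assumptions; this is not the actual verdict-parser.js code.

```typescript
// Hypothetical helper: remove one leading HTML comment, such as the
// "generated from .sf/sf.db" header, plus the whitespace after it.
function stripGeneratedHeader(markdown: string): string {
  return markdown.replace(/^\s*<!--[\s\S]*?-->\s*/, "");
}

// Hypothetical verdict reader: looks for a `verdict:` line inside
// "---"-delimited frontmatter at the top of the (header-stripped) file.
function extractVerdictSketch(markdown: string): string | null {
  const body = stripGeneratedHeader(markdown);
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(body);
  if (!match) return null;
  const line = match[1].split(/\r?\n/).find((l) => /^verdict\s*:/.test(l));
  if (line === undefined) return null;
  const value = line.replace(/^verdict\s*:\s*/, "").trim();
  return value === "" ? null : value;
}
```

Without the strip step, the frontmatter would no longer start at the top of the file and a reader anchored at `^---` would return null.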
Mikael Hugo	7dbf8ad430	feat(model-policy): wire lineage-diverse-from-worker into selector Round 8's `e7cf16882` declared the adversary role and the lineage-diverse-from-worker constraint but left actual filtering as a TODO in selectAndApplyModel. This wires the filter end-to-end. selectAndApplyModel now accepts (role, workerModelId) trailing params: - role: from modelRoleForUnitType(unitType) (extended to recognize "adversary"/"challenge"/"red-team" unit types as the adversary role) - workerModelId: explicit caller-supplied override, else falls back to _lastWorkerModelId (process-local cache populated whenever a worker- role dispatch resolves a model) When role is adversary or reviewer AND the role-policy includes lineage-diverse-from-worker, applyLineageDiverseFilter strips candidates that share root vendor with the worker model (via isSameRootVendor from model-role-policy.js). If filtering would leave zero candidates, a warning is logged and the unfiltered set is used (better a same-vendor reviewer than no reviewer). phases-unit.js threads modelRoleForUnitType(unitType) into selectAndApplyModel — the only producer site that needed the role parameter. Tests: 13 new (7 pure unit on applyLineageDiverseFilter — vendor mapping matrix + edge cases; 6 integration on selectAndApplyModel + modelRoleForUnitType wiring). All 37 tests in the affected files pass, no regressions. Concern: if the per-unit model config (from disk prefs) maps exclusively to the worker's vendor and has no fallback candidates, returns appliedModel: null — operator-configurable. Documented in tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 09:24:50 +02:00
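The filtering rule in 7dbf8ad430 (apply only for reviewer/adversary roles with the constraint, and fall back to the unfiltered set rather than returning nothing) can be sketched as below. The vendor predicate is injected to keep the block self-contained, whereas the commit says the real code uses isSameRootVendor. The parameter shape and the "no worker model means no filtering" behavior are assumptions.

```typescript
const LINEAGE_DIVERSE = "lineage-diverse-from-worker";

// Hypothetical sketch, not the real applyLineageDiverseFilter signature.
function applyLineageDiverseFilterSketch(
  role: string,
  roleConstraints: readonly string[],
  candidates: readonly string[],
  workerModelId: string | undefined,
  sameVendor: (candidateId: string, workerId: string) => boolean,
  warn: (message: string) => void = () => {},
): string[] {
  const applies =
    (role === "adversary" || role === "reviewer") && roleConstraints.includes(LINEAGE_DIVERSE);
  // Assumption: with no known worker model there is nothing to diverge from.
  if (!applies || workerModelId === undefined) return [...candidates];
  const diverse = candidates.filter((c) => !sameVendor(c, workerModelId));
  if (diverse.length === 0) {
    // Better a same-vendor reviewer than no reviewer.
    warn(`lineage filter would remove all ${candidates.length} candidates; using unfiltered set`);
    return [...candidates];
  }
  return diverse;
}
```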
Mikael Hugo	f3454de58a	fix(triage): --run routes through runTriageApply{dryRun:true} via SF router Closes sf-mp5khix3-9beona architecture-defect:triage-run-bypasses-sf-routing. The legacy `runTriage` in self-feedback-drain.js hardcoded DEFAULT_TRIAGE_MODEL="google-gemini-cli/gemini-3-pro-preview" and dispatched via @singularity-forge/ai completeSimple (text-only, no tools). The result: an autonomous triage path that produced a markdown decision matrix operators had to manually apply via resolve_issue. Now `--run` goes through runTriageApply with a new `dryRun: true` option that: - uses the same Phase 1/2 pipeline as --apply (triage-decider + review) - pre-resolves the model via SF's router (rankTriageModelsViaRouter), no hardcoded model - skips Phase 3 applyTriagePlan (read-only by design) - uses permissionProfile="low" and relaxes the trusted-source + custom-runner guards for the inspection path - prefixes flowId with "triage-run-" for clean trace separation Legacy runTriage kept as @deprecated (still exercised by self-feedback-drain.test.mjs unit tests that target completeSimple dispatch directly). Tests: 6 new in headless-triage-run-routing.test.ts covering dryRun short-circuit, no ledger mutations, guard relaxation, router not hardcoded, disagreement surfaces deciderOutput. Full triage suite: 35 tests pass, 0 regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 09:20:43 +02:00
Mikael Hugo	a5dd5db354	fix(self-feedback): align report kinds and isolate watchdog tests	2026-05-15 09:19:27 +02:00
Mikael Hugo	ff31258629	chore: capture autonomous in-flight self-improvements Snapshot the uncommitted work that autonomous mode made in this session: - run-unit.js +54: enrich runUnitViaSwarm with completedItems / remainingItems / verificationEvidence pass-through from worker checkpoint args - self-feedback.js +10 - 2 test files updated to match the new shape All 72 affected tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 09:03:42 +02:00
Mikael Hugo	d57cd84d9a	fix(auto): make halt watchdog observable	2026-05-15 08:09:02 +02:00
Mikael Hugo	f9c147a08b	fix(swarm): ignore heartbeats for silent worker timeout	2026-05-15 08:00:35 +02:00
Mikael Hugo	e464a1bd6e	fix(swarm): bound silent worker responses	2026-05-15 07:35:31 +02:00
Mikael Hugo	81425230f5	fix(headless): do not restart graceful child exits	2026-05-15 07:25:06 +02:00
Mikael Hugo	9ba9b55f7a	fix(uok): import memory extractor from closeout	2026-05-15 07:12:10 +02:00
Mikael Hugo	c5850c8039	fix(verify): ignore stale broad cargo preferences	2026-05-15 07:06:17 +02:00
Mikael Hugo	d1ca3d035c	fix(auto): count only unproductive runaway iterations	2026-05-15 06:55:05 +02:00
Mikael Hugo	5faa789f52	fix: ensure shared/tui.js stub is tracked for build/test stability Prevents ERR_MODULE_NOT_FOUND and unblocks builds/tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 06:48:49 +02:00
Copilot	cf9203aee0	feat(swarm): forward parent permission profile to in-process worker sessions In-process swarm workers get a fresh headless AgentSession whose permission extension defaults to read-only minimal. This blocks normal autonomous edits (e.g., write_file, edit) even when the parent session runs at normal or trusted level. - run-unit.js: add legacyPermissionLevelForProfile mapping and include executorPermissionLevel in the dispatch envelope. - swarm-dispatch.js: forward executorPermissionLevel from envelope to runAgentTurn as permissionLevel. - agent-runner.js: accept permissionLevel option and pass it to runSubagent config. - subagent-runner.ts: add permissionLevel to SubagentConfig; when set, temporarily set SF_PERMISSION_LEVEL env and run extension lifecycle so the permission extension reads the level before tool hooks execute. - Tests for envelope field, dispatch forwarding, and run-unit integration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 06:38:42 +02:00
Mikael Hugo	f3571475d5	docs: DB-first planning state migration proposal Design doc for moving SF's milestone planning state from markdown-as-source-of-truth to DB-as-source-of-truth, with markdown becoming a render target. 463 lines, ~4500 words. Includes: - Survey of all markdown artifacts under .sf/milestones/M/ and who writes/reads each today (when copies drift, which one is authoritative is ambiguous in most cases) - MVP picks -VALIDATION.md as the first artifact to migrate: three read-site fixes, no schema change, and the doctor's db_projection_validation_drift check retires immediately - Hybrid editing UX (option c): CONTEXT-DRAFT and in-progress PLAN stay LLM-writable markdown; tool-call-bounded artifacts (validate_milestone, complete_slice, etc.) become DB-first with generated <!-- generated --> headers - 5-phase rollout plan - Open question flagged: git atomicity for milestone-level syncMilestoneLevelFiles calls; needs explicit tracing before Phase 4/5 No source-code changes. Implementation comes later. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 06:35:02 +02:00
Mikael Hugo	19e33f7239	feat(subagent): SF_SUBAGENT_VIA_SWARM=1 routes /delegate via swarm dispatch Add runSingleAgentViaSwarm as an opt-in path in subagent/index.js. When SF_SUBAGENT_VIA_SWARM=1 (or =true), /delegate, /rubber-duck, /ask, /share, /sidekicks dispatch through swarmDispatchAndWait instead of calling runSubagent directly. This consolidates the subagent extension onto the same dispatch path autonomous unit work uses (Round 4's runUnitViaSwarm). Gains memory inheritance from MessageBus, durable bus audit trail, and the same event-streaming + onEvent plumbing built up through Rounds 2-7. Default (flag unset) is byte-identical to today — no regression in the in-process runSubagent path; existing TUI live update panel still works via the same processSubagentEventLine adapter. Tests: 9 passing in subagent-via-swarm.test.mjs covering: - flag unset → existing path, swarmDispatchAndWait not called - flag=1 → swarmDispatchAndWait called with composed prompt and tools - result shape parity with existing path - onEvent forwards through processSubagentEventLine Confirms end-to-end tool registration works in the worker session: test output shows "tool count after bindExtensions: 3 (read, bash, Skill)" — Round 7's bindExtensions + _refreshToolRegistry wiring is live. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 06:35:02 +02:00
Mikael Hugo	1478579069	docs: AgentRuntime unification proposal Design doc for collapsing the five parallel agent-dispatch sites (defaultAgentRunner, runHeadlessPrompt, runSingleAgent, runUnitViaSwarm, slice-parallel-orchestrator) onto one runtime with three orthogonal axes — persistence, isolation, routing. 590 lines, ~5200 words. Includes: - Problem statement with five concrete pain points from this session's swarm convergence rounds (spawn hangs, inbox cache, checkpoint synthesis, ledger isolation, etc.) - Worked-out TypeScript interface - Mapping of each existing site to runtime options (table) - 8-step migration plan in blast-radius order (~4-5 days focused work) - Open questions No source-code changes. Implementation comes later. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 06:32:28 +02:00
Copilot	1e99bd669e	fix(auto): heartbeat before unit execution to prevent false-positive watchdog stalls The HaltWatchdog fires when the loop goes >10s without a heartbeat. Each iteration ends with a heartbeat, but unit execution itself can take 3+ minutes. Without a heartbeat at the start of the unit phase, the watchdog detects idle and emits a false-positive 'possible stuck iteration' error. Add watchdog.heartbeat() immediately before both runUnitPhaseViaContract calls (one in the custom-engine path, one in the dev path) so the watchdog timer is reset before the long-running work begins. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 06:30:40 +02:00
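The heartbeat rule in 1e99bd669e can be modeled as an idle timer that any caller resets. This is a minimal sketch with an injectable clock so it is deterministic; the 10 s threshold comes from the commit, while the class and method names are hypothetical and not the real HaltWatchdog API.

```typescript
class HaltWatchdogSketch {
  private lastBeat: number;
  private readonly thresholdMs: number;
  private readonly now: () => number;

  constructor(thresholdMs: number, now: () => number = Date.now) {
    this.thresholdMs = thresholdMs;
    this.now = now;
    this.lastBeat = now();
  }

  // Reset the idle timer, e.g. right before a long-running unit phase starts.
  heartbeat(): void {
    this.lastBeat = this.now();
  }

  // True when more than thresholdMs has elapsed since the last heartbeat.
  isStalled(): boolean {
    return this.now() - this.lastBeat > this.thresholdMs;
  }
}
```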
Mikael Hugo	e7cf168824	feat(model-policy): adversary role + lineage-diverse-from-worker constraint Add `adversary` to SUPPORTED_MODEL_ROLES and a new symbolic constraint `lineage-diverse-from-worker` to SUPPORTED_MODEL_ROLE_CONSTRAINTS. Default constraints for `adversary` and `reviewer` now include `lineage-diverse-from-worker` so the reviewer/adversary CANNOT be a lineage-twin of the model that produced the artifact under review; this prevents a "yeah looks fine to me" rubber-stamp from same-family models. Helpers exported alongside the policy: - rootVendorFor(modelId) → "anthropic" | "openai" | "google" | "moonshot" | "mistral" | "minimax" | "zhipu" | "meituan" | "unknown" - isSameRootVendor(candidateId, workerId) → boolean (fail-open on unknown) These are the building blocks the selector needs. The actual filter wiring in auto-model-selection's selectAndApplyModel is left as a documented TODO: the function doesn't currently thread role context through, so plugging in lineage filtering needs a small refactor that is out of scope here. Tests: 24 pass (was 6 + 18 new). Coverage: role registration, constraint registration, defaults, validation, rootVendor mapping matrix, isSameRootVendor predicate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 06:30:08 +02:00
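A sketch of the two helpers described in e7cf168824, with "fail-open" read as: if either model's vendor is unknown, treat them as different so the candidate is not filtered out. The vendor list matches the commit; the substring patterns used to classify model IDs are illustrative assumptions, not SF's real mapping, and the `Sketch` suffixes mark the names as hypothetical.

```typescript
type RootVendor =
  | "anthropic" | "openai" | "google" | "moonshot" | "mistral"
  | "minimax" | "zhipu" | "meituan" | "unknown";

// Illustrative patterns only; checked in order, first match wins.
const VENDOR_PATTERNS: ReadonlyArray<readonly [RootVendor, RegExp]> = [
  ["anthropic", /anthropic|claude/i],
  ["openai", /openai|gpt/i],
  ["google", /google|gemini/i],
  ["moonshot", /moonshot|kimi/i],
  ["mistral", /mistral/i],
  ["minimax", /minimax/i],
  ["zhipu", /zhipu|glm/i],
  ["meituan", /meituan/i],
];

function rootVendorForSketch(modelId: string): RootVendor {
  for (const [vendor, pattern] of VENDOR_PATTERNS) {
    if (pattern.test(modelId)) return vendor;
  }
  return "unknown";
}

// Fail-open: unknown on either side never counts as "same vendor".
function isSameRootVendorSketch(candidateId: string, workerId: string): boolean {
  const a = rootVendorForSketch(candidateId);
  const b = rootVendorForSketch(workerId);
  return a !== "unknown" && b !== "unknown" && a === b;
}
```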
Mikael Hugo	8832be0785	chore(headless): surface v2 init failure reason in fallback warning The catch block was swallowing the actual error, leaving operators with "v2 init failed, falling back to v1 string-matching" and no diagnostic to act on. This session the failure turned out to be build staleness (packages/coding-agent dist was not rebuilt by copy-resources), which would have been instant to diagnose had the reason been logged. Now: "[headless] Warning: v2 init failed (Timeout waiting for response to init...), falling back to v1 string-matching" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 06:28:41 +02:00
Mikael Hugo	996b82001f	fix(auto): keep swarm continue checkpoints actionable	2026-05-15 06:26:30 +02:00
Mikael Hugo	3464db441c	fix(auto): repair empty continue checkpoints	2026-05-15 06:21:58 +02:00
Mikael Hugo	7e2f62ead3	fix(verify): ignore stale repo verification commands	2026-05-15 06:11:57 +02:00
Mikael Hugo	50383eb2bf	fix(auto): honor solver swarm tool counts	2026-05-15 05:54:02 +02:00
Mikael Hugo	dbfaca61cf	fix(swarm): surface worker tool call count to bypass parent-ledger guard Round 7 dogfood failed with "0 tool calls — context exhaustion" even though the swarm worker's session DID call tools. Root cause: the phases-unit.js zero-tool-call guard reads from the PARENT session's message ledger via snapshotUnitMetrics. The swarm worker runs in an ISOLATED subagent session: its tool calls never appear in the parent's messages, so the guard always sees 0 and fires a false-positive context-exhaustion retry. Fix: - runUnitViaSwarm now returns swarmToolCallCount on the UnitResult, surfacing the real worker tool call count from the onEvent stream (collectedToolCalls.length, accurate end-to-end). - phases-unit.js zero-tool-call guard checks unitResult._via === "swarm" && swarmToolCallCount > 0 and bypasses the false-positive retry, logging "zero-tool-calls-swarm-bypass". Also adds a debug stderr line in subagent-runner.ts printing the tool count after bindExtensions, confirming the worker session HAS the full tool set (checkpoint + built-ins); this rules out Hypotheses 1 and 2 from the Round 8 brief by direct observation. Tests: 3 new (swarmToolCallCount = 0 / N / 1-on-checkpoint-only); 2518 tests pass total, 0 regressions. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 05:46:17 +02:00
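The guard change in dbfaca61cf reduces to a small decision: if the parent ledger saw no tool calls, consult the count the swarm worker reported before declaring context exhaustion. In this sketch `_via`, `swarmToolCallCount` and the "zero-tool-calls-swarm-bypass" label come from the commit; the function name, interface and other return values are hypothetical.

```typescript
interface UnitResultSketch {
  _via?: string;
  swarmToolCallCount?: number;
}

type GuardDecision = "ok" | "zero-tool-calls-swarm-bypass" | "retry-context-exhaustion";

function zeroToolCallGuard(parentLedgerToolCalls: number, result: UnitResultSketch): GuardDecision {
  if (parentLedgerToolCalls > 0) return "ok";
  // The parent ledger cannot see tool calls made inside an isolated swarm
  // worker session, so trust the count from the worker's event stream.
  if (result._via === "swarm" && (result.swarmToolCallCount ?? 0) > 0) {
    return "zero-tool-calls-swarm-bypass";
  }
  return "retry-context-exhaustion";
}
```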
Copilot	ea8a3d9354	feat(swarm): default SF_AUTONOMOUS_VIA_SWARM on in headless mode The swarm dispatch path is now automatically enabled when SF_HEADLESS=1 without requiring the operator to set SF_AUTONOMOUS_VIA_SWARM=1. This makes headless mode use the swarm execution engine by default, which is the intended architecture for autonomous execution. - Explicit SF_AUTONOMOUS_VIA_SWARM=1/true still works. - Explicit SF_AUTONOMOUS_VIA_SWARM=0/false disables it even in headless. - When unset + SF_HEADLESS=1, swarm is used. - When unset + SF_HEADLESS!=1, legacy path is used (unchanged). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 05:34:01 +02:00
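The four rules listed in ea8a3d9354 can be written as one precedence check that takes the environment as a plain record. The function name is hypothetical, and treating any value other than 1/true/0/false (case-insensitive) as unset is an assumption.

```typescript
type EnvRecord = Record<string, string | undefined>;

function autonomousViaSwarmEnabled(env: EnvRecord): boolean {
  const raw = env["SF_AUTONOMOUS_VIA_SWARM"]?.trim().toLowerCase();
  // An explicit setting always wins.
  if (raw === "1" || raw === "true") return true;
  if (raw === "0" || raw === "false") return false;
  // Unset: default on in headless mode, legacy path otherwise.
  return env["SF_HEADLESS"] === "1";
}
```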
Mikael Hugo	46d9d45279	fix(bash): block wrong project python runtime	2026-05-15 05:33:28 +02:00
Copilot	6652462a9d	fix(self-feedback): isolate headless triage spawn from auto.lock contention Self-feedback inline fix spawns 'sf headless triage --apply' as a detached child when SF_HEADLESS=1. The child previously grabbed the same auto.lock as the parent, causing lock contention that blocked the parent's unit execution. - Pass SF_SELF_FEEDBACK_WORKER=1 to the child environment. - session-lock: effectiveLockFile() returns auto-self-feedback.lock when the env var is set. - session-lock: effectiveLockTarget() returns .sf/parallel/self-feedback/ so the OS-level lock directory is also isolated. This mirrors the existing SF_PARALLEL_WORKER / SF_MILESTONE_LOCK mechanism used for parallel milestone workers (#2184). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 05:28:23 +02:00
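A sketch of the lock isolation in 6652462a9d: the parent adds one variable to the child's environment, and lock-path resolution branches on it. The worker-case file and directory names come from the commit; the function names, the parent's default directory (".sf/"), and treating only "1" as "set" are assumptions.

```typescript
type EnvLike = Record<string, string | undefined>;

const isSelfFeedbackWorker = (env: EnvLike): boolean => env["SF_SELF_FEEDBACK_WORKER"] === "1";

function lockFileFor(env: EnvLike): string {
  return isSelfFeedbackWorker(env) ? "auto-self-feedback.lock" : "auto.lock";
}

function lockTargetFor(env: EnvLike): string {
  // Assumption: ".sf/" stands in for whatever the parent's default target is.
  return isSelfFeedbackWorker(env) ? ".sf/parallel/self-feedback/" : ".sf/";
}

// Environment for the detached `sf headless triage --apply` child; the
// parent's record is copied, not mutated.
function selfFeedbackChildEnv(parent: EnvLike): EnvLike {
  return { ...parent, SF_SELF_FEEDBACK_WORKER: "1" };
}
```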
Mikael Hugo	ef2b3af7dd	feat(swarm): teach worker the checkpoint contract + executor tool suite The swarm worker now receives the autonomous executor's compact role prompt (buildSwarmWorkerSystemPrompt in auto/run-unit.js) which teaches it the checkpoint tool contract and PDD field requirements. This closes the last gap before SF_AUTONOMOUS_VIA_SWARM=1 can become default: without the contract the worker never emitted checkpoint tool calls, so workerSignaledOutcome stayed null and the loop terminated after one unit. With the contract, the worker calls checkpoint(outcome=...) and the orchestrator gets accurate completion signals. Envelope carries two new optional fields propagated through every layer: - executorSystemPrompt: overrides the swarm worker's default prompt - executorTools: optional tool name filter Flow: runUnitViaSwarm builds them → swarmDispatchAndWait reads them from envelope → forwards to runAgentTurn → runHeadlessPrompt passes them as systemPromptOverride / toolsOverride → runSubagent. No changes needed to runSubagent: createAgentSession + bindExtensions + _refreshToolRegistry already picks up extension-registered tools like `checkpoint` automatically. Tests: 61 passing across the two affected files (22+9 baseline + 30 new); 234 test files passing overall. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 05:12:55 +02:00
Mikael Hugo	54ac56d9bd	feat(swarm): honor worker checkpoint outcomes	2026-05-15 04:59:15 +02:00
Mikael Hugo	1115437cec	feat(swarm): event streaming + outcome derivation for runUnitViaSwarm - Forward onEvent through swarm-dispatch → agent-runner → runSubagent - Collect toolcall_end events in runUnitViaSwarm to build real tool-use blocks - Detect checkpoint tool outcome for accurate unit completion signal - Add headless.ts graceful shutdown (async signal handler, 2.5s timeout) - RPC client stop() now awaits flush and propagates stop to child sessions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 04:54:58 +02:00
Mikael Hugo	ffcd3d1157	chore(doc-checker): allowlist intentionally-short scaffold files The doc-checker startup hook prints a "9 files need content" advisory on every autonomous bootstrap. The flagged files are intentionally terse: - AGENTS.md indices under docs/ and .sf/harness/* point at sibling directories where the real content lives - .sf/PRINCIPLES.md / STYLE.md / NON-GOALS.md are terse-by-design bullet lists; the # heading line is stripped by countContentLines so a 9-bullet file falls one short of the 10-line threshold despite being substantive Adding them to STUB_ALLOWED_PATHS so the advisory only flags genuinely unfilled scaffolds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:43:18 +02:00
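The threshold arithmetic in the doc-checker commit (a `#` heading plus nine bullets counts as nine content lines, one short of ten) can be sketched as follows. This is a minimal sketch under stated assumptions: `MIN_CONTENT_LINES`, `needsContent`, and the exact heading-stripping regex are hypothetical; only `countContentLines`, `STUB_ALLOWED_PATHS`, and the 10-line threshold come from the commit, and their real implementations are not shown here.

```javascript
// Hypothetical constant: the commit describes a 10-line threshold.
const MIN_CONTENT_LINES = 10;

// Hypothetical subset of the allowlist; these paths are named in the commit.
const STUB_ALLOWED_PATHS = new Set([
  ".sf/PRINCIPLES.md",
  ".sf/STYLE.md",
  ".sf/NON-GOALS.md",
]);

// Count non-blank lines, skipping markdown heading lines ("# ...").
// Assumption: the real countContentLines may use a different rule.
function countContentLines(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && !/^#{1,6}\s/.test(line)).length;
}

// A file is flagged only if it is not allowlisted and falls below the threshold.
function needsContent(path, text) {
  if (STUB_ALLOWED_PATHS.has(path)) return false;
  return countContentLines(text) < MIN_CONTENT_LINES;
}
```

With a heading and nine bullets, `countContentLines` returns 9, so the file is flagged unless its path is on the allowlist.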
Mikael Hugo	3faa599f9d	fix(swarm): close multi-dispatch + checkpoint parity gaps Two real bugs surfaced by SF_AUTONOMOUS_VIA_SWARM=1 dogfood (Round 4): 1. Second dispatch to the same swarm agent returned reply=null because each MessageBus instance held a 30s-stale inbox cache. runAgentTurn now accepts opts.onlyMessageId; when set it forces agent._inbox.refresh() from SQLite, processes only that message, and leaves stale messages untouched for later turns. dispatchAndWait passes the just-dispatched messageId so each call is surgical. 2. runUnitViaSwarm now writes an appendAutonomousSolverCheckpoint and synthesizes a swarm_unit_complete tool_use block alongside the text reply, so phases-unit.js stops firing claimed-checkpoint-without-tool repair loops. Outcome is conservatively "continue" — a real "complete" requires the swarm agent to emit an actual checkpoint tool call (future round wires runSubagent.onEvent through dispatchAndWait). Tests: 51 passing for the two affected files (11 swarm-dispatch + 40 run-unit-via-swarm). Full suite: 1760/1760. Known remaining gap before flipping default: synthesized outcome is always "continue", so the loop relies on iteration caps for termination rather than agent-signaled completion. Wiring real tool calls through is the next round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:37:59 +02:00
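The first fix in the multi-dispatch commit (forcing an inbox refresh and processing only the just-dispatched message via `opts.onlyMessageId`) can be illustrated with an in-memory model. This is a hedged sketch: `FakeStore`, `Inbox`, and `runTurn` are hypothetical stand-ins for the SQLite-backed MessageBus, `agent._inbox`, and `runAgentTurn`, whose real signatures are not shown in this log.

```javascript
// Hypothetical in-memory stand-in for the SQLite-backed message store.
class FakeStore {
  constructor() { this.rows = []; }
  insert(msg) { this.rows.push({ ...msg, processed: false }); }
  unprocessed() { return this.rows.filter((r) => !r.processed); }
  markProcessed(id) {
    const row = this.rows.find((r) => r.id === id);
    if (row) row.processed = true;
  }
}

// Hypothetical inbox holding a cached snapshot of the store.
// A stale snapshot is what made the second dispatch miss its message.
class Inbox {
  constructor(store) { this.store = store; this.cache = []; }
  refresh() { this.cache = this.store.unprocessed(); }
  list() { return this.cache; }
}

// With onlyMessageId: refresh from the store and handle just that message,
// leaving every other message unprocessed for later turns.
// Without it: process whatever the (possibly stale) cache holds.
function runTurn(inbox, store, handle, opts = {}) {
  if (opts.onlyMessageId !== undefined) {
    inbox.refresh();
    const target = inbox.list().find((m) => m.id === opts.onlyMessageId);
    if (!target) return [];
    store.markProcessed(target.id);
    return [handle(target)];
  }
  return inbox.list().map((m) => {
    store.markProcessed(m.id);
    return handle(m);
  });
}
```

In this model, a turn driven by a stale cache sees nothing, while a targeted turn replies to exactly the requested message and leaves older ones queued.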
Mikael Hugo	b428f1ab22	fix(headless): send terminal notification when loop exits without stopAuto Headless mode waits for 'Assisted/Autonomous mode stopped' to detect completion. When the loop exits via natural break (e.g. step-wizard in /next), stopAuto() is never called, so headless hangs forever. - Add s.stopAutoCalled flag to AutoSession - Set flag in stopAuto(), clear in cleanupAfterLoopExit() - Send terminal notification from cleanupAfterLoopExit() only when stopAuto() was bypassed - Fixes sf headless next hanging after unit completes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 04:32:05 +02:00
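The flag-based handshake between `stopAuto()` and `cleanupAfterLoopExit()` might look roughly like this. It is a sketch only: the `Session` class, the `notify` callback, the assumption that `stopAuto()` itself sends the stop notification, and the exact notification text are all hypothetical; the commit names only the `stopAutoCalled` flag and the two functions.

```javascript
// Hypothetical minimal session carrying the flag described in the commit.
class Session {
  constructor(notify) {
    this.notify = notify;
    this.stopAutoCalled = false;
  }
}

// Assumption: placeholder text; the real message wording may differ.
const STOPPED_MESSAGE = "Autonomous mode stopped";

// Assumption: stopAuto() sends the terminal notification itself.
function stopAuto(s) {
  s.stopAutoCalled = true;
  s.notify(STOPPED_MESSAGE);
}

// Emit the terminal notification only when stopAuto() was bypassed,
// so headless mode sees exactly one, then reset the flag for the next loop.
function cleanupAfterLoopExit(s) {
  if (!s.stopAutoCalled) s.notify(STOPPED_MESSAGE);
  s.stopAutoCalled = false;
}
```

Either exit path (explicit stop or natural loop break) then produces exactly one terminal notification.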
Mikael Hugo	78d52d7967	feat(autonomous): SF_AUTONOMOUS_VIA_SWARM=1 routes unit dispatch through swarm Add runUnitViaSwarm as an opt-in path in auto/run-unit.js. When SF_AUTONOMOUS_VIA_SWARM=1 (or =true), each unit dispatch builds a DispatchEnvelope (unitType -> workMode via deriveWorkMode), calls swarmDispatchAndWait, and returns the agent reply as a synthetic {status: "completed", event.messages: [{role: "assistant", content: reply}]} matching the shape phases-unit.js / classifyExecutorRefusal already expect. Default (flag unset) is byte-identical to today — no regression in the default path, 1751/1751 tests pass. Known gap (acceptable for an experimental opt-in, must be closed before swarm becomes default): - Tool-call events from the swarm worker do NOT surface to the orchestrator UI (runAgentTurn handles them internally). - The worker emits a plain text reply, not a structured checkpoint, so phases-unit.js' checkpoint-missing repair path will not trigger and classifyExecutorRefusal will not detect refusals. This is the first concrete step toward routing autonomous unit work through swarm: role-based agent selection, memory inheritance via the envelope, and a durable bus audit trail of every unit dispatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:27:00 +02:00
Mikael Hugo	bbade22388	feat(swarm): dispatchAndWait — synchronous request/response for swarm agents Add SwarmDispatchLayer.dispatchAndWait(envelope, { timeoutMs, signal }) which enqueues via _busDispatch, drives the target agent's turn via runAgentTurn (in-process runSubagent), and reads back the agent's reply from the bus. Returns DispatchResult extended with reply + replyMessageId. This is the missing piece for collapsing /delegate-style subagent calls into the swarm interface: callers that need a reply (not just delivery) can now use the swarm contract instead of the subagent extension's bespoke dispatch path. Round 4 will migrate those callers. New helper MessageBus.getReplyTo(messageId, fromAgent) queries SQLite directly via json_extract for the most recent reply to a given message. Plus 8 tests covering happy path, error paths (no reply, runner throws, runner returns {error}), the swarmDispatchAndWait convenience function, and the A2A short-circuit path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:15:52 +02:00
Mikael Hugo	903cdd4d9d	feat(subagent): event streaming for in-process runSubagent Add RunSubagentOptions.onEvent callback so callers (TUI live update panel for /delegate, /rubber-duck, etc.) get every session event without polling. Errors from the callback are caught so a buggy caller cannot crash the agent. Chain caller-supplied AbortSignal through a local AbortController in runSingleAgent and register it in a new liveSubagentControllers set so stopLiveSubagents aborts in-process subagents alongside the legacy spawn-based processes (cmux split, sift codebase_search). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:04:52 +02:00
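The two mechanisms in the event-streaming commit, swallowing errors thrown by a caller's `onEvent` hook and chaining a caller `AbortSignal` into a locally registered `AbortController`, can be sketched with standard Node.js globals. `safeEmit`, `runTracked`, and its `work` callback are hypothetical; `liveSubagentControllers` and `stopLiveSubagents` reuse names from the commit but are simplified stand-ins, not the real implementations.

```javascript
// Simplified stand-in for the registry named in the commit.
const liveSubagentControllers = new Set();

// Call the caller's onEvent hook; a throwing listener must not crash the agent.
function safeEmit(onEvent, event) {
  if (typeof onEvent !== "function") return;
  try {
    onEvent(event);
  } catch {
    // Intentionally ignored: listener failures do not affect the run.
  }
}

// Hypothetical runner: forwards an optional caller signal into a local
// controller, registers it for global stop, and unregisters it when done.
async function runTracked(work, { signal, onEvent } = {}) {
  const controller = new AbortController();
  const forward = () => controller.abort();
  if (signal) {
    if (signal.aborted) forward();
    else signal.addEventListener("abort", forward, { once: true });
  }
  liveSubagentControllers.add(controller);
  try {
    return await work(controller.signal, (e) => safeEmit(onEvent, e));
  } finally {
    liveSubagentControllers.delete(controller);
    if (signal) signal.removeEventListener("abort", forward);
  }
}

// Abort every in-process run that is still registered.
function stopLiveSubagents() {
  for (const c of liveSubagentControllers) c.abort();
}
```

Registering the local controller, rather than the caller's signal, lets a global stop abort in-process runs even when the caller never passed a signal.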
Mikael Hugo	62f886430c	fix: run subagents in process by default	2026-05-15 03:59:34 +02:00
Mikael Hugo	8b0f0bbd65	fix: harden headless dogfood self-healing	2026-05-15 03:53:15 +02:00
Mikael Hugo	3ac5aede1e	fix: repair headless runtime self-healing	2026-05-15 03:33:29 +02:00
Mikael Hugo	72c3811a7b	feat(auto): auto-triage TODO.md on each autonomous cycle - Add autoTriageTodo() helper that checks root TODO.md for raw dump notes beyond the empty template before each autonomous cycle - Lazy-imports buildTodoTriageLLMCall + triageTodoDump from commands-todo.js to avoid startup overhead - Triage results written to DB backlog with clear=true + backlog=true - Best-effort: never blocks autonomous loop on triage failure - Fast-path skips when TODO.md is empty template or doesn't exist Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 03:19:13 +02:00
Mikael Hugo	ca7ff554c3	feat(swarm): integrate LLM runner into AgentSwarm.run() - Make AgentSwarm.run() async with optional enableLLM flag - Wire runAgentTurn from agent-runner.js into all 4 topologies (round_robin, supervisor, dynamic, sleeptime) - Update drainSleeptimeQueue to use runAgentTurn for actual LLM execution instead of passive inbox reading - Export runAgentTurn, runAgentLoop, runSwarmTurn from uok/index.js - Update PersistentAgent JSDoc to reflect runner exists - Fix test imports after extension consolidation (ttsr, google-search) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 03:05:01 +02:00
Mikael Hugo	f6619b792c	refactor(extensions): move cmux into sf extension as internal module cmux was a standalone extension directory with no extension-manifest.json, functioning as a utility library for the sf extension. Moving it into sf/cmux/ makes the dependency explicit and removes the orphaned extension directory. Import paths updated: - commands-cmux.js, notifications.js, auto.js: ../cmux → ./cmux - bootstrap/system-context.js: ../../cmux → ../cmux - subagent/index.js: ../../cmux → ../cmux Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 02:34:35 +02:00
Mikael Hugo	534ed85ee1	refactor(extensions): merge google-search into search-the-web Google Search was a standalone extension providing a single tool (google_search) that used Gemini's Google Search grounding feature. It had fallback logic to search-the-web providers (Tavily, Brave) when Google OAuth was unavailable. Merging it into search-the-web consolidates all web search capabilities into one extension and eliminates the tight coupling between the two. Changes: - Copied google-search tool logic into search-the-web/tool-google-search.js - Added registerGoogleSearchTool / resetGoogleSearchCache exports - Integrated into search-the-web/index.js deferred loading - Added google_search to search-the-web extension-manifest.json tools - Deleted google-search/ extension directory Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 02:33:05 +02:00
Mikael Hugo	f0c3eaf999	refactor(extensions): merge ttsr into guardrails TTSR (Time Traveling Stream Rules) monitored streaming output against regex patterns. Guardrails blocked dangerous actions and redacted secrets. Both are safety/guardrail concerns — merging them into one extension reduces surface area and simplifies the safety model. Changes: - Copied ttsr-rule-loader.js, ttsr-manager.js, ttsr-interrupt.md into guardrails/ - Updated guardrails extension-manifest.json with ttsr hooks (turn_start, message_update, turn_end, agent_end) - Integrated TTSR session_start/turn_start/message_update/turn_end/agent_end handlers into guardrails/index.js - Deleted ttsr/ extension directory Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 02:28:40 +02:00

4504 commits