Commit graph

28 commits

Author SHA1 Message Date
Copilot
cf9203aee0 feat(swarm): forward parent permission profile to in-process worker sessions
In-process swarm workers get a fresh headless AgentSession whose permission
extension defaults to the read-only minimal level. This blocks normal
autonomous edits (e.g., write_file, edit) even when the parent session runs
at the normal or trusted level.

- run-unit.js: add legacyPermissionLevelForProfile mapping and include
  executorPermissionLevel in the dispatch envelope.
- swarm-dispatch.js: forward executorPermissionLevel from envelope to
  runAgentTurn as permissionLevel.
- agent-runner.js: accept permissionLevel option and pass it to
  runSubagent config.
- subagent-runner.ts: add permissionLevel to SubagentConfig; when set,
  temporarily set SF_PERMISSION_LEVEL env and run extension lifecycle so
  the permission extension reads the level before tool hooks execute.
- Tests for envelope field, dispatch forwarding, and run-unit integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 06:38:42 +02:00
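
The forwarding in cf9203aee0 could be sketched roughly as follows. This is only an illustration under stated assumptions: the profile names, the mapping values, the restrictive fallback, and the `withTemporaryEnv` helper are hypothetical. Only `legacyPermissionLevelForProfile` and `SF_PERMISSION_LEVEL` are names taken from the commit message, and the environment is passed in as a plain object so the sketch stays self-contained.

```typescript
type PermissionLevel = "minimal" | "normal" | "trusted";
type Env = Record<string, string | undefined>;

// Hypothetical profile names; the real table lives in run-unit.js.
const PROFILE_TO_LEVEL: Record<string, PermissionLevel> = {
  "read-only": "minimal",
  standard: "normal",
  elevated: "trusted",
};

function legacyPermissionLevelForProfile(profile: string): PermissionLevel {
  // Assumption: unknown profiles fall back to the most restrictive level.
  return PROFILE_TO_LEVEL[profile] ?? "minimal";
}

// Set one variable for the duration of fn, then restore the previous state,
// including the "was not set at all" case.
function withTemporaryEnv<T>(env: Env, key: string, value: string, fn: () => T): T {
  const had = Object.prototype.hasOwnProperty.call(env, key);
  const previous = env[key];
  env[key] = value;
  try {
    return fn();
  } finally {
    if (had) env[key] = previous;
    else delete env[key];
  }
}
```

Restoring in `finally` means the parent's environment is left unchanged even if the extension lifecycle throws while the worker level is in effect.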
Mikael Hugo
dbfaca61cf fix(swarm): surface worker tool call count to bypass parent-ledger guard
Round 7 dogfood failed with "0 tool calls — context exhaustion" even
though the swarm worker's session DID call tools. Root cause: the
phases-unit.js zero-tool-call guard reads from the PARENT session's
message ledger via snapshotUnitMetrics. The swarm worker runs in an
ISOLATED subagent session — its tool calls never appear in the
parent's messages, so the guard always sees 0 and fires a false-
positive context-exhaustion retry.

Fix:
- runUnitViaSwarm now returns swarmToolCallCount on the UnitResult,
  surfacing the real worker tool call count from the onEvent stream
  (collectedToolCalls.length, accurate end-to-end).
- phases-unit.js zero-tool-call guard checks
  unitResult._via === "swarm" && swarmToolCallCount > 0 and bypasses
  the false-positive retry, logging "zero-tool-calls-swarm-bypass".

Also adds a debug stderr line in subagent-runner.ts that prints the tool
count after bindExtensions. It confirms that the worker session HAS the
full tool set (checkpoint + built-ins), so Hypotheses 1 and 2 from the
Round 8 brief are ruled out by direct observation.

Tests: 3 new (swarmToolCallCount = 0 / N / 1-on-checkpoint-only);
2518 tests pass total, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 05:46:17 +02:00
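
The guard change in dbfaca61cf can be illustrated with a small decision function. This is a sketch, not the code in phases-unit.js: the `zeroToolCallGuard` name, the `GuardDecision` type, and passing the parent count as a plain number are assumptions. The `_via === "swarm" && swarmToolCallCount > 0` condition and the `zero-tool-calls-swarm-bypass` label come from the commit message.

```typescript
interface UnitResult {
  _via?: string;
  swarmToolCallCount?: number;
}

type GuardDecision = "ok" | "zero-tool-calls-swarm-bypass" | "retry-context-exhaustion";

function zeroToolCallGuard(parentToolCalls: number, unitResult: UnitResult): GuardDecision {
  // The parent ledger saw tool calls: nothing to guard against.
  if (parentToolCalls > 0) return "ok";

  // The parent ledger is empty, but the isolated swarm worker reports its own count.
  const swarmCalls = unitResult.swarmToolCallCount ?? 0;
  if (unitResult._via === "swarm" && swarmCalls > 0) {
    return "zero-tool-calls-swarm-bypass";
  }

  // Genuinely zero tool calls: keep the original retry behaviour.
  return "retry-context-exhaustion";
}
```

A swarm result with a count of 0 still falls through to the retry, so the bypass only applies when the worker demonstrably did work.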
Mikael Hugo
46d9d45279 fix(bash): block wrong project python runtime 2026-05-15 05:33:28 +02:00
Mikael Hugo
1115437cec feat(swarm): event streaming + outcome derivation for runUnitViaSwarm
- Forward onEvent through swarm-dispatch → agent-runner → runSubagent
- Collect toolcall_end events in runUnitViaSwarm to build real tool-use blocks
- Detect checkpoint tool outcome for accurate unit completion signal
- Add headless.ts graceful shutdown (async signal handler, 2.5s timeout)
- RPC client stop() now awaits flush and propagates stop to child sessions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 04:54:58 +02:00
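
One way to picture the event collection in 1115437cec is a small collector that is handed to the worker as its `onEvent` callback. The event shape, the `createToolCallCollector` name, and the assumption that the checkpoint tool is literally named `checkpoint` are hypothetical; the commit only states that `toolcall_end` events are collected and that the checkpoint tool outcome is detected.

```typescript
// Hypothetical event shape: only the fields this sketch needs.
interface WorkerEvent {
  type: string;
  toolName?: string;
}

function createToolCallCollector() {
  const collectedToolCalls: WorkerEvent[] = [];
  return {
    // Pass this as the onEvent callback; it keeps only finished tool calls.
    onEvent(event: WorkerEvent): void {
      if (event.type === "toolcall_end") collectedToolCalls.push(event);
    },
    // Derive a completion signal from what the worker actually did.
    summary(): { toolCallCount: number; checkpointCalled: boolean } {
      return {
        toolCallCount: collectedToolCalls.length,
        checkpointCalled: collectedToolCalls.some((e) => e.toolName === "checkpoint"),
      };
    },
  };
}
```

Because the collector listens to the worker's own stream, its count does not depend on the parent session's message ledger.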
Mikael Hugo
903cdd4d9d feat(subagent): event streaming for in-process runSubagent
Add RunSubagentOptions.onEvent callback so callers (TUI live update panel
for /delegate, /rubber-duck, etc.) get every session event without polling.
Errors from the callback are caught so a buggy caller cannot crash the agent.

Chain caller-supplied AbortSignal through a local AbortController in
runSingleAgent and register it in a new liveSubagentControllers set so
stopLiveSubagents aborts in-process subagents alongside the legacy spawn-based
processes (cmux split, sift codebase_search).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 04:04:52 +02:00
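
The "errors from the callback are caught" behaviour in 903cdd4d9d might look like the wrapper below. `makeSafeEmitter` and `SubagentEventListener` are hypothetical names chosen for this sketch, and what the real code does with a caught error (log it, count it, ignore it) is not stated in the commit.

```typescript
type SubagentEventListener<E> = (event: E) => void;

// Wrap a caller-supplied listener so that a throwing listener cannot
// propagate its error into the code that emits events.
function makeSafeEmitter<E>(onEvent?: SubagentEventListener<E>): SubagentEventListener<E> {
  return (event: E): void => {
    if (!onEvent) return; // no listener registered: emitting is a no-op
    try {
      onEvent(event);
    } catch {
      // Assumption: the error is dropped; real code may log it instead.
    }
  };
}
```

The emitting side can then call the wrapped function unconditionally, whether or not a listener was supplied.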
Mikael Hugo
62f886430c fix: run subagents in process by default 2026-05-15 03:59:34 +02:00
Mikael Hugo
8b0f0bbd65 fix: harden headless dogfood self-healing 2026-05-15 03:53:15 +02:00
Mikael Hugo
2d5a05a48b fix(security): resolve 7 findings from full-repo code review
- Create web/middleware.ts to authenticate all API routes via bearer token
  and origin checks (previously unauthenticated due to missing middleware file)

- Fix path traversal in browse-directories: replace startsWith with
  realpathSync + relative + isAbsolute containment checks

- Fix XSS in session HTML export: escape raw HTML blocks via marked renderer

- Fix PTY process leak: destroy session on SSE stream cancellation

- Fix unhandled exception in terminal sessions POST: wrap getOrCreateSession
  in try/catch with structured JSON error response

- Fix silent child-process failure in headless dispatch: add exit handler
  to write failed claim when sf headless triage exits non-zero

- Fix TypeError on malformed claim JSON: add Array.isArray guard before
  accessing claim.ids.length

All changes type-check cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 02:18:43 +02:00
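
The last finding in 2d5a05a48b (the `Array.isArray` guard before reading `claim.ids.length`) can be sketched as below. The `claimedIdCount` name and the choice to return 0 for any malformed input are assumptions made for illustration; the commit only says that the guard was added.

```typescript
// Return how many ids a claim file lists, treating every malformed
// shape (bad JSON, non-object, missing or non-array ids) as zero.
function claimedIdCount(rawJson: string): number {
  let claim: unknown;
  try {
    claim = JSON.parse(rawJson);
  } catch {
    return 0; // not valid JSON at all
  }
  if (typeof claim !== "object" || claim === null) return 0;

  const ids = (claim as { ids?: unknown }).ids;
  // Without this guard, a string would report its character count and a
  // missing field would raise a TypeError on `.length`.
  return Array.isArray(ids) ? ids.length : 0;
}
```

Parsing into `unknown` and narrowing step by step keeps the failure modes explicit instead of relying on the file always having the expected shape.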
Mikael Hugo
2e4bdd292c fix: keep hidden sf commands callable in print mode 2026-05-14 21:25:18 +02:00
Mikael Hugo
f88b48b0aa fix: show print mode liveness 2026-05-14 20:59:19 +02:00
Mikael Hugo
487237a32c fix: bound sf print mode and chat routing 2026-05-14 20:55:00 +02:00
Mikael Hugo
47867c1236 feat: route clear sf chat commands 2026-05-14 20:21:37 +02:00
Mikael Hugo
ab1a1edcf9 refactor: tier sf slash commands 2026-05-14 20:14:09 +02:00
Mikael Hugo
7ea41b89ae feat(ai,coding-agent): wireModelId — provider deployment alias
Adds an optional wireModelId field to the Model interface and a
resolveWireModelId helper. Forge's canonical model.id stays stable for
selection, capability scoring, policy, and history; providers now send
model.wireModelId on the wire when set, model.id otherwise.

Use cases: Azure deployment names, vendor model slugs that differ
from Forge's canonical identity, A/B routing where the operator wants
canonical history but a specific deployment.

Wired through every provider in @singularity-forge/ai (anthropic,
amazon-bedrock, azure-openai-responses, google, google-vertex,
google-gemini-cli, mistral, openai-codex-responses, openai-completions,
openai-responses) plus @singularity-forge/coding-agent's
ModelRegistry (model definitions + per-model overrides).

Tests: openai-completions wireModelId payload coverage +
model-registry-auth-mode coverage for the override + definition fields.
Full pi-ai + coding-agent suite: 956/956 ✓ (7 unrelated skipped).

This realizes the model-registry contract drafted in 1d753af6b.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:25:21 +02:00
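
A minimal sketch of the resolution rule described in 7ea41b89ae, assuming the `Model` interface has only the two fields shown here (the real interface certainly has more) and that "set" means "not undefined". The model ids in the usage lines are made up.

```typescript
interface Model {
  id: string;            // canonical identity: selection, policy, history
  wireModelId?: string;  // optional deployment alias sent to the provider
}

// Providers send the alias when one is configured, the canonical id otherwise.
function resolveWireModelId(model: Model): string {
  return model.wireModelId ?? model.id;
}

// Hypothetical usage: a canonical id with and without a deployment alias.
const aliased: Model = { id: "vendor-model-large", wireModelId: "prod-deployment-eu" };
const plain: Model = { id: "vendor-model-large" };
const wireIds = [resolveWireModelId(aliased), resolveWireModelId(plain)];
```

Keeping `id` untouched means history and capability scoring stay keyed on one stable value regardless of which deployment served the request.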
Mikael Hugo
ca7368e5f1 fix(bash): add 120s default timeout to prevent autonomous mode hangs
- Add BUILT_IN_DEFAULT_TIMEOUT_SECS = 120 constant to bash tool
- Compute effectiveTimeout = timeout ?? resolvedDefaultTimeout so LLM
  calls without a timeout get the 120s guard automatically
- Add defaultTimeoutSeconds? to BashToolOptions for override at creation
- Dynamic bashSchemaWithDefault describes the actual default in the LLM
  tool description, improving model awareness
- Add BashSettings interface + getBashDefaultTimeoutSeconds() to
  SettingsManager so users can override or disable via settings.json
- Wire defaultTimeoutSeconds into agent-session.ts _buildRuntime()

Root cause: npx sf --help triggered an npm package download that hung for
4+ minutes with no timeout, consuming the entire autonomous run budget.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:12:33 +02:00
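
The timeout resolution in ca7368e5f1 can be sketched as two nullish fallbacks. The constant name, the `defaultTimeoutSeconds` option, and the `timeout ?? resolvedDefaultTimeout` expression come from the commit message; the `resolveEffectiveTimeout` function itself is a hypothetical wrapper, and how settings.json represents "disabled" is not specified, so it is not modelled here.

```typescript
const BUILT_IN_DEFAULT_TIMEOUT_SECS = 120;

interface BashToolOptions {
  defaultTimeoutSeconds?: number; // override chosen when the tool is created
}

function resolveEffectiveTimeout(timeout: number | undefined, options: BashToolOptions = {}): number {
  // Creation-time override first, built-in constant otherwise.
  const resolvedDefaultTimeout = options.defaultTimeoutSeconds ?? BUILT_IN_DEFAULT_TIMEOUT_SECS;
  // A timeout supplied by the LLM call always wins over the default.
  return timeout ?? resolvedDefaultTimeout;
}
```

Using `??` rather than `||` means an explicit `0` is passed through instead of being replaced by the default; what the real tool does with `0` is not stated in the commit.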
Mikael Hugo
338c75fc6f refactor: complete rf-01/rf-02/rf-11 blocked todos
rf-01: add ECONNREFUSED to isTransientNetworkError in anthropic-shared.ts,
  aligning with the NETWORK_RE pattern in error-classifier.js

rf-02: add scripts/validate-model-cost-table.mjs to report coverage gaps
  and price divergence between model-cost-table.js and models.generated.ts;
  add 'validate-cost-table' script to package.json

rf-11: extract 10 pure resource-display utility functions from
  interactive-mode.ts into packages/coding-agent/src/modes/interactive/
  resource-display.ts, reducing interactive-mode.ts by ~282 lines

All 4375 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 16:45:39 +02:00
Mikael Hugo
1adc7f119c refactor(rf-06): split auto/phases.js into per-phase modules
3538-line monolith → 6 focused modules + thin barrel:
- phases-helpers.js (223 lines): shared helpers (generateMilestoneReport,
  closeoutAndStop, emitCancelledUnitEnd, maybeFireProductAudit,
  _resolveReportBasePath, recordLearningOutcomeForUnit)
- phases-dispatch.js (486 lines): runDispatch + assessUokDiagnosticsDispatchGate
- phases-guards.js (497 lines): runGuards + guard helpers
- phases-pre-dispatch.js (760 lines): runPreDispatch
- phases-unit.js (1477 lines): runUnitPhase + session timeout state
- phases-finalize.js (542 lines): runFinalize
- phases.js (13 lines): barrel re-export preserving original import surface

Removed dead runPhaseReview export (zero callers confirmed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 15:14:49 +02:00
Mikael Hugo
0b5fa75c0d fix(lint): fix all pre-existing lint failures
- check-sf-extension-inventory.mjs: expand parseDirectRegisteredCommands()
  scan to include 7 more files (guards/inturn.js, notifications/notify.js,
  permissions/index.js, ui/usage-bar.js, commands/legacy/audit.js,
  commands/legacy/create-extension.js, commands/legacy/create-slash-command.js)
  and filter results by BASE_RUNTIME_COMMAND_NAMES to exclude doc-string false
  positives ("name" in create-slash-command.js template text)

- extension-manifest.json: remove 'clear' (subcommand of logs/notifications,
  never a top-level pi.registerCommand)

- packages/pi-agent-core/src/db/sf-db.ts: fix 23 noVoidTypeReturn errors
  - openDatabase: void → boolean (caller uses return value at line 5625)
  - claimEscalationOverride: void → boolean (caller checks at escalation.js:243)
  - resolveSelfFeedbackEntry: void → boolean (caller checks at self-feedback.js:387)
  - copyWorktreeDb: void → boolean (caller checks at reconcileWorktreeDb)
  - compactUokMessages: void → {before,after} (caller returns value at message-bus.js:238)
  - insertSessionTurn: void → bigint|null (caller uses id at session-recorder.js:104)
  - expireStaleMemories: void → number (caller uses count at auto-start.js:1047)
  - deleteMemorySourceRow: void → boolean (caller returns value at memory-source-store.js:107)
  - deleteMemoryEmbedding: void → boolean (caller returns value at memory-embeddings.js:328)
  - updateBacklogItemStatus: remove dead return expression (callers discard value)
  - removeBacklogItem: remove dead return expression (callers discard value)
  - updateGateCircuitBreaker: remove dead return {total,avgMs,...} (wrong-type
    code accidentally merged from getGateLatencyStats, never reachable)
  - markUokMessageRead: remove dead return true/false (callers discard value)

- Auto-fix formatting and organizeImports in ~30 source files (biome --write)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 04:02:31 +02:00
Mikael Hugo
e50321b62b feat(selection): thread unitType + failure_mode into fallback outcome records
- FallbackResolver.setUnitContext() stores {unitType,unitId} from autonomous dispatch
- run-unit.js calls pi.setFallbackUnitContext() before/after each unit
- _findAnyAvailableFallback uses real unitType/unitId from context, not sentinel
- Schema v59: failure_mode column in llm_task_outcomes
- insertLlmTaskOutcome accepts failure_mode (rate_limit, quota_exhausted, auth_error)
- register-hooks.js passes event.classification.reason as failure_mode
- register-hooks.js uses real event.unitId when available
- ExtensionRuntimeActions.setFallbackUnitContext added to pi API surface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 23:14:22 +02:00
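
The before/after pairing in e50321b62b (set the unit context before a unit runs, clear it afterwards) might be sketched like this. The `FallbackContextHolder` class, the `runWithUnitContext` helper, the sentinel strings, and the synchronous signature are all assumptions; the real code goes through `pi.setFallbackUnitContext()` and is presumably asynchronous.

```typescript
interface UnitContext {
  unitType: string;
  unitId: string;
}

class FallbackContextHolder {
  private context: UnitContext | null = null;

  setUnitContext(context: UnitContext | null): void {
    this.context = context;
  }

  // Hypothetical sentinel values stand in when no unit is running.
  current(): UnitContext {
    return this.context ?? { unitType: "unknown", unitId: "unknown" };
  }
}

// Set the context for the duration of one unit and always clear it afterwards,
// so a failed unit cannot leak its identity into the next outcome record.
function runWithUnitContext<T>(holder: FallbackContextHolder, context: UnitContext, runUnit: () => T): T {
  holder.setUnitContext(context);
  try {
    return runUnit();
  } finally {
    holder.setUnitContext(null);
  }
}
```

Outcome records written while the unit runs can then read the real `unitType` and `unitId` from the holder instead of a sentinel.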
Mikael Hugo
009651e86f feat(selection): wire before_model_select into FallbackResolver for outcome-aware fallback
When a model fails and FallbackResolver picks a replacement, it now:
1. Fires the before_model_select hook with reason='fallback' and the
   failing model's ID — the learning system records the failure outcome
   and returns the best Bayesian-blended replacement from llm_task_outcomes
2. Falls back to the existing heuristic sort (reasoning + context window)
   if the hook is unavailable or returns no override

Changes:
- BeforeModelSelectEvent: add optional currentModelId and reason fields
- FallbackResolver: accept emitBeforeModelSelect in constructor; make
  _findAnyAvailableFallback async; fire hook before heuristic fallback
- agent-session.ts: inject lazy emitBeforeModelSelect closure into resolver
- register-hooks.js: record failure outcome when reason='fallback' before
  returning selectLearnedModel result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 23:05:33 +02:00
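
The two-step selection in 009651e86f (ask the hook first, then fall back to the heuristic sort) could be sketched as below. The types are cut down to what the sketch needs, the exact heuristic ordering (reasoning-capable models first, then larger context window) is an assumption based on the phrase "reasoning + context window", and the function is synchronous for brevity even though the commit makes the real method async.

```typescript
interface CandidateModel {
  id: string;
  reasoning: boolean;
  contextWindow: number;
}

interface BeforeModelSelectEvent {
  reason?: string;
  currentModelId?: string;
}

// Hypothetical hook shape: returns the id of a preferred model, or undefined.
type BeforeModelSelectHook = (event: BeforeModelSelectEvent) => string | undefined;

function findAnyAvailableFallback(
  failingModelId: string,
  available: CandidateModel[],
  emitBeforeModelSelect?: BeforeModelSelectHook,
): CandidateModel | undefined {
  const candidates = available.filter((m) => m.id !== failingModelId);

  // Step 1: let the learning system propose a replacement, if a hook is wired in.
  const overrideId = emitBeforeModelSelect?.({ reason: "fallback", currentModelId: failingModelId });
  const learned = candidates.find((m) => m.id === overrideId);
  if (learned) return learned;

  // Step 2: heuristic sort, reasoning-capable first, then larger context window.
  return [...candidates].sort(
    (a, b) => Number(b.reasoning) - Number(a.reasoning) || b.contextWindow - a.contextWindow,
  )[0];
}
```

An override that names an unavailable model is ignored here, so the heuristic still produces an answer whenever any candidate exists.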
Mikael Hugo
fb1bd3e5fa refactor(shared): deduplicate shared/ utilities against coding-agent package exports
- Add packages/coding-agent/src/utils/format.ts as the canonical source
  for formatDuration, formatTokenCount, truncateWithEllipsis, sparkline,
  formatDateShort, fileLink, stripAnsi, normalizeStringArray — all already
  exported from @singularity-forge/coding-agent via index.ts.

- Convert shared/format-utils.js to a compatibility shim that re-exports
  the 8 functions from @singularity-forge/coding-agent. All 13 importers
  continue to work with no import changes required.

- Convert shared/path-display.js to a compatibility shim that re-exports
  toPosixPath from @singularity-forge/coding-agent. Implementation in
  packages/coding-agent/src/utils/path-display.ts was already canonical.

- shared/frontmatter.js is intentionally NOT shimmed: splitFrontmatter/
  parseFrontmatterMap have a different API from the package's parseFrontmatter/
  stripFrontmatter (flat-map vs {frontmatter, body} object).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 22:41:03 +02:00
Mikael Hugo
7227912a29 perf(search): move web-search provider injection from extension hook to native middleware
- Create packages/coding-agent/src/core/providers/web-search-middleware.ts with
  WebSearchMiddleware class: injects web_search tool, enforces session budget (#1309),
  strips thinking blocks from history, and respects PREFERENCES.md search_provider.

- Wire webSearchMiddleware.applyToPayload into sdk.ts onPayload callback (before
  extension hook dispatch) so injection runs as compiled TypeScript with zero
  jiti-dispatch overhead.

- Export WebSearchMiddleware, webSearchMiddleware singleton, setPreferBraveResolver,
  CUSTOM_SEARCH_TOOL_NAMES, MAX_NATIVE_SEARCHES_PER_SESSION, and stripThinkingFromHistory
  from @singularity-forge/coding-agent so the extension can delegate to the same instance.

- Refactor search-the-web/native-search.js: remove self-contained injection logic;
  import and delegate before_provider_request to webSearchMiddleware singleton.
  Use tri-state isAnthropicProvider (null/false/true) to synthesize a provider hint
  when event.model is absent but model_select has already fired — prevents the
  model-name heuristic from wrongly injecting into Copilot claude-* requests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 22:37:42 +02:00
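
The tri-state provider hint mentioned in 7227912a29 can be illustrated with a small decision function. The `shouldInjectNativeSearch` name and the `claude-` prefix check standing in for "the model-name heuristic" are assumptions; the commit only says that `isAnthropicProvider` takes the values null, false, and true, and that a known non-Anthropic provider must not be overridden by the model name.

```typescript
// null: the provider is not known yet (model_select has not fired);
// true / false: model_select already told us which provider is in use.
type ProviderKnowledge = boolean | null;

function shouldInjectNativeSearch(isAnthropicProvider: ProviderKnowledge, modelName: string): boolean {
  // A definite answer always wins over guessing from the model name.
  if (isAnthropicProvider !== null) return isAnthropicProvider;

  // Provider unknown: fall back to the name heuristic (assumed prefix check).
  return modelName.startsWith("claude-");
}
```

With a plain boolean, "unknown" and "known to be false" collapse into one value, which is how a claude-named model served by another provider could be misclassified.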
Mikael Hugo
3fba4bcb03 refactor(mcp): move MCP connection manager to packages/coding-agent/src/core/mcp/
- Create config.ts with McpServerConfig types and readMcpConfigs/getServerConfig
- Create auth.ts with buildHttpTransportOpts and createCliOAuthProvider
- Create connection-manager.ts with McpConnectionManager class
- Create index.ts re-exporting the public API
- Export McpConnectionManager and helpers from @singularity-forge/coding-agent
- Rewrite mcp-client extension as thin wrapper using McpConnectionManager
- Rewrite auth.js as re-export shim from @singularity-forge/coding-agent
- Update test to import buildHttpTransportOpts from @singularity-forge/coding-agent

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 22:19:46 +02:00
Mikael Hugo
a77e1551d2 refactor(memory): consolidate memory system, remove dead code
- Delete memory-backfill.js — not imported anywhere, dead code
- Rename memory-sleeper.js → tool-watchdog.js — misnamed; it is a
  tool-output watchdog with no relation to the memory store
- Collapse memory-embeddings-llm-gateway.js into memory-embeddings.js —
  removes the lazy-import split; loadGatewayConfigFromEnv,
  createGatewayEmbedFn, and rerankCandidates are now direct exports
- Remove buildEmbeddingFn() dead stub (always returned null)
- Enable packages/coding-agent memory extraction extension by default
  (memory.enabled ?? true) so session-level extraction is active
- Update all import sites and tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 18:17:49 +02:00
Mikael Hugo
3ffd882c8c sf snapshot: uncommitted changes after 56m inactivity 2026-05-10 17:16:30 +02:00
Mikael Hugo
924383b6f7 sf snapshot: uncommitted changes after 197m inactivity 2026-05-10 15:59:33 +02:00
Mikael Hugo
cab8b5decc refactor: strip internal pi branding (Phase 2A)
- CURSOR_MARKER: \x1b_pi:c\x07 → \x1b_sf:c\x07
- process.title: "pi" → "sf"
- PiManifest → SFManifest (with pi field backwards compat)
- readPiManifest → readSFManifest (loader.ts and package-manager.ts)
- readPiManifestFile → readSFManifestFile (package-manager.ts)
- .pi/skills → .sf/skills (keeps .pi/skills for backwards compat)
- User-facing path strings updated to .sf/ where appropriate
- ARCHITECTURE.md: "Pi coding-agent extension" → "coding-agent extension"
- Temp editor file: pi-editor-*.pi.md → sf-editor-*.sf.md
- Test fixtures: appName "pi" → "sf", pi manifest field → sf

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 11:50:55 +02:00
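
The "pi field backwards compat" note in cab8b5decc suggests a lookup that prefers the new field and falls back to the old one. The sketch below assumes the manifest is a field of a package.json-like object; the `manifestFromPackageJson` name and the `skills` property are hypothetical, and the real `readSFManifest` may work differently.

```typescript
// Hypothetical manifest contents; only the field names sf / pi come from the commit.
interface SFManifest {
  skills?: string[];
}

interface PackageJsonLike {
  sf?: SFManifest; // new field name
  pi?: SFManifest; // legacy field name, still honoured
}

// Prefer the new `sf` field; fall back to the legacy `pi` field.
function manifestFromPackageJson(pkg: PackageJsonLike): SFManifest | undefined {
  return pkg.sf ?? pkg.pi;
}
```

When both fields are present the new one wins, so a package can migrate by adding `sf` before removing `pi`.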
Mikael Hugo
6725a55591 feat(web): add error boundaries, expand test coverage, add README
- Add class-based ErrorBoundary component wrapping all 7 main views
  inside WorkspaceChrome; fallback shows view name, error, reload button
- Add 30 new unit tests (boot null-project path × 9, onboarding
  pure-function logic × 21); all 43 web/lib tests pass
- Add web/README.md: architecture, auth flow, 7 views, dev setup,
  API route pattern, test instructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 11:24:40 +02:00