Commit graph

829 commits

Author SHA1 Message Date
Mikael Hugo
ffdec0feee fold: hashline_edit + hashline_read → Edit({match}) + Read({format}) modes
Per operator R-entry sf-mp9wo7e3-sdxqss + no-compat directive.

- Edit gains `match: "substring"|"anchor"` arg; anchor mode routes to the
  existing applyHashlineEdits logic. Substring stays default.
- Read gains `format: "plain"|"tagged"` arg; tagged mode emits LINE#HASH
  prefixes via formatHashLines.
- Delete hashline-edit.ts, hashline-read.ts. KEEP hashline.ts (helpers
  are now Edit/Read internals).
- tools/index.ts: drop the two tools + the createHashlineCodingTools
  preset.
- agent-session.ts: setEditMode no longer swaps tool instances (single
  tool surface; mode preserved for system-prompt context only).
- sdk.ts + index.ts: remove hashline tool re-exports.
- headless-ui.ts + test: remove hashline_edit case.

Net agent-visible tool surface: -2 tools. Capability preserved as modes.
No backward-compat alias for the removed tool names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:39:59 +02:00
Mikael Hugo
d03758d803 feat: replace launchd with systemd user-unit install path
Operator-direction 2026-05-17 "we will never use mac" — no compat
preservation. Single-cutover replacement.

- new packages/daemon/src/systemd.ts: install/uninstall/status using
  systemctl --user + ~/.config/systemd/user/sf-server.service
- new packages/daemon/src/systemd.test.ts: ports launchd tests, same
  shape, mocked systemctl via RunCommandFn injection + SF_SYSTEMD_USER_DIR
  env override for real filesystem tests
- cli-main.ts: switch import + update help text + status messages
- index.ts: re-export systemd module (installSystemdUnit, uninstallSystemdUnit,
  systemdUnitStatus, generateUnit, getServicePath, SystemdStatus, SystemdUnitOptions)
- DELETED: launchd.ts (253 LOC), launchd.test.ts (379 LOC)
- docs/dev/drafts/M053-per-repo-supervisor.md: remove "launchd" mention
- CHANGELOG.md: document systemd-only install path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:34 +02:00
Mikael Hugo
57fef5979d feat: make sf server the operator entrypoint 2026-05-17 17:23:46 +02:00
Mikael Hugo
6e3b3d3c54 feat: add Serena-style AST tools (ReplaceSymbol, InsertAroundSymbol, AstGrep)
Wraps the native AST primitives from @singularity-forge/native/{edit,ast} as
LLM tools so agents can do tree-sitter-anchored code edits instead of
substring-based Edit or line-anchor hashline.

- replace-symbol.ts (+117): wraps replaceSymbol(file, symbolPath, newBody);
  matches function/class/method declarations via tree-sitter, returns
  matched=false sentinel when the symbol isn't located.
- insert-around-symbol.ts (+122): wraps insertAroundSymbol with position
  enum BeforeDecl/AfterDecl/AtBodyStart/AtBodyEnd.
- ast-grep.ts (+152): wraps astGrep for pattern matching across files with
  $VAR/$$$ARGS meta-variables; returns ranked matches with byte/line/column
  + captured meta-variable bindings.

Each tool:
  - typebox schema matching the existing AgentTool pattern (edit.ts)
  - notifyFileChanged() into the LSP layer on write ops
  - resolveToCwd() for path normalization
  - catches native errors + returns isError result with the
    NativeUnavailableError message pointing operators to
    `nix develop` + `node rust-engine/scripts/build.js --dev`

Wire-in:
- tools/index.ts: re-exports + imports + entries in `allTools` map and
  createAllTools() factory.
- extension-manifest.json: ReplaceSymbol / InsertAroundSymbol / AstGrep
  appended to provides.tools so SF extension agents see them.

Higher value than substring/line-anchor for code in tree-sitter-supported
languages (TS/JS/TSX/Python/Rust). Edit + hashline remain for non-code
files. PascalCase names per the Claude-Code-aligned convention from
sf-mp9w20y1-nld9hc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:14:12 +02:00
Mikael Hugo
19b10eb67c feat: make sf-server own swarm registry sync 2026-05-17 17:05:16 +02:00
Mikael Hugo
0f5a606923 fix: native loader — loud banner on fallback + structured load log + helpers
- Stderr banner on fallback now multi-line with concrete fix steps
  (nix develop → node rust-engine/scripts/build.js --dev) so an operator
  scanning a 280MB cycle log can't miss it. The old single-line warning
  was easy to overlook (today's "WHY HAS NOBODY SEEN IF LOUD" check).
- Structured load record per process at .sf/runtime/native-engine-load.jsonl:
  {ts, pid, platformTag, source, binaryPath, sha256, loaded, errors?}.
  Lets operators audit which binary each SF process loaded — and detect
  ABI mismatches across daemon↔worker boundaries when different sha256
  values appear for the same platformTag (the "rare but real" concern
  flagged earlier today).
- Proxy error message now points to the build/install commands instead
  of just saying "not available". NativeUnavailableError is named for
  consumer try/catch chains.
- Fixed _loadedSuccessfully ordering — was set true BEFORE the require,
  leaving stale-true after a failed first attempt.
- New helpers isNativeLoaded(), nativeBinaryPath(), nativeBinarySha256()
  for diagnostic surfaces (sf headless query, doctor checks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:34:02 +02:00
Mikael Hugo
9a84d82cdb chore(release): 2.75.3 → 2.75.4 + workspace dependency refresh
Bumps version across the workspace (root + 10 @singularity-forge/*
packages) and lands the pending dependency refresh that had been
sitting uncommitted:

  @anthropic-ai/sdk         0.95.1 → 0.96.0
  @anthropic-ai/vertex-sdk  0.14.4 → 0.16.0
  @google/genai             2.0    → 2.3
  @logtape/{file,logtape,pretty,redaction}  2.0.7 → 2.0.9
  @smithy/node-http-handler 4.7.0  → 4.7.3
  @clack/prompts            1.3    → 1.4
  @types/mime-types         2.1    → 3.0

Inter-package refs in packages/{daemon,ai}/package.json bumped to
^2.75.4 so the workspace stays self-consistent. package-lock.json
regenerated via `npm install --package-lock-only --legacy-peer-deps`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:59:14 +02:00
Mikael Hugo
f55d490e1d fix(subagent-runner): drop spurious 10s STUCK warning on session.prompt
The phaseWatchdog at 10s fired "STUCK phase=session.prompt" on every
healthy LLM call longer than 10 seconds. Verified via strace on the
running dogfood sf: bytes were actively flowing on the TLS socket
(fd 29) to the LLM provider while STUCK was being logged — the
session.prompt was never actually stuck, the watchdog was just
diagnostic-only and oblivious to stream activity.

The noOutputTimeoutMs watchdog (set to 60s for triage in commit
d80060fec) is the actual kill mechanism. It is already event-aware:
every meaningful subagent event resets the timer via armNoOutputTimer
+ isMeaningfulSubagentOutputEvent. The 10s STUCK warning was added
in commit 67e5ac9db as investigation infrastructure for the
sf-mp8e02m1-zpk903 family of bugs, but now it is just noise that
makes legitimate 30-200s LLM responses look broken.

Keeps the 10s STUCK watchdog for the three setup phases
(resourceLoader.reload, createAgentSession, bindExtensions) where
10s of silence is a real hang signal — those phases normally run in
sub-second.

Also includes:
- biome.json: bump $schema URL from 2.4.14 to 2.4.15 to match the
  current biome CLI (clears the deserialize warning)
- scripts/check-test-imports.{,test.}mjs: format + drop a useless
  regex escape that biome flagged in landed code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:49:43 +02:00
Mikael Hugo
365c6bbc3b chore: formatter / linter touch-up (230 files)
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Pure formatting / lint-fix pass that ran during `npm run build:core`
in the session that landed the agent-runner / quota / coverage /
phase-2 routing work. No logic changes — indentation, trailing
commas, import sort, etc. Captured separately so the actual feature
commits stay scoped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:19:53 +02:00
Mikael Hugo
67e5ac9db1 diag(subagent-runner): per-phase timing + stuck-watchdog for sf-mp8e02m1-zpk903
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Adds visible diagnostics to runSubagent so the next time the
"session initialized but no LLM call" bug fires, the log identifies
which setup phase hangs.

Phases instrumented:
  - resourceLoader.reload()
  - createAgentSession()
  - bindExtensions(runLifecycle=...)
  - session.prompt() entry → return

Output format (stderr, prefixed with [subagent:<name>]):
  phase=resourceLoader.reload 23ms
  phase=createAgentSession 142ms
  phase=bindExtensions 89ms runLifecycle=true
  phase=session.prompt-entered taskLen=8421 timeoutMs=480000 noOutputMs=180000
  phase=session.prompt-returned 16234ms          ← normal completion
  STUCK phase=<X> 10000ms (no completion signal ...)   ← when watchdog fires

Each phase has a soft 10s watchdog that emits a STUCK line if the
await doesn't complete in time. The watchdog never aborts — just
surfaces visibility. Existing timeoutMs / noOutputTimeoutMs handle
actual termination.

This is investigation infrastructure for the third prompt-never-sent
seam (coding-agent/subagent-runner). The agent-runner.js seam
(sf-mp8g4rcd-w01tkh) was fixed in commit 8ee4d8358 with bounded
retries. This commit doesn't fix the underlying bug — it makes the
bug self-reporting next time it fires so operator and autonomous
loop both get actionable signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:40:17 +02:00
Mikael Hugo
b5764af27b sf snapshot: uncommitted changes after 33m inactivity 2026-05-16 17:00:13 +02:00
Mikael Hugo
da0c41d375 sf snapshot: uncommitted changes after 56m inactivity 2026-05-16 14:59:40 +02:00
Mikael Hugo
e2e096c5c7 feat(rpc): configurable RPC init timeout via SF_RPC_INIT_TIMEOUT_MS
Add resolveRpcInitTimeoutMs() helper and wire it into RpcClient.init().
Default init timeout increased from 30s to 120s. Override via env var.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 20:00:26 +02:00
Mikael Hugo
3a14fe86a7 test(list-models): isolate from developer's discovery-cache
Tests were picking up the developer's real
~/.sf/agent/discovery-cache.json and seeing unexpected models in
output. Pin tests to a guaranteed-missing path via the new
_discoveryCacheFilePath option so the env they observe is solely
what the test constructs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:37:11 +02:00
Mikael Hugo
7ba469cff1 feat(memory): add debug logging to memory extraction pipeline
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
The memory extraction system has infrastructure (DB tables, LLM prompts,
unit closeout wiring, embedding backfill) but zero processed units and
only self-feedback-resolution memories. This suggests extraction is
failing silently.

Add debugLog() calls throughout extractMemoriesFromUnit() so we can
observe:
- Skip reasons (mutex busy, rate limited, already processed, file too small)
- Start/done lifecycle per unit
- LLM call and parse outcomes
- Error messages on failure and retry

This makes the extraction pipeline observable via --debug or the
journal/debug log without changing behavior.

Tests: 185 files / 1993 tests pass.
Type check: clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 16:09:36 +02:00
Mikael Hugo
d57cd84d9a fix(auto): make halt watchdog observable 2026-05-15 08:09:02 +02:00
Mikael Hugo
f9c147a08b fix(swarm): ignore heartbeats for silent worker timeout 2026-05-15 08:00:35 +02:00
Mikael Hugo
e464a1bd6e fix(swarm): bound silent worker responses 2026-05-15 07:35:31 +02:00
Copilot
cf9203aee0 feat(swarm): forward parent permission profile to in-process worker sessions
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
In-process swarm workers get a fresh headless AgentSession whose permission
extension defaults to read-only minimal. This blocks normal autonomous edits
(e.g., write_file, edit) even when the parent session runs at normal or
trusted level.

- run-unit.js: add legacyPermissionLevelForProfile mapping and include
  executorPermissionLevel in the dispatch envelope.
- swarm-dispatch.js: forward executorPermissionLevel from envelope to
  runAgentTurn as permissionLevel.
- agent-runner.js: accept permissionLevel option and pass it to
  runSubagent config.
- subagent-runner.ts: add permissionLevel to SubagentConfig; when set,
  temporarily set SF_PERMISSION_LEVEL env and run extension lifecycle so
  the permission extension reads the level before tool hooks execute.
- Tests for envelope field, dispatch forwarding, and run-unit integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 06:38:42 +02:00
Mikael Hugo
dbfaca61cf fix(swarm): surface worker tool call count to bypass parent-ledger guard
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Round 7 dogfood failed with "0 tool calls — context exhaustion" even
though the swarm worker's session DID call tools. Root cause: the
phases-unit.js zero-tool-call guard reads from the PARENT session's
message ledger via snapshotUnitMetrics. The swarm worker runs in an
ISOLATED subagent session — its tool calls never appear in the
parent's messages, so the guard always sees 0 and fires a false-
positive context-exhaustion retry.

Fix:
- runUnitViaSwarm now returns swarmToolCallCount on the UnitResult,
  surfacing the real worker tool call count from the onEvent stream
  (collectedToolCalls.length, accurate end-to-end).
- phases-unit.js zero-tool-call guard checks
  unitResult._via === "swarm" && swarmToolCallCount > 0 and bypasses
  the false-positive retry, logging "zero-tool-calls-swarm-bypass".

Also adds a debug stderr line in subagent-runner.ts printing the tool
count after bindExtensions, confirming the worker session HAS the
full tool set (checkpoint + built-ins) — Hypotheses 1 and 2 from the
Round 8 brief ruled out by direct observation.

Tests: 3 new (swarmToolCallCount = 0 / N / 1-on-checkpoint-only);
2518 tests pass total, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 05:46:17 +02:00
Mikael Hugo
46d9d45279 fix(bash): block wrong project python runtime 2026-05-15 05:33:28 +02:00
Mikael Hugo
54ac56d9bd feat(swarm): honor worker checkpoint outcomes 2026-05-15 04:59:15 +02:00
Mikael Hugo
1115437cec feat(swarm): event streaming + outcome derivation for runUnitViaSwarm
- Forward onEvent through swarm-dispatch → agent-runner → runSubagent
- Collect toolcall_end events in runUnitViaSwarm to build real tool-use blocks
- Detect checkpoint tool outcome for accurate unit completion signal
- Add headless.ts graceful shutdown (async signal handler, 2.5s timeout)
- RPC client stop() now awaits flush and propagates stop to child sessions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 04:54:58 +02:00
Mikael Hugo
903cdd4d9d feat(subagent): event streaming for in-process runSubagent
Add RunSubagentOptions.onEvent callback so callers (TUI live update panel
for /delegate, /rubber-duck, etc.) get every session event without polling.
Errors from the callback are caught so a buggy caller cannot crash the agent.

Chain caller-supplied AbortSignal through a local AbortController in
runSingleAgent and register it in a new liveSubagentControllers set so
stopLiveSubagents aborts in-process subagents alongside the legacy spawn-based
processes (cmux split, sift codebase_search).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 04:04:52 +02:00
Mikael Hugo
62f886430c fix: run subagents in process by default 2026-05-15 03:59:34 +02:00
Mikael Hugo
8b0f0bbd65 fix: harden headless dogfood self-healing 2026-05-15 03:53:15 +02:00
Mikael Hugo
f0c3eaf999 refactor(extensions): merge ttsr into guardrails
TTSR (Time Traveling Stream Rules) monitored streaming output against regex
patterns. Guardrails blocked dangerous actions and redacted secrets. Both are
safety/guardrail concerns — merging them into one extension reduces surface
area and simplifies the safety model.

Changes:
- Copied ttsr-rule-loader.js, ttsr-manager.js, ttsr-interrupt.md into guardrails/
- Updated guardrails extension-manifest.json with ttsr hooks (turn_start,
  message_update, turn_end, agent_end)
- Integrated TTSR session_start/turn_start/message_update/turn_end/agent_end
  handlers into guardrails/index.js
- Deleted ttsr/ extension directory

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 02:28:40 +02:00
Mikael Hugo
2d5a05a48b fix(security): resolve 7 findings from full-repo code review
- Create web/middleware.ts to authenticate all API routes via bearer token
  and origin checks (previously unauthenticated due to missing middleware file)

- Fix path traversal in browse-directories: replace startsWith with
  realpathSync + relative + isAbsolute containment checks

- Fix XSS in session HTML export: escape raw HTML blocks via marked renderer

- Fix PTY process leak: destroy session on SSE stream cancellation

- Fix unhandled exception in terminal sessions POST: wrap getOrCreateSession
  in try/catch with structured JSON error response

- Fix silent child-process failure in headless dispatch: add exit handler
  to write failed claim when sf headless triage exits non-zero

- Fix TypeError on malformed claim JSON: add Array.isArray guard before
  accessing claim.ids.length

All changes type-check cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 02:18:43 +02:00
Mikael Hugo
def1edefa9 sf snapshot: uncommitted changes after 268m inactivity 2026-05-15 02:08:06 +02:00
Mikael Hugo
2e4bdd292c fix: keep hidden sf commands callable in print mode 2026-05-14 21:25:18 +02:00
Mikael Hugo
f88b48b0aa fix: show print mode liveness 2026-05-14 20:59:19 +02:00
Mikael Hugo
487237a32c fix: bound sf print mode and chat routing 2026-05-14 20:55:00 +02:00
Mikael Hugo
47867c1236 feat: route clear sf chat commands 2026-05-14 20:21:37 +02:00
Mikael Hugo
ab1a1edcf9 refactor: tier sf slash commands 2026-05-14 20:14:09 +02:00
Mikael Hugo
7ea41b89ae feat(ai,coding-agent): wireModelId — provider deployment alias
Adds an optional wireModelId field to the Model interface and a
resolveWireModelId helper. Forge's canonical model.id stays stable for
selection, capability scoring, policy, and history; providers now send
model.wireModelId on the wire when set, model.id otherwise.

Use cases: Azure deployment names, vendor model slugs that differ
from Forge's canonical identity, A/B routing where the operator wants
canonical history but a specific deployment.

Wired through every provider in @singularity-forge/ai (anthropic,
amazon-bedrock, azure-openai-responses, google, google-vertex,
google-gemini-cli, mistral, openai-codex-responses, openai-completions,
openai-responses) plus @singularity-forge/coding-agent's
ModelRegistry (model definitions + per-model overrides).

Tests: openai-completions wireModelId payload coverage +
model-registry-auth-mode coverage for the override + definition fields.
Full pi-ai + coding-agent suite: 956/956 ✓ (7 unrelated skipped).

This realizes the model-registry contract drafted in 1d753af6b.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:25:21 +02:00
Mikael Hugo
a342868068 feat(packages): extract @singularity-forge/openai-codex-provider
Mirrors the @singularity-forge/google-gemini-cli-provider package layout
for the codex CLI integration boundary. The new package owns:

- CodexAppServerClient (the JSON-RPC subprocess client; previously
  packages/ai/src/providers/codex-app-server-client.ts, no pi-ai
  internal coupling)
- snapshotCodexCliAccount / discoverCodexCliModels (reads
  ~/.codex/models_cache.json with visibility=list ∧ supported_in_api
  filter; previously inline in src/resources/extensions/sf/openai-codex-catalog.js)

openai-codex-responses.ts (the stream-shaping provider) intentionally
stays in @singularity-forge/ai because it depends on pi-ai stream-event
internals and is not reusable outside the provider — same scope as
google-gemini-cli.ts vs google-gemini-cli-provider.

The SF extension's openai-codex-catalog.js is now a thin SF-side cache
writer that delegates to discoverCodexCliModels, mirroring how
gemini-catalog.js delegates to discoverGeminiCliModels. readCodexAvailableModels
became async to match the dynamic-import path; tests updated.

Closes sf-mp4u5fcz-wh6ac9 (with documented AC2 narrowing — see
resolution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:48:19 +02:00
Mikael Hugo
f68ab20953 fix(ai): backfill MiniMax M2/M2.1 cacheRead pricing 2026-05-14 04:55:46 +02:00
Mikael Hugo
383e495085 feat(headless,gemini-cli): add sf headless usage + unify gemini quota path
Adds a machine-readable headless surface for live LLM-provider usage and
unifies the gemini-cli quota fetch through one helper, removing the
duplication that existed between usage-bar.js and the new package.

1. snapshotGeminiCliAccount in @singularity-forge/google-gemini-cli-provider

   - Single source of truth for { projectId, userTierId, userTierName,
     paidTier, models[] } via setupUser + retrieveUserQuota.
   - Dedups buckets per modelId, keeping the worst (lowest remainingFraction)
     so consumers always see the most-restrictive window. Code Assist
     sometimes returns multiple buckets per model; the pessimistic choice
     is what every consumer needs.
   - discoverGeminiCliModels(cwd?) wraps it for catalog-cache callers that
     only need the IDs.

2. sf headless usage subcommand

   - New src/headless-usage.ts handler. text (default) and --json output.
     Uses the package's snapshot directly — no RPC child, no jiti
     gymnastics — matching the shape of headless-uok-status / headless-doctor.
   - Wired into src/headless.ts after the doctor block.
   - Help text adds the command line.

3. usage-bar.js refactored to delegate

   - fetchGeminiUsage no longer imports gemini-cli-core directly. It calls
     snapshotGeminiCliAccount and reshapes the result into the existing
     { provider, displayName, windows[] } UI contract.
   - Eliminates the duplicate setupUser + retrieveUserQuota code path.
   - The fast existsSync(~/.gemini/oauth_creds.json) pre-flight stays
     so unauth'd users get a friendly message without paying for OAuth
     bootstrap.

4. Model registry refactor (separate track committed alongside)

   - src/resources/extensions/sf/model-registry.ts (new) consolidates
     canonical model identity, capability tier, and generation tags into
     one source of truth that auto-model-selection, benchmark-selector,
     and model-router now consume instead of maintaining parallel maps.

All 1487 tests pass (151 files); typecheck clean for both the package
and the SF extensions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:42:53 +02:00
Mikael Hugo
c6a3fa6a6a feat(gemini-cli): discover account models via gemini-cli-core + retry on capacity errors
Two related fixes for the google-gemini-cli provider, both motivated by today's
dogfood diagnosis: SF was pinned to a single model (gemini-3-flash-preview)
even though the AI Ultra account has access to seven (verified via the live
gemini-cli-core probe), and a transient "No capacity available for model X
on the server" was classified as `unknown` so SF gave up instead of retrying.

1. Account snapshot + model discovery in @singularity-forge/google-gemini-cli-provider

   - Add `snapshotGeminiCliAccount(cwd?)` returning { projectId, userTierId,
     userTierName, paidTier, models } where `models[]` carries each modelId
     with usedFraction, remainingFraction, and resetTime. Built on the same
     setupUser + CodeAssistServer.retrieveUserQuota path usage-bar.js
     already uses, but extracted to the dedicated package so any consumer
     (model picker, capacity diagnostics, catalog cache) can call one helper.
   - Add `discoverGeminiCliModels(cwd?)` as a thin "just the IDs" wrapper.
   - Both are best-effort: any failure (OAuth expired, no project, network)
     returns null silently — never throws.

2. SF-side cache writer at src/resources/extensions/sf/gemini-catalog.js

   - Delegates discovery to the package; only handles cache file path,
     6-hour TTL, and the session_start lifecycle hook.
   - Cache lands at .sf/runtime/model-catalog/google-gemini-cli.json with
     the same shape as the generic model-catalog-cache, so getKnownModelIds
     and the model picker pick it up transparently.
   - Wired into bootstrap/register-hooks.js session_start in parallel with
     the existing scheduleModelCatalogRefresh (the generic REST + API-key
     path can't reach gemini-cli's OAuth-only Code Assist endpoint).

3. Capacity error classification fix

   - error-classifier.js SERVER_RE now matches "no capacity (available|left)",
     "capacity (unavailable|exhausted)", and "no capacity ... on the server".
     Previously these fell through to kind=unknown, which is not transient,
     so agent-end-recovery never retried — even though the same handler
     already caps gemini-cli rate-limit backoff at 30s for exactly this
     class of transient. With the pattern matched as `server`, the existing
     retry-with-backoff path covers it.

The full extension test suite (1386 tests) passes. Typecheck clean for both
the package and the SF extensions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:32:35 +02:00
Mikael Hugo
65e195a9fd feat: Created draft mapping of SF patterns to ACE reference draft
SF-Task: S05/T01
2026-05-13 02:01:41 +02:00
Mikael Hugo
a49ea1da87 feat(sf/prompts): Phase 4 — cache_control breakpoints at static/dynamic boundary
Split reorderForCaching into a structured reorderAndSplitForCaching that
returns {before, after} at the semi-static→dynamic section boundary.

- prompt-ordering.js: export reorderAndSplitForCaching — returns null if no
  dynamic sections, otherwise {before: static+semi-static, after: dynamic}
- auto.js: import and wire reorderAndSplitForCaching into deps
- phases-unit.js: use split function; pass promptParts to runUnit when split
  succeeds; fall back to flat reorderForCaching when null
- run-unit.js: when promptParts is present, send a two-block content array
  [{type:text, text:before, cache_control:{type:ephemeral}}, {type:text, text:after}]
  so Anthropic-compatible providers cache the stable prefix
- openai-completions.ts: preserve cache_control on text parts in convertMessages;
  skip maybeAddOpenRouterAnthropicCacheControl if any part already has cache_control

Tests: 5 new contract tests for reorderAndSplitForCaching; all 4502 unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 01:36:22 +02:00
Mikael Hugo
ca7368e5f1 fix(bash): add 120s default timeout to prevent autonomous mode hangs
- Add BUILT_IN_DEFAULT_TIMEOUT_SECS = 120 constant to bash tool
- Compute effectiveTimeout = timeout ?? resolvedDefaultTimeout so LLM
  calls without a timeout get the 120s guard automatically
- Add defaultTimeoutSeconds? to BashToolOptions for override at creation
- Dynamic bashSchemaWithDefault describes the actual default in the LLM
  tool description, improving model awareness
- Add BashSettings interface + getBashDefaultTimeoutSeconds() to
  SettingsManager so users can override or disable via settings.json
- Wire defaultTimeoutSeconds into agent-session.ts _buildRuntime()

Root cause: npx sf --help triggered npm package download, hanging for
4+ minutes without timeout, consuming entire autonomous run budget.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:12:33 +02:00
Mikael Hugo
338c75fc6f refactor: complete rf-01/rf-02/rf-11 blocked todos
rf-01: add ECONNREFUSED to isTransientNetworkError in anthropic-shared.ts,
  aligning with the NETWORK_RE pattern in error-classifier.js

rf-02: add scripts/validate-model-cost-table.mjs to report coverage gaps
  and price divergence between model-cost-table.js and models.generated.ts;
  add 'validate-cost-table' script to package.json

rf-11: extract 10 pure resource-display utility functions from
  interactive-mode.ts into packages/coding-agent/src/modes/interactive/
  resource-display.ts, reducing interactive-mode.ts by ~282 lines

All 4375 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 16:45:39 +02:00
Mikael Hugo
1adc7f119c refactor(rf-06): split auto/phases.js into per-phase modules
3538-line monolith → 6 focused modules + thin barrel:
- phases-helpers.js (223 lines): shared helpers (generateMilestoneReport,
  closeoutAndStop, emitCancelledUnitEnd, maybeFireProductAudit,
  _resolveReportBasePath, recordLearningOutcomeForUnit)
- phases-dispatch.js (486 lines): runDispatch + assessUokDiagnosticsDispatchGate
- phases-guards.js (497 lines): runGuards + guard helpers
- phases-pre-dispatch.js (760 lines): runPreDispatch
- phases-unit.js (1477 lines): runUnitPhase + session timeout state
- phases-finalize.js (542 lines): runFinalize
- phases.js (13 lines): barrel re-export preserving original import surface

Removed dead runPhaseReview export (zero callers confirmed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 15:14:49 +02:00
Mikael Hugo
64ddbd950f refactor(extensions): consolidate duplicate code into canonical modules
- Delete ghost package packages/pi-agent-core (no dist, no consumers,
  TS build errors; JS source sf-db.js had 3 commits not mirrored in TS)
- Remove build:pi-agent-core from root package.json build:pi pipeline
- Merge all models from MODEL_COST_PER_1K_INPUT into BUNDLED_COST_TABLE
  (model-cost-table.js is now the single canonical cost source)
- Remove duplicate MODEL_COST_PER_1K_INPUT object and getModelCost()
  from model-router.js; use lookupModelCost() from model-cost-table.js
- Replace hand-rolled isTransientNetworkError in preferences-models.js
  with delegation to classifyError() in error-classifier.js

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 08:28:49 +02:00
Mikael Hugo
0b5fa75c0d fix(lint): fix all pre-existing lint failures
- check-sf-extension-inventory.mjs: expand parseDirectRegisteredCommands()
  scan to include 7 more files (guards/inturn.js, notifications/notify.js,
  permissions/index.js, ui/usage-bar.js, commands/legacy/audit.js,
  commands/legacy/create-extension.js, commands/legacy/create-slash-command.js)
  and filter results by BASE_RUNTIME_COMMAND_NAMES to exclude doc-string false
  positives ("name" in create-slash-command.js template text)

- extension-manifest.json: remove 'clear' (subcommand of logs/notifications,
  never a top-level pi.registerCommand)

- packages/pi-agent-core/src/db/sf-db.ts: fix 23 noVoidTypeReturn errors
  - openDatabase: void → boolean (caller uses return value at line 5625)
  - claimEscalationOverride: void → boolean (caller checks at escalation.js:243)
  - resolveSelfFeedbackEntry: void → boolean (caller checks at self-feedback.js:387)
  - copyWorktreeDb: void → boolean (caller checks at reconcileWorktreeDb)
  - compactUokMessages: void → {before,after} (caller returns value at message-bus.js:238)
  - insertSessionTurn: void → bigint|null (caller uses id at session-recorder.js:104)
  - expireStaleMemories: void → number (caller uses count at auto-start.js:1047)
  - deleteMemorySourceRow: void → boolean (caller returns value at memory-source-store.js:107)
  - deleteMemoryEmbedding: void → boolean (caller returns value at memory-embeddings.js:328)
  - updateBacklogItemStatus: remove dead return expression (callers discard value)
  - removeBacklogItem: remove dead return expression (callers discard value)
  - updateGateCircuitBreaker: remove dead return {total,avgMs,...} (wrong-type
    code accidentally merged from getGateLatencyStats, never reachable)
  - markUokMessageRead: remove dead return true/false (callers discard value)

- Auto-fix formatting and organizeImports in ~30 source files (biome --write)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 04:02:31 +02:00
Mikael Hugo
2dea73398d fix(learning): add save_knowledge to manifest, failure_mode to aggregator SELECT + index
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 23:18:02 +02:00
Mikael Hugo
e50321b62b feat(selection): thread unitType + failure_mode into fallback outcome records
- FallbackResolver.setUnitContext() stores {unitType,unitId} from autonomous dispatch
- run-unit.js calls pi.setFallbackUnitContext() before/after each unit
- _findAnyAvailableFallback uses real unitType/unitId from context, not sentinel
- Schema v59: failure_mode column in llm_task_outcomes
- insertLlmTaskOutcome accepts failure_mode (rate_limit, quota_exhausted, auth_error)
- register-hooks.js passes event.classification.reason as failure_mode
- register-hooks.js uses real event.unitId when available
- ExtensionRuntimeActions.setFallbackUnitContext added to pi API surface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 23:14:22 +02:00
Mikael Hugo
009651e86f feat(selection): wire before_model_select into FallbackResolver for outcome-aware fallback
When a model fails and FallbackResolver picks a replacement, it now:
1. Fires the before_model_select hook with reason='fallback' and the
   failing model's ID — the learning system records the failure outcome
   and returns the best Bayesian-blended replacement from llm_task_outcomes
2. Falls back to the existing heuristic sort (reasoning + context window)
   if the hook is unavailable or returns no override

Changes:
- BeforeModelSelectEvent: add optional currentModelId and reason fields
- FallbackResolver: accept emitBeforeModelSelect in constructor; make
  _findAnyAvailableFallback async; fire hook before heuristic fallback
- agent-session.ts: inject lazy emitBeforeModelSelect closure into resolver
- register-hooks.js: record failure outcome when reason='fallback' before
  returning selectLearnedModel result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 23:05:33 +02:00
Mikael Hugo
fb1bd3e5fa refactor(shared): deduplicate shared/ utilities against coding-agent package exports
- Add packages/coding-agent/src/utils/format.ts as the canonical source
  for formatDuration, formatTokenCount, truncateWithEllipsis, sparkline,
  formatDateShort, fileLink, stripAnsi, normalizeStringArray — all already
  exported from @singularity-forge/coding-agent via index.ts.

- Convert shared/format-utils.js to a compatibility shim that re-exports
  the 8 functions from @singularity-forge/coding-agent. All 13 importers
  continue to work with no import changes required.

- Convert shared/path-display.js to a compatibility shim that re-exports
  toPosixPath from @singularity-forge/coding-agent. Implementation in
  packages/coding-agent/src/utils/path-display.ts was already canonical.

- shared/frontmatter.js is intentionally NOT shimmed: splitFrontmatter/
  parseFrontmatterMap have a different API from the package's parseFrontmatter/
  stripFrontmatter (flat-map vs {frontmatter, body} object).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-10 22:41:03 +02:00