feat(bootstrap): pre-warm Sift index at session_start

Sift (~/.cargo/bin/sift) builds its index lazily on first `sift
search` per cache key. In an SF session, the first real Sift query
typically happens deep inside an execute-task unit when an agent
reaches for the search-tool — and that agent pays the full cold-
build cost (tens of seconds on a large repo). Subsequent queries
hit warm cache and are fast.
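The cost model above is lazy, per-key memoization. The sketch below is a generic in-memory illustration, not Sift's implementation: `LazyIndexCache` and its build callback are hypothetical. The first lookup for a key pays the build, later lookups share the cached result, and `prewarm` only starts that first build early without waiting on it. The real warmup also depends on Sift keeping its cache across processes, because the detached warmup and the later agent query are separate `sift` invocations.

```javascript
// Hypothetical in-memory model of a lazily built, per-key index cache.
// Sift's real cache belongs to a separate Rust binary; this only shows
// why the first lookup is slow and how a prewarm call helps.
class LazyIndexCache {
  constructor(buildIndex) {
    this.buildIndex = buildIndex; // (key) => index or Promise<index>
    this.entries = new Map(); // key -> Promise<index>
    this.builds = 0; // number of builds started, for inspection
  }

  // Return the index for `key`, starting a build only if none is cached.
  // Caching the promise lets concurrent callers share one in-flight build.
  get(key) {
    let entry = this.entries.get(key);
    if (!entry) {
      this.builds += 1;
      entry = Promise.resolve().then(() => this.buildIndex(key));
      // Forget failed builds so a later lookup retries instead of rethrowing.
      entry.catch(() => this.entries.delete(key));
      this.entries.set(key, entry);
    }
    return entry;
  }

  // Fire-and-forget warmup: start the build early and ignore the outcome.
  prewarm(key) {
    this.get(key).catch(() => {});
  }
}

// Usage: warm at "session start", then query later without a cold build.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const cache = new LazyIndexCache(async (root) => {
  await sleep(50); // stand-in for a slow cold index build
  return { root };
});
cache.prewarm("/repo"); // build starts; nothing waits on it
cache.get("/repo").then((index) => console.log(index.root, cache.builds)); // prints: /repo 1
```

Caching the promise rather than the finished index is what lets a query that arrives mid-build join the warmup instead of starting a second build.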

Hook session_start to fire a cheap, detached `sift search` against
the project root. The index build then runs in parallel with the
rest of session_start (other catalog refreshes, doctor fix, etc.)
and should be ready by the time an agent invokes search-tool. The
warmup is as cheap as possible: bm25-only retriever, no reranking,
limit 1, just enough to trigger the index build pipeline.

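Fully fire-and-forget: the returned promise never rejects and
failures are swallowed. A missing `sift` binary or a spawn error
resolves { started: false, reason }; any exit, zero or non-zero,
resolves { started: true }. Either way SF carries on as before.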
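To check the fire-and-forget contract without Sift installed, the sketch below copies the spawn pattern from `sift-prewarm.js` into a helper that takes the command as a parameter. `fireAndForget` is a hypothetical name, not part of this commit. It shows that a missing binary resolves `started: false` through the asynchronous `error` event, while a child that exits non-zero still resolves `started: true`, because the `exit` handler does not inspect the exit code.

```javascript
// Hypothetical helper mirroring prewarmSiftIndex's spawn pattern, with the
// command injectable so the contract can be checked without Sift installed.
async function fireAndForget(command, args, cwd = process.cwd()) {
  // Dynamic import works from both CommonJS and ES modules.
  const { spawn } = await import("node:child_process");
  return new Promise((resolve) => {
    let proc;
    try {
      proc = spawn(command, args, { cwd, stdio: "ignore", detached: true });
    } catch (err) {
      // Synchronous failures (e.g. an invalid argument type) land here.
      resolve({ started: false, reason: String(err?.message ?? err) });
      return;
    }
    proc.unref(); // don't keep the parent's event loop alive for the child
    // ENOENT for a missing binary arrives asynchronously as an "error" event.
    proc.on("error", (err) =>
      resolve({ started: false, reason: String(err.message ?? err) }),
    );
    // Any exit code counts as "started", matching sift-prewarm.js.
    proc.on("exit", () => resolve({ started: true }));
  });
}

// Usage: a binary that does not exist, and a child that exits non-zero.
// The timer keeps this demo's event loop alive long enough to observe the
// unref'd child's exit.
const keepAlive = setTimeout(() => {}, 5000);
Promise.all([
  fireAndForget("definitely-not-a-real-binary-xyz", []),
  fireAndForget(process.execPath, ["-e", "process.exit(3)"]),
]).then(([missing, nonZero]) => {
  clearTimeout(keepAlive);
  console.log(missing.started, nonZero.started); // prints: false true
});
```

Because the child is unref'd, the promise only settles while something else keeps the event loop running; the demo's timer plays that role, and a long-lived host process already does. In a one-shot script without such a timer, the process may exit first and the promise simply never settles, which is harmless for a fire-and-forget caller.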

Also lands the .sf/preferences.yaml git section requested in the
same session: solo-mode defaults (auto_push=true, isolation=none,
merge_strategy=squash) so the autonomous loop doesn't pause for
operator confirmation on commit/push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mikael Hugo 2026-05-17 01:24:51 +02:00
parent 53259aebf1
commit 38994d7a20
3 changed files with 106 additions and 0 deletions

@@ -19,3 +19,11 @@ custom_instructions: []
models: {}
skill_discovery: {}
auto_supervisor: {}
# Solo-mode git defaults: sf commits + pushes without operator confirmation
# during autonomous mode. Matches MODE_DEFAULTS.solo from preferences-types.js.
git:
  auto_push: true
  push_branches: false
  pre_merge_check: auto
  merge_strategy: squash
  isolation: none

@@ -538,6 +538,18 @@ export function registerHooks(pi, ecosystemHandlers = []) {
} catch {
  /* non-fatal: codex catalog refresh must never block session start */
}
// Pre-warm the Sift search index so the first agent query in this
// session doesn't pay the cold-build cost. Sift indexes lazily on the
// first `sift search` invocation per cache key. Fires a cheap detached
// search against the project root (process.cwd()); the actual index
// build runs in parallel with the rest of session_start and should be
// ready by the time an agent reaches for the search-tool.
try {
  const { prewarmSiftIndex } = await import("../sift-prewarm.js");
  // Not awaited: the warmup runs in the background. prewarmSiftIndex
  // never rejects; the .catch is a belt-and-braces guard.
  prewarmSiftIndex(process.cwd()).catch(() => {});
} catch {
  /* non-fatal: sift prewarm must never block session start */
}
// Audit benchmark coverage — compare the dispatchable model set
// (catalog ∩ user policy) against the static benchmark file and write
// ~/.sf/benchmark-coverage.json. Surfaces models routed via /v1/models


@@ -0,0 +1,86 @@
/**
 * sift-prewarm.js: fire-and-forget Sift index warmup.
 *
 * Purpose: Sift (the Rust search binary, typically ~/.cargo/bin/sift and
 * resolved here via PATH) builds its index lazily on the first
 * `sift search` invocation per cache key. In an SF session the first real
 * Sift query, which usually happens deep inside an execute-task unit when
 * an agent needs to look up a symbol or pattern, pays the full cold-index
 * build cost (tens of seconds on a large repo). Subsequent queries hit a
 * warm cache and are fast.
 *
 * This module fires an inexpensive `sift search` against the project
 * root at SF session_start, fully detached and with stdio ignored, so the
 * index build happens in parallel with the rest of session startup.
 * By the time an agent actually needs Sift, the index should be warm.
 *
 * Cheapest possible search:
 *   - --retrievers bm25  (lexical only; no embedding model load)
 *   - --reranking none   (no reranker pass)
 *   - --limit 1          (don't waste cycles materializing results)
 *
 * Failures are silently swallowed: if `sift` isn't installed, the spawn
 * fails, or the binary errors, SF carries on as before. Sift tooling
 * already handles "sift unavailable" gracefully elsewhere.
 *
 * Consumers: the session_start handler in bootstrap/register-hooks.js,
 * and possibly a periodic re-warm hook in the autonomous loop (TBD).
 */
import { spawn } from "node:child_process";
/**
 * Spawn a Sift warmup search against basePath. Detached + stdio-ignored
 * so it does not hold the parent SF process. Returns a Promise that
 * resolves to {started: false, reason} on a spawn error, or to
 * {started: true} when the process exits (whatever the exit code).
 * It never rejects.
 *
 * @param {string} basePath - repo root to warm
 * @returns {Promise<{started: boolean, reason?: string}>}
 */
export function prewarmSiftIndex(basePath) {
  return new Promise((resolve) => {
    let proc;
    try {
      proc = spawn(
        "sift",
        [
          "search",
          "--json",
          "--limit",
          "1",
          "--retrievers",
          "bm25",
          "--reranking",
          "none",
          "--retriever-timeout-ms",
          "60000",
          basePath,
          "sf-prewarm-index",
        ],
        {
          cwd: basePath,
          stdio: "ignore",
          detached: true,
        },
      );
    } catch (err) {
      resolve({
        started: false,
        reason:
          err && typeof err === "object" && "message" in err
            ? String(err.message)
            : String(err),
      });
      return;
    }
    // Unref so the SF process can exit without waiting on the warmup.
    try {
      proc.unref();
    } catch {
      // best-effort
    }
    // A missing binary (ENOENT) surfaces here asynchronously, not as a throw.
    proc.on("error", (err) =>
      resolve({ started: false, reason: String(err.message ?? err) }),
    );
    // Any exit, including a non-zero code, counts as "started".
    proc.on("exit", () => resolve({ started: true }));
  });
}