Orphaned sift warmups can spin past --retriever-timeout-ms (a per-page
timeout, not wall-clock) and burn CPU indefinitely after the launcher
exits — observed a 95-min, 98% CPU orphan. Wrap the detached spawn in
timeout(1) / gtimeout when present (SIGTERM at the cap, SIGKILL 10s
later); fall back to raw spawn elsewhere. Default cap 1800s, override
via SF_SIFT_HARD_TIMEOUT_SEC, disable via SF_SIFT_HARD_TIMEOUT_DISABLE=1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
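
The wrapping rule above can be sketched as a pure planning step. This is a minimal illustration rather than the project's code: the helper name `planSiftSpawn`, the `SpawnPlan` shape, the injected `timeoutBin` argument, and the choice to fall back to the default cap on an unparsable override are all assumptions. Only the env variable names, the 1800s default, and the 10s kill grace come from the message.

```typescript
// Sketch only: helper name, return shape, and the injected `timeoutBin`
// argument are illustrative assumptions, not the project's API.
type Env = Record<string, string | undefined>;

interface SpawnPlan {
  command: string;
  args: string[];
}

const DEFAULT_CAP_SEC = 1800; // default cap from the commit message
const KILL_GRACE_SEC = 10;    // SIGKILL follows SIGTERM by 10s

function planSiftSpawn(
  command: string,
  args: string[],
  env: Env,
  timeoutBin: string | null, // "timeout", "gtimeout", or null when neither exists
): SpawnPlan {
  const disabled = env.SF_SIFT_HARD_TIMEOUT_DISABLE === "1";
  if (disabled || timeoutBin === null) {
    return { command, args }; // raw spawn fallback
  }
  // Assumption: an unset or invalid override falls back to the default cap.
  const parsed = Number(env.SF_SIFT_HARD_TIMEOUT_SEC);
  const capSec = Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_CAP_SEC;
  // timeout(1) sends SIGTERM at the cap by default; -k schedules SIGKILL after it.
  return {
    command: timeoutBin,
    args: ["-k", String(KILL_GRACE_SEC), String(capSec), command, ...args],
  };
}
```

The returned plan would then be handed to the existing detached spawn call unchanged, so the cap applies to wall-clock time regardless of the per-page timeout.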
- engines.node: >=24.15.0 across all 23 package.json (root + 8
workspace + studio + web + pkg + vscode-extension + 11 SF
extension manifests)
- CI workflows pinned to node-version: '24.15' (16 sites)
- Dockerfile -> node:24.15-slim
- .nvmrc / .node-version -> 24.15.0
- Refactored worktree-cli.ts and headless-query.ts to use
import.meta.filename instead of fileURLToPath(import.meta.url)
- exec.ts simplified with AbortSignal.any + spawn signal/killSignal
- Picks up Crush's biome.json + AGENTS.md doc cleanup in same pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
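
As a rough illustration of the exec.ts simplification mentioned above, the sketch below combines a caller signal with a deadline through `AbortSignal.any` and passes the result to `spawn` via its `signal` and `killSignal` options. The function name `runWithDeadline` and its return shape are hypothetical; the actual exec.ts may differ.

```typescript
import { spawn } from "node:child_process";

// Sketch only: `runWithDeadline` and its result shape are hypothetical names.
function runWithDeadline(
  command: string,
  args: string[],
  timeoutMs: number,
  callerSignal?: AbortSignal,
): Promise<{ code: number | null; aborted: boolean }> {
  const signals = [AbortSignal.timeout(timeoutMs)];
  if (callerSignal) signals.push(callerSignal);
  // AbortSignal.any aborts as soon as any of its input signals aborts.
  const signal = AbortSignal.any(signals);

  return new Promise((resolve) => {
    const child = spawn(command, args, {
      stdio: "ignore",
      signal,                // on abort, the child is sent killSignal
      killSignal: "SIGTERM",
    });
    // An abort (or a failed spawn) surfaces as 'error'; report it as a result.
    child.on("error", () => resolve({ code: null, aborted: signal.aborted }));
    child.on("close", (code) => resolve({ code, aborted: signal.aborted }));
  });
}
```

The first event to fire settles the promise; later calls to `resolve` are no-ops, so handling both `error` and `close` is safe.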
Since Node >= 24 is the minimum engine, remove the better-sqlite3 fallback
chain from sf-db.ts, unit-ownership.ts, and cli-stats.ts. Use DatabaseSync
from node:sqlite directly. Also replace the `glob` npm package with built-in
node:fs/promises.glob and node:fs.globSync in pi-coding-agent LSP utils.
- Remove createRequire boilerplate and suppressSqliteWarning helper
- Simplify loadProvider() and openRawDb()
- Net -177 lines of fallback/middleware code
💘 Generated with Crush
Assisted-by: GLM-5.1 via Crush <crush@charm.land>
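
For readers unfamiliar with the built-ins named above, here is a small sketch of using `DatabaseSync` from `node:sqlite` and `globSync` from `node:fs` directly. The table, columns, and glob pattern are invented for illustration and are not taken from sf-db.ts or the LSP utils.

```typescript
import { DatabaseSync } from "node:sqlite";
import { globSync } from "node:fs";

// Sketch only: table and column names are invented for illustration.
const db = new DatabaseSync(":memory:");
db.exec("CREATE TABLE units (id TEXT PRIMARY KEY, owner TEXT)");

// Prepared statements cover both writes and reads; no fallback driver needed.
const insert = db.prepare("INSERT INTO units (id, owner) VALUES (?, ?)");
insert.run("u1", "alice");

const rows = db.prepare("SELECT id, owner FROM units").all();
db.close();

// Built-in glob, resolved against the current working directory by default.
const jsonFiles = globSync("*.json");
```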
- Wrap bare test blocks in describe/it for vitest compatibility
- Clean up vitest.config.ts
Assisted-by: GLM-5.1 via Crush <crush@charm.land>
- Convert remaining node:test → vitest imports in packages/* and studio/*
- Fix mock.callCount() → mock.callCount property access for vitest compat
- Fix mock.calls[N].arguments → mock.calls[N] for vitest compat
- Update tsconfig.extensions.json to exclude test files from tsc
- Harden migrate-to-vitest-all.mjs regex for single quotes and optional semicolons
- Add behavioural tests for isProviderAllowedForAdvisor wired into
selectAndApplyModel for subagent unit types.
- Verify non-subagent units are unaffected by the advisor allowlist.
- Add static source analysis guard confirming the check exists.
Assisted-by: Kimi Code CLI
Add vitest.config.ts with forks pool, v8 coverage, and package aliases.
Run migrate-to-vitest.mjs to replace `from "node:test"` imports with
`from 'vitest'` across 749 test files, converting mock.fn→vi.fn and
mock.timers→vi fake timers where needed.
Assisted-by: GLM-5.1 via Crush <crush@charm.land>
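
The import rewrite described above, including the later hardening for single quotes and optional semicolons, might look roughly like this. It is a sketch under assumptions, not the contents of migrate-to-vitest.mjs: the function name and regex are hypothetical, and it only handles the `from` clause, leaving the mock.fn and mock.timers conversions aside.

```typescript
// Sketch only: not the project's migration script; names are hypothetical.
// Matches `from "node:test"` or `from 'node:test'`, with or without `;`.
const NODE_TEST_IMPORT = /from\s+(['"])node:test\1(;?)/g;

function rewriteNodeTestImports(source: string): string {
  // Normalise to single quotes and keep the trailing semicolon if present.
  return source.replace(
    NODE_TEST_IMPORT,
    (_match, _quote, semi) => `from 'vitest'${semi}`,
  );
}
```

The backreference `\1` keeps mismatched quotes such as `"node:test'` from matching, which is one way to make such a regex stricter without listing each quote style separately.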
- Move guards phase after dispatch in dev path so unitType/unitId are
available for plan-gate validation
- Relocate UOK plan-gate from runDispatch into runGuards with
getSliceTaskCounts first-task-of-slice check
- Rename runLegacyAutoLoop → autoLoop in startAuto call sites
- Add plan quality gate in _deriveStateImpl via getSlicePlanBlockingIssue
- Clear path cache in invalidateStateCache
- Deprioritise minimax in search provider fallback ordering
- Fix native-search Anthropic heuristic to exclude copilot/minimax/kimi
clones while still matching claude-* models
- Add releaseIfIdle to CodexAppServerClient for clean short-lived process
exit
- Fix nested codex error message parsing
- Update search provider tests to clear minimax env vars
- Add native parser zero-task fallback in parsePlan
Assisted-by: GLM-5.1 via Crush <crush@charm.land>
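
One way to picture the native-search heuristic fix is a check that rejects the named clone providers before matching the model id. The function name, the provider id strings, and the substring/prefix matching rules below are assumptions for illustration; the real heuristic may key on different fields.

```typescript
// Sketch only: function name, provider ids, and matching rules are assumptions.
const EXCLUDED_PROVIDERS = ["copilot", "minimax", "kimi"];

function looksLikeNativeAnthropic(provider: string, modelId: string): boolean {
  const p = provider.toLowerCase();
  // Excluded first, so a claude-* model served by a clone never matches.
  if (EXCLUDED_PROVIDERS.some((name) => p.includes(name))) return false;
  return modelId.toLowerCase().startsWith("claude-");
}
```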
- Add codex-app-server-client for Codex app server communication
- Update openai-codex-responses provider integration
- Fix auto.ts to use runLegacyAutoLoop post-UOK-refactor
- Add advisor_allowed_providers preference support
- Fix slice plan blocking issue check in auto-recovery
- run-unit.ts: do NOT clear isSessionSwitchInFlight on timeout; let the
dangling newSession .finally() clear it via generation check. This fixes
'runUnit keeps the session-switch guard across a late newSession settlement'.
- auto.ts: use `runLegacyLoop: autoLoop` (not runLegacyAutoLoop) — autoLoop
already defaults to legacy-direct dispatch contract. Fixes source-inspection
test that expects the literal text 'runLegacyLoop: autoLoop'.
- state.ts: remove over-strict plan quality check from state derivation so
minimal plans (no review sections) don't block task dispatch.
- auto-recovery.ts, auto-timers.ts: minor cleanup from agent sweep.
- packages/pi-ai: github-copilot.ts OAuth helper + index.ts export wiring.
- openai-codex.ts: drop stale PKCE residuals after simplification.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
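
The generation check in the run-unit.ts fix can be sketched as a small guard: a timeout leaves the flag set, and only a settlement carrying the current generation token clears it. The class and method names are hypothetical, and the real code tracks this state differently.

```typescript
// Sketch only: class and method names are hypothetical.
class SessionSwitchGuard {
  private generation = 0;
  private inFlight = false;

  /** Start a switch and return the generation token it belongs to. */
  begin(): number {
    this.generation += 1;
    this.inFlight = true;
    return this.generation;
  }

  /** Called from the switch's .finally(); stale tokens are ignored. */
  settle(token: number): void {
    if (token === this.generation) this.inFlight = false;
  }

  get isSessionSwitchInFlight(): boolean {
    return this.inFlight;
  }
}
```

Because the timeout path never calls `settle`, a late newSession settlement is the only thing that can release the guard, and a settlement from a superseded switch cannot release a newer one.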
Adds a Dispatch Pattern subsection showing the parentTrace shape for
advisory review. For advisory, the trace is the planner's reasoning trail
(alternatives considered, untested assumptions, explicit out-of-scope) —
not tool calls. This lets the advisory reviewer catch the gap between
what the planner thought and what the artefact says, which is exactly
what advisory review exists to catch.
Closes the loop on parent-trace pass-through (the subagent dispatch wiring,
helper, and test landed earlier). The dispatch tool supports parentTrace
at TaskItem / ChainItem / batch level; until the canonical review skills
teach the LLM to PASS it, the feature is dead code in practice.
- code-review/SKILL.md Phase 2: shows the 5-lens parallel review swarm
dispatch with parentTrace at the batch level. Reviewer can audit what the
implementer actually did, not just the prose summary.
- requesting-code-review/SKILL.md Local Review Loop: shows the
advocate + challenger-A + challenger-B dispatch with parentTrace and
adds a hard rule that all three must receive it. Specifically calls out
that the advocate is the most likely to wave away an objection the
trace contradicts — passing the trace forces engagement.
- prompts/validate-milestone.md Step 1: passes a slice-claim summary
(one bullet per slice, with SUMMARY path) as parentTrace to the three
validation reviewers, so they audit slice claims against artifacts.
PDD packet (inline; pure prose docs, no code change):
- Purpose: review skills actually USE the parentTrace plumbing instead of
dispatching reviewers blind to what the parent did.
- Consumer: code-review (every slice/PR review), requesting-code-review
(every external review request), validate-milestone (every milestone close).
- Contract: each skill's dispatch example includes parentTrace; the rule
text instructs the LLM to assemble its own tool-call summary.
- Evidence: grep confirms `parentTrace` in all three files; npm run
copy-resources propagated to dist; typecheck:extensions exits 0.
- Non-goals: not changing the verifier prompt assembly (already inherits
from composeTaskWithParentTrace's embedded instructions); not changing
agent definitions; not auto-capturing the trace (parent agent decides
what's relevant).
- Invariants: existing dispatch examples preserved with parentTrace added,
not replacing the original; no agent type changes.
- Assumptions: the parent LLM's context contains the tool-call history it
needs to assemble parentTrace; the dispatch tool routes the field
through unchanged (verified by parent-trace.test.ts).
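
To make the batch-level pass-through concrete, the sketch below models a review batch where one trace is shared by all reviewers unless a task carries its own. The interfaces are hypothetical stand-ins rather than the dispatch tool's schema, and the precedence (task-level over batch-level) is an assumption.

```typescript
// Sketch only: these shapes are hypothetical, not the dispatch tool's schema.
interface TaskItem {
  agent: string;
  task: string;
  parentTrace?: string; // optional task-level trace
}

interface Batch {
  tasks: TaskItem[];
  parentTrace?: string; // batch-level trace shared by all reviewers
}

/** Assumed precedence: a task-level trace wins over the batch-level one. */
function effectiveParentTrace(batch: Batch, item: TaskItem): string | undefined {
  return item.parentTrace ?? batch.parentTrace;
}

// Example: the three local-review roles all receive the same batch trace.
const reviewBatch: Batch = {
  parentTrace: "ran unit tests; edited two files",
  tasks: [
    { agent: "advocate", task: "Defend the change." },
    { agent: "challenger-A", task: "Find correctness gaps." },
    { agent: "challenger-B", task: "Find scope gaps." },
  ],
};
```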
Follows up the parent-trace dispatch wiring (bundled into bc9cf4fef +
2508822b8). Adds:
- src/resources/extensions/subagent/tests/parent-trace.test.ts — 7 cases
covering the composeTaskWithParentTrace helper: undefined/empty/whitespace
pass-through, tag wrapping, task-after-trace ordering, content trimming,
embedded verifier instructions ("hedge words", "tool errors").
- src/resources/extensions/subagent/index.ts — exports composeTaskWithParentTrace
so the test can import it.
- skills/dispatching-subagents — new "Parent trace (for verifier/review
subagents)" subsection documents the field at TaskItem / ChainItem /
batch level, the per-task override, and the chain (step 0 only) and
debate (round 1 only) behaviour.
PDD packet (inline; small follow-up to the architectural change):
- Purpose: parent-trace plumbing has a falsifiable test and is documented in
the canonical dispatching-subagents skill so callers know how to use it.
- Consumer: the dispatching-subagents skill (loaded by every agent that
calls the subagent tool); the test (covers regression).
- Contract: 7 test cases pass; SKILL.md contains the documented field at
three schema levels with the override and per-mode behaviour notes.
- Evidence:
- tests/parent-trace.test.ts → 7/7 pass via the SF resolve-ts loader
- npm run typecheck:extensions exits 0
- All 35 subagent suite tests pass
- Non-goals: not changing the dispatch wiring (already in); not adding
parent-trace handling to background jobs (separate slice if needed).
- Invariants (safety only — sync helper + pure prose docs):
- composeTaskWithParentTrace returns task unchanged when trace is empty.
- The original task always appears after the closing tag.
- Trimmed content is what gets injected, not the raw padded input.
- Assumptions: tests load TS via the resolve-ts.mjs hook (standard SF
pattern); skills load SKILL.md from dist via copy-resources.
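
The three invariants listed above can be illustrated with a stand-in for the helper. This is not the implementation in subagent/index.ts: the function name, the `<parent_trace>` tag, and the instruction wording are assumptions chosen only to make the invariants checkable.

```typescript
// Sketch only: a stand-in for composeTaskWithParentTrace. The tag name and
// the instruction wording are assumptions, not the real helper's output.
function composeWithTraceSketch(task: string, parentTrace?: string): string {
  const trimmed = parentTrace?.trim() ?? "";
  // Invariant 1: undefined, empty, or whitespace-only trace returns task unchanged.
  if (trimmed === "") return task;
  return [
    "Check the trace below for hedge words and glossed-over tool errors.",
    "<parent_trace>",
    trimmed, // Invariant 3: the trimmed content is injected, not the padded input.
    "</parent_trace>",
    "",
    task,    // Invariant 2: the original task comes after the closing tag.
  ].join("\n");
}
```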
- openai-codex.ts: replace hand-rolled PKCE flow with simple read of
~/.codex/auth.json written by the real codex CLI after user authentication.
Removes ~250 lines of local callback server + browser dance code.
- openai-codex-responses.ts: minor residual cleanup
- openai-completions.ts: drop remaining `as any` stream_options cast
- anthropic-shared.ts: use `unknown` cast on thinkingNoBudget path
- pi-coding-agent/extensions/types.ts: minor type addition
- db-tools.ts: explicit AgentToolResult return type on execute handlers
- requesting-code-review/SKILL.md: prompt wording cleanup
- subagent/index.ts: capability registration wiring
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
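
Reading the CLI-written credentials file instead of running a local OAuth flow could be sketched as below. The helper name and the null-on-failure behaviour are assumptions, and the file's field layout is deliberately left untyped because the commit message does not describe it.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Sketch only: helper name and null-on-failure behaviour are assumptions.
// The parsed value stays `unknown`; the file's fields are not described here.
function readCodexAuthFile(
  path: string = join(homedir(), ".codex", "auth.json"),
): unknown {
  try {
    return JSON.parse(readFileSync(path, "utf8"));
  } catch {
    // Missing or malformed file: treat as "not authenticated via the codex CLI".
    return null;
  }
}
```

A caller would validate the parsed value against whatever fields it needs before using it, since nothing here guarantees the file's shape.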
- anthropic-shared.ts: replace `as any` cast on thinkingNoBudget path with
`as unknown as Record<string, unknown>` for auditability; remove `as any`
on server_tool_use block (SDK type is now correct)
- openai-completions.ts: drop residual `as any` casts after SDK type update
- db-tools.ts: add explicit AgentToolResult return type annotation on execute
handlers to resolve implicit-any lint
- requesting-code-review/SKILL.md: update review skill prompt
- subagent/index.ts: wire subagent capability registration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- package.json: add 'typecheck' script (build:pi + tsc --noEmit) so pi-ai
and pi-coding-agent typecheck under the same command surface SF uses.
- anthropic-shared.ts: replace 'as any' casts with proper Anthropic SDK
types (ServerToolUseBlockParam, WebSearchToolResultBlockParam,
CacheControlEphemeral). The cache_control variant is documented inline
so the cast is auditable.
- openai-completions.ts: drop the 'as any' on stream_options — the type
system can verify the assignment now.
- openai-codex-responses.ts, package-manager.ts, skills.ts: annotate the
three remaining empty catches with one-line WHY comments (best-effort
cleanup, malformed ignore files, partial directory traversal). Empty
catch with no rationale is an SF012 anti-pattern; with rationale it is
a deliberate fallback.
- oauth/github-copilot.ts, oauth/openai-codex.ts: add UPSTREAM AUDIT
blocks documenting why these hand-rolled OAuth flows stay hand-rolled
rather than delegating to @octokit/auth-oauth-device or @openai/codex.
AbortSignal coverage and provider-specific surface area are the gating
concerns; re-audit triggers are named.
Two small defensive fixes in the auto-loop that surfaced when running
sf in degraded environments (no .sf/sf.db yet, or unset basePath):
- phases.ts: guard the planning-flow gate with isDbAvailable() so a missing or
not-yet-initialized DB does not throw inside the gate runner.
- run-unit.ts: skip process.chdir when s.basePath is falsy. The original
guard compared cwd to an empty string, which always failed on the first
unit of a fresh runtime root.
Both are conservative — preserve existing behaviour when DB and basePath
are present.
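
Both defensive fixes reduce to an early return before the risky call. The sketch below shows the shape of each guard with injected callbacks so it can be exercised in isolation; the function names and signatures are hypothetical and do not mirror phases.ts or run-unit.ts.

```typescript
// Sketch only: function names and the injected callbacks are hypothetical.
function maybeChdir(
  basePath: string | undefined,
  cwd: string,
  chdir: (dir: string) => void,
): boolean {
  if (!basePath) return false;        // unset or empty basePath: stay put
  if (basePath === cwd) return false; // already in the right directory
  chdir(basePath);
  return true;
}

function runGateIfDbAvailable<T>(
  isDbAvailable: () => boolean,
  gate: () => T,
): T | undefined {
  // Skip the gate rather than letting it throw on a missing DB.
  return isDbAvailable() ? gate() : undefined;
}
```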
Tail-end of the PDD v2 work (Assumptions field + safety/liveness split +
machine-executable Evidence). Three documents that still referenced v1's
4-field Purpose Gate are updated to the full 8-field PDD packet:
- docs/SPEC_FIRST_TDD.md — Purpose Gate now lists all 8 fields with the
Assumptions and Failure-boundary additions inline.
- skills/requesting-code-review — replaces "Purpose & Consumer" section with
"PDD packet (all 8 fields)" restated verbatim from .sf/active/{unit-id}/pdd.md.
Falsifier and Scope-defence sections clarified vs Failure-boundary and
Non-goals to remove overlap.
- skills/receiving-code-review — Purpose Gate criterion updated to demand
the full PDD packet with machine-executable Evidence, not just
Purpose/Consumer/Value-at-risk.
PDD packet (inline):
- Purpose: every artefact that references "Purpose Gate" agrees on the same
8-field definition; reviewers and reviewees read the same packet.
- Consumer: spec-first-tdd, requesting-code-review, receiving-code-review.
- Contract: all three documents list the same 8 fields with the same
Assumptions / safety+liveness / machine-executable-Evidence wording.
- Evidence: grep confirms PDD packet references in all three; typecheck:extensions exits 0.
- Non-goals: no edits to the PDD skill itself (already v2); no edits to other
skills referencing the v1 Purpose Gate beyond these three (none exist).
- Invariants: existing review-loop sections preserved; only Purpose-Gate-
related sections rewritten.
- Assumptions: PDD v2 SKILL.md is the canonical source of field definitions;
these three documents are projections of it.
Step 2 + scan-and-improve from the Piebald-AI/claude-code-system-prompts pattern
analysis. Five files, prose-only edits, no code changes.
- prompts/gate-evaluate.md — Verdict Discipline section: omitted is not a hedge.
Each omitted verdict needs a reason; unexplained omitted is treated as
failed-to-decide and re-dispatched.
- skills/dispatching-subagents — Subagent Prompt Audit: before dispatch, audit
for smuggled user-questions, action-class delegation, scope creep, and tool
vs prompt mismatch. After return, scan for hedge words, glossed-over tool
errors, and self-reports without traces.
- skills/researcher — Read-only discipline block: closes the bash redirect /
heredoc back-door. Researcher does not write files, DB rows, git, or
packages; the report is the only output, and write-requires findings are
surfaced for parent dispatch rather than performed in-skill.
- skills/systematic-debugging — Recognize Your Own Rationalizations: names
the debugging-specific failure modes ("error message obviously says X",
"small diff can't be the cause", "test was probably flaky"). Adds Command/
Output trace format requirement to Phase 4 verification.
- skills/spec-first-tdd — Adds Command/Output trace format requirement to the
Evidence section.
PDD packet (inline; prose-only edit, all five additions):
- Purpose: harden five SF skills/prompts so loaded text catches rationalizations,
closes the read-only back-door, and requires falsifiable verdicts/traces.
- Consumer: every gate evaluation, subagent dispatch, research run,
debugging session, and TDD slice.
- Contract: SKILL/prompt text contains the new sections at predictable
anchor points, grep-able by the section headings used.
- Evidence: grep-confirmed presence of "Verdict Discipline", "Subagent Prompt
Audit", "<read_only_discipline>", "Recognize Your Own Rationalizations",
"Trace format" in their respective files; typecheck:extensions exits 0;
copy-resources propagated to dist.
- Non-goals: no edits to ask-gate.ts, no transport changes (parent-transcript
pass-through deferred); no edits to receiving-code-review/requesting-code-
review (already strong post-PDD-v2).
- Invariants: existing sections preserved; only additions; frontmatter
unchanged.
- Assumptions: skills loaded from dist via copy-resources; section text is
injected verbatim into agent context; additions are written in SF voice
(patterns paraphrased, not copied from Anthropic's bytes).
Adds patterns from Piebald-AI/claude-code-system-prompts (extracted from
the public Claude Code npm bundle) to SF's two completion-gate skills:
- "You are bad at this" self-awareness sections at the top of finish-and-verify
and code-review — names the LLM-specific failure modes (read-don't-run,
trust-self-reports, hedge-when-uncertain, fooled-by-AI-slop) instead of the
generic "be thorough" framing.
- Rationalization-callouts that name the exact excuses the agent reaches for
("probably fine", "tests already pass", "looks correct based on my reading")
and invert each with a counter-instruction.
- Mandatory adversarial probe before slice-done / Lens 1 APPROVE: at least one
boundary / idempotency / concurrency / orphan-reference probe with documented
result, even when behaviour was correct.
- Command/Output/Result trace format for verification evidence — paraphrase is
not evidence; a check without a Command-run block is a skip.
- Anti-hedge guard on code-review verdicts: APPROVE_WITH_FIXES is not for "I'm
not sure"; findings without traces drop to Medium.
PDD packet (inline since prose-only edit, no code):
- Purpose: when these skills load, the agent reads its own failure-mode catalogue
- Consumer: every slice close (finish-and-verify) and every review (code-review)
- Contract: SKILL.md text contains rationalizations + adversarial probe + trace format
- Evidence: grep finds ≥3 keyword matches per file; typecheck:extensions exits 0; dist parity
- Non-goals: no edits to gate-evaluate.md, dispatching-subagents, ask-gate.ts (deferred)