sf snapshot: uncommitted changes after 56m inactivity

Mikael Hugo 2026-05-16 14:59:40 +02:00
parent 6071a9207c
commit da0c41d375
8 changed files with 188 additions and 74 deletions


@ -136,6 +136,39 @@ This file is the explicit capability and coverage contract for the project.
- Validation: unmapped
- Notes: Requires (1) new prompt template `prompts/fill-milestone-vision.md`, (2) new dispatchable unit wired in `auto-dispatch.js` + `state-transition-matrix.js`, (3) an exception in `buildRegistryAndFindActive` for one-shot `status=complete && vision=""` repair, (4) inline-fixer handler that converts the R011 self-feedback entry into a dispatch. Must satisfy R006 (fail-open) — recovery-unit failure halts with notification, never crashes the loop.
### R013 — Unified Dispatch v2: `inline` Scope for `full` Isolation
- Class: core-capability
- Status: active
- Description: Implement the `inline` scope row of `UNIFIED_DISPATCH_V2_PLAN.md`'s parameter matrix (line 152: `full | managed | inline | single`) so the autonomous loop can execute units in-process without spawning a subprocess/worktree. A new `src/resources/extensions/sf/dispatch-layer.js` exposes `DispatchLayer.dispatch(opts)` per the plan's API spec (lines 51-138). When `scope: 'inline'` and `isolation: 'full'`, the unit's executor runs in the calling process against the project DB directly — no `child_process.spawn`, no session-status-io files, no worktree.
- Why it matters: The current spawn-based path silently fails on `validate-milestone` and likely other unit types (self-feedback `sf-mp8bhp5s-cmgt8d`, critical, blocking): worker session IDs are issued and tracked in `.sf/runtime/units/*.json`, but the worker never writes its session JSONL and `recoveryAttempts` stays at 0 across runaway-final-warning phases. The failure is universal across providers (kimi-k2.6 and minimax both produce 0 tool calls with heartbeats only). Adding an inline path retires this whole class of bug for units that don't need worktree isolation. It also reduces process-start latency and removes the file-based IPC pressure point that has accumulated multiple historical issues.
- Source: spec
- Primary owning slice: unmapped
- Supporting slices: none
- Validation: unmapped
- Notes: Aligned with `docs/plans/UNIFIED_DISPATCH_V2_PLAN.md` (Qwen Plan, 2026-05-08). Scope of R013 is the **minimum slice** of that plan: just `full + managed + inline + single`. Other rows of the matrix (parallel/debate/chain inline, slice/milestone scope with worktrees) are out of scope for R013 and stay on their current implementations. Resolves `sf-mp8bhp5s-cmgt8d` and likely the 56+ historical `runaway-loop:idle-halt` entries on M005.
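
A minimal sketch of the routing R013 describes, under stated assumptions: only `DispatchLayer.dispatch(opts)`, `scope` and `isolation` come from the requirement text. The helper `resolveDispatchPath` and the `runInline` / `spawnWorker` callbacks are hypothetical names introduced for illustration, not the plan's actual API.

```javascript
// Hypothetical helper: decides which execution path a dispatch takes.
// Only the `full` isolation + `inline` scope row is routed in-process.
function resolveDispatchPath({ scope, isolation }) {
  return scope === "inline" && isolation === "full" ? "inline" : "spawn";
}

// Hypothetical shape: the real class is specified in UNIFIED_DISPATCH_V2_PLAN.md.
class DispatchLayer {
  constructor({ runInline, spawnWorker }) {
    this.runInline = runInline;     // assumed in-process executor entry point
    this.spawnWorker = spawnWorker; // assumed wrapper around the existing spawn path
  }

  async dispatch(opts) {
    if (resolveDispatchPath(opts) === "inline") {
      // In-process: no child process, no worktree, no session-status files.
      const result = await this.runInline(opts.unitType, opts.unitId, opts.ctx);
      return { via: "inline", result };
    }
    // Every other matrix row stays on its current implementation.
    return { via: "spawn", result: await this.spawnWorker(opts) };
  }
}
```

Keeping the path decision in a pure function makes the matrix row testable without starting a worker.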
### R014 — Inline Worker Bootstrap Without Spawned `sf` CLI
- Class: core-capability
- Status: active
- Description: Extract the unit-execution code path that `sf headless autonomous` currently invokes after spawn into a callable function (`runUnitInline(unitType, unitId, ctx)`) usable from the same process. UOK kernel calls it directly when dispatching with `scope: 'inline'`. Must respect the single-writer invariant on `.sf/sf.db` (`sf-db.js`); the in-process call shares the kernel's existing WAL connection rather than opening a new one.
- Why it matters: Today the unit executor is reachable only via subprocess argv parsing in the headless CLI surface. Without this extraction, R013's inline scope cannot wire a real executor — the dispatcher would have nothing to call. This is the prerequisite for R013.
- Source: spec
- Primary owning slice: unmapped
- Supporting slices: none
- Validation: unmapped
- Notes: Reuses existing unit-context-manifest, prompt builders, and tool registries. The only change is execution surface: function call instead of process boundary. Session JSONL is still written for audit but to a path keyed off the in-process session ID, not a worker subprocess.
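
As a rough illustration of the extraction R014 asks for, the sketch below keeps only the `runUnitInline(unitType, unitId, ctx)` signature from the requirement. The executor table, the `ctx.db` and `ctx.sessionLog` fields, and the session ID format are assumptions, and the function is synchronous for brevity where a real executor would presumably be async.

```javascript
// Hypothetical executor table; the real executors live behind the headless CLI today.
const UNIT_EXECUTORS = {
  "validate-milestone": (unitId, ctx) => {
    // Writes go through the connection the caller already holds.
    ctx.db.writes.push(`validated ${unitId}`);
    return { ok: true };
  },
};

function runUnitInline(unitType, unitId, ctx) {
  const execute = UNIT_EXECUTORS[unitType];
  if (!execute) throw new Error(`unknown unit type: ${unitType}`);
  // Single-writer invariant: reuse the kernel's connection, never open a new one.
  if (!ctx.db) throw new Error("ctx.db must be the kernel's existing connection");
  const sessionId = `inline-${unitType}-${unitId}`; // assumed key format
  // The session log is still written for audit, keyed by the in-process session ID.
  ctx.sessionLog.push({ sessionId, type: "session-start" });
  return { sessionId, ...execute(unitId, ctx) };
}
```

The point of the sketch is the execution surface: the same executor is reached by a function call instead of argv parsing across a process boundary.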
### R015 — Spawn-Failure Loud Failure (Defensive)
- Class: failure-visibility
- Status: active
- Description: Until R013/R014 land for every unit type, the existing spawn path must fail loudly. If a dispatched worker fails to write its session JSONL within a configurable timeout (default 30s) AND has zero `progressCount`, the runtime must (a) transition the unit to `status: failed`, (b) capture any stderr from the spawn into `lineage.events`, (c) emit a doctor-visible signal, and (d) trigger the retry path up to `maxRetries`. Today the runaway watchdog only fires a warning and never retries — `recoveryAttempts` stays at 0.
- Why it matters: Even after inline scope retires the spawn path for the common cases, spawn-based dispatch will persist for milestone/slice-scope workers and parallel modes. Silent failure is the worst possible behavior — operator sees a "running" unit that's a ghost. This requirement keeps the spawn path observable for as long as it exists.
- Source: spec
- Primary owning slice: unmapped
- Supporting slices: none
- Validation: unmapped
- Notes: Touches the runaway-recovery / unit-ownership / parallel-orchestrator surfaces. Distinct from R013 — R013 removes the bug for inline scope; R015 contains the bug for non-inline scope.
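
The R015 condition can be sketched as a pure check. Only `progressCount`, `maxRetries`, `status: failed`, and the 30s default come from the requirement. The field names `dispatchedAt`, `sessionFileExists`, `stderr` and `attempts`, the signal name, and the default retry count are hypothetical.

```javascript
const DEFAULT_SESSION_TIMEOUT_MS = 30000; // R015 default of 30s

// Hypothetical worker record: { dispatchedAt, sessionFileExists, progressCount, stderr, attempts }
function checkSpawnedWorker(worker, now, opts = {}) {
  const timeoutMs = opts.timeoutMs ?? DEFAULT_SESSION_TIMEOUT_MS;
  const maxRetries = opts.maxRetries ?? 2; // assumed default, not from the spec
  const timedOut = now - worker.dispatchedAt > timeoutMs;
  // Both conditions must hold: no session JSONL AND zero progress.
  const silent = !worker.sessionFileExists && worker.progressCount === 0;
  if (!(timedOut && silent)) return { status: "running" };
  return {
    status: "failed",                                   // (a) transition the unit
    events: worker.stderr
      ? [{ kind: "spawn-stderr", text: worker.stderr }] // (b) capture stderr
      : [],
    doctorSignal: "spawn-silent-failure",               // (c) assumed signal name
    retry: worker.attempts < maxRetries,                // (d) retry within maxRetries
  };
}
```

A worker that has written its session file or reported any progress is left alone, so slow but live workers are not failed by this check.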
## Traceability
| ID | Class | Status | Primary owner | Supporting | Proof |
@ -152,10 +185,13 @@ This file is the explicit capability and coverage contract for the project.
| R010 | quality-attribute | active | M005/S02 | none | unmapped |
| R011 | failure-visibility | active | unmapped | none | unmapped |
| R012 | differentiator | active | unmapped | none | unmapped |
| R013 | core-capability | active | unmapped | none | unmapped |
| R014 | core-capability | active | unmapped | none | unmapped |
| R015 | failure-visibility | active | unmapped | none | unmapped |
## Coverage Summary
- Active requirements: 12
- Active requirements: 15
- Mapped to slices: 10
- Validated: 0
- Unmapped active requirements: 2 (R011, R012 — pending planning into a new self-heal extension slice or M003 follow-on)
- Unmapped active requirements: 5 (R011, R012 — self-heal extension; R013, R014, R015 — UNIFIED_DISPATCH_V2 inline scope, anchored to docs/plans/UNIFIED_DISPATCH_V2_PLAN.md)


@ -1086,15 +1086,17 @@ export class SessionManager {
_persist(entry: SessionEntry): void {
if (!this.persist || !this.sessionFile) return;
const hasAssistant = this.fileEntries.some(
(e) => e.type === "message" && e.message.role === "assistant",
);
if (!hasAssistant) {
// Mark as not flushed so when assistant arrives, all entries get written
this.flushed = false;
return;
}
// #R015-remediation (sf-mp8c0arc-vgw8io): previously this method
// deferred file creation until the first assistant message arrived
// (silent return on !hasAssistant). The intent was to avoid empty
// files for cancelled/never-started sessions, but the cost was
// silent invisibility when the LLM never produced an assistant
// message — failed sessions left zero forensic trail and the SF
// autonomous loop's watchdog couldn't tell a live session from a
// dead one. The eventual cost (debugging M005's chronic stuck
// state) far exceeded the saved disk space. We now write entries
// as soon as they're added, so the session JSONL exists with at
// least the session header + user prompt from the very first turn.
let release: (() => void) | undefined;
try {
release = tryAcquireLockSync(this.sessionFile);

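The behavior change in this hunk (write every entry immediately instead of waiting for the first assistant message) can be illustrated with a small stand-in. This is a sketch, not the `SessionManager` code: the `EagerSessionLog` class and its `sink` callback are hypothetical, with `sink` standing in for the locked append to the session file.

```javascript
// Hypothetical stand-in for the persistence path; `sink` receives one JSONL line per entry.
class EagerSessionLog {
  constructor(sink) {
    this.sink = sink;
    this.entries = [];
  }

  append(entry) {
    this.entries.push(entry);
    // No gate on an assistant message: a session that never gets a reply
    // still leaves its header and user prompt behind for forensics.
    this.sink(JSON.stringify(entry) + "\n");
  }
}

// Usage: a session where the model never answers still produces two lines.
const lines = [];
const log = new EagerSessionLog((line) => lines.push(line));
log.append({ type: "session", id: "example-session" });
log.append({ type: "message", message: { role: "user", content: "validate M005" } });
```

With the previous gating, `lines` would have stayed empty here, which is the invisibility the commit comment describes.
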

@ -1150,10 +1150,9 @@ export async function buildDiscussProjectPrompt(
});
const parts = [];
if (composed) parts.push(composed);
const knowledgeBlockDP = await inlineKnowledgeScoped(base, []);
if (knowledgeBlockDP) parts.push(knowledgeBlockDP);
const graphBlockDP = await inlineGraphSubgraph(base, "project setup", { budget: 3000 });
if (graphBlockDP) parts.push(graphBlockDP);
// #M005-remediation: knowledge/graph are computed artifacts already
// included in `composed` via the computed registry above. Manual
// re-injection here caused duplicate sections in the prompt output.
const inlinedContext = capPreamble(
`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
);
@ -1208,12 +1207,7 @@ export async function buildDiscussRequirementsPrompt(
});
const parts = [];
if (composed) parts.push(composed);
const knowledgeBlockDR = await inlineKnowledgeScoped(base, []);
if (knowledgeBlockDR) parts.push(knowledgeBlockDR);
const graphBlockDR = await inlineGraphSubgraph(base, "project requirements", {
budget: 3000,
});
if (graphBlockDR) parts.push(graphBlockDR);
// #M005-remediation: knowledge/graph included via composed (computed registry).
const inlinedContext = capPreamble(
`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
);
@ -1276,12 +1270,7 @@ export async function buildResearchProjectPrompt(
});
const parts = [];
if (composed) parts.push(composed);
const knowledgeBlockRP = await inlineKnowledgeScoped(base, []);
if (knowledgeBlockRP) parts.push(knowledgeBlockRP);
const graphBlockRP = await inlineGraphSubgraph(base, "project research", {
budget: 3000,
});
if (graphBlockRP) parts.push(graphBlockRP);
// #M005-remediation: knowledge/graph included via composed (computed registry).
const inlinedContext = capPreamble(
`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
);
@ -1346,12 +1335,7 @@ export async function buildDiscussMilestonePrompt(
});
const parts = [];
if (composed) parts.push(composed);
const knowledgeBlockDM = await inlineKnowledgeScoped(base, []);
if (knowledgeBlockDM) parts.push(knowledgeBlockDM);
const graphBlockDM = await inlineGraphSubgraph(base, `${mid} ${midTitle}`, {
budget: 3000,
});
if (graphBlockDM) parts.push(graphBlockDM);
// #M005-remediation: knowledge/graph included via composed (computed registry).
const inlinedContext = capPreamble(
`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
);
@ -1373,11 +1357,12 @@ export async function buildDiscussMilestonePrompt(
return basePrompt;
}
export async function buildResearchMilestonePrompt(mid, midTitle, base) {
// #4782 phase 3: research-milestone migrated through the composer.
// Declared inline order: milestone-context, project, requirements,
// decisions, templates. Knowledge stays outside the composer
// (budget-driven, scoped by keyword extraction — future phase folds
// policy-driven blocks in).
// #M005-remediation: research-milestone now fully delegates ordering
// to the v2 composer. The manifest declares knowledge as an inline
// artifact (positioned between decisions and templates) so its
// keyword-budgeted resolver runs in the correct slot. Graph is a
// computed artifact appended after templates. Eliminates the manual
// splice + duplicate-injection pattern previously inlined here.
const resolveArtifact = async (key) => {
switch (key) {
case "milestone-context": {
@ -1391,6 +1376,8 @@ export async function buildResearchMilestonePrompt(mid, midTitle, base) {
return await inlineRequirementsFromDb(base, mid);
case "decisions":
return await inlineDecisionsFromDb(base, mid);
case "knowledge":
return await inlineKnowledgeBudgeted(base, extractKeywords(midTitle));
case "templates":
return inlineTemplate("research", "Research");
default:
@ -1398,37 +1385,18 @@ export async function buildResearchMilestonePrompt(mid, midTitle, base) {
}
};
const { inline: composed } = await composeUnitContext("research-milestone", {
resolveArtifact,
});
// Knowledge block stays outside the composer — budgeted, scoped via
// keyword extraction (#4719). Inserted between decisions and the
// templates block to match the pre-migration output order. We split
// the composer output around the templates section to preserve that
// ordering.
const knowledgeInlineRM = await inlineKnowledgeBudgeted(
base,
extractKeywords(midTitle),
);
const graphBlockRM = await inlineGraphSubgraph(base, `${mid} ${midTitle}`, {
budget: 3000,
resolveArtifact,
computed: {
graph: {
build: async (_, b) =>
inlineGraphSubgraph(b, `${mid} ${midTitle}`, { budget: 3000 }),
inputs: {},
},
},
});
const parts = [];
if (knowledgeInlineRM && composed) {
// Insert knowledge before the template block so the overall order is:
// milestone-context → project → requirements → decisions → KNOWLEDGE → research template
const idx = composed.lastIndexOf("### Output Template:");
if (idx > 0) {
const before = composed.slice(0, idx).replace(/\n\n---\n\n$/, "");
const after = composed.slice(idx);
parts.push(before, knowledgeInlineRM, after);
} else {
parts.push(composed, knowledgeInlineRM);
}
} else if (composed) {
parts.push(composed);
if (knowledgeInlineRM) parts.push(knowledgeInlineRM);
}
if (graphBlockRM) parts.push(graphBlockRM);
if (composed) parts.push(composed);
const inlinedContext = capPreamble(
`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
);

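To make the ordering claim concrete, here is a simplified stand-in for the composer: inline artifacts are resolved in declared order, then computed artifacts are appended. The internals of `composeUnitContext` are not shown in this diff, so `composeSketch` and its argument shapes are assumptions, and it is synchronous for brevity where the real resolvers are async.

```javascript
const SEPARATOR = "\n\n---\n\n"; // same separator the builders use when joining parts

// Hypothetical simplification of the composer's declared-order traversal.
function composeSketch(manifest, { resolveArtifact, computed = {} }) {
  const blocks = [];
  for (const key of manifest.inline) {
    const block = resolveArtifact(key);
    if (block) blocks.push(block); // empty artifacts are skipped
  }
  for (const key of manifest.computed) {
    const entry = computed[key];
    const block = entry ? entry.build() : null;
    if (block) blocks.push(block); // computed artifacts land after all inline ones
  }
  return { inline: blocks.join(SEPARATOR) };
}

// Usage with the research-milestone artifact order from this commit.
const researchManifest = {
  inline: ["milestone-context", "project", "requirements", "decisions", "knowledge", "templates"],
  computed: ["graph"],
};
const composedSketch = composeSketch(researchManifest, {
  resolveArtifact: (key) => (key === "project" ? null : `[${key}]`),
  computed: { graph: { build: () => "[graph]" } },
});
```

Because position comes from the manifest, knowledge sits between decisions and templates without any string splicing on the composed output.
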

@ -271,6 +271,51 @@ export function startUnitSupervision(sctx) {
);
return;
}
if (decision.action === "fail") {
if (getInFlightToolCount() > 0) return;
await closeoutUnit(
ctx,
s.basePath,
s.currentUnit.type,
s.currentUnit.id,
s.currentUnit.startedAt,
buildSnapshotOpts(),
);
writeUnitRuntimeRecord(
s.basePath,
unitType,
unitId,
s.currentUnit.startedAt,
{
phase: "failed-silent-worker",
status: "failed",
lastProgressAt: Date.now(),
lastProgressKind: "runaway-guard-fail",
runawayGuardFail: decision.metadata,
},
);
const unitParts = unitId.split("/");
recordSelfFeedback(
{
kind: "runaway-loop:silent-worker-failure",
severity: "high",
summary: decision.reason,
evidence: JSON.stringify(decision.metadata, null, 2),
suggestedFix:
"LLM session never produced an assistant message — check session-manager.ts:1086-1096 (silent _persist skip) and verify the model/provider is responding. The dispatcher will attempt retry within maxRetries; if persistent, transitions to blocked.",
occurredIn: {
unitType,
milestone: unitParts[0],
slice: unitParts[1],
task: unitParts.slice(2).join("/") || undefined,
},
source: "detector",
},
s.basePath,
);
ctx.ui.notify(decision.reason, "error");
return;
}
if (decision.action === "pause") {
if (getInFlightToolCount() > 0) return;
await closeoutUnit(

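The `occurredIn` fields in the `fail` branch are derived by splitting the unit ID on `/`. A small sketch of that parsing follows; the helper name `parseUnitId` and the example ID `M005/S02/T01` are illustrative assumptions, not identifiers from the codebase.

```javascript
// Hypothetical helper mirroring the split used when recording self-feedback.
function parseUnitId(unitId) {
  const parts = unitId.split("/");
  return {
    milestone: parts[0],
    slice: parts[1], // undefined for milestone-level units
    // Anything past the slice is treated as the task path; empty becomes undefined.
    task: parts.slice(2).join("/") || undefined,
  };
}
```

A milestone-level ID such as `M005` yields only `milestone`, with `slice` and `task` left undefined, so the feedback entry stays valid for units at any depth.
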

@ -24,13 +24,13 @@ Write for the roadmap planner. It needs to understand: what exists in the codeba
## Calibrate Depth
Read the milestone title, the user's stated intent, and any inlined context above. Ask: does this milestone introduce new technology, span multiple unfamiliar subsystems, or have ambiguous scope? Or is it a focused feature in well-understood territory?
**Default to deep research.** Read the milestone title, the user's stated intent, and any inlined context above. Use deep mode unless you can give a concrete one-line justification for downscoping. The cost of light research on a genuinely uncertain milestone (wrong slice boundaries, missed pitfalls, fabricated risk story by the planner downstream) is far greater than the cost of a thorough exploration on a milestone that turned out simple.
- **Deep research** — new technology, novel architecture, multiple risky integrations, or genuinely ambiguous scope. Explore broadly, look up docs, investigate alternatives. Write the full strategic frame including risks, boundaries, and slice-ordering rationale. This is the default when the milestone is genuinely uncertain.
- **Targeted research** — known technology but new to this codebase, or moderate complexity. Explore the relevant areas, check one or two libraries, identify constraints. Skip Comparable Systems if nothing applies.
- **Light research** — well-scoped milestone using established patterns already in the codebase. Read the relevant files to confirm the pattern, note constraints, write Summary + Recommendation + Implementation Landscape. A light milestone-research doc can be 30-50 lines. Don't manufacture risks or comparable-systems analysis for work that doesn't have them.
- **Deep research (DEFAULT)** — explore broadly, look up docs via DeepWiki/Context7, run multiple web searches for comparable systems, investigate alternatives. Write the full strategic frame including risks, boundaries, comparable systems, and slice-ordering rationale. This is the assumption.
- **Targeted research** — choose this only when the milestone is a well-defined feature in a familiar subsystem AND no novel technology is involved. Explore the relevant areas, check 1-2 libraries, identify constraints. Comparable Systems section is still required.
- **Light research** — choose this ONLY when the milestone is trivial repetition of a pattern already in the codebase, with no external dependencies and no architectural decisions. State the explicit downscope reason in the Summary section so the planner sees why depth was reduced. A light milestone-research doc can be 30-50 lines.
An honest "this milestone is straightforward, here's the pattern and slice boundaries" beats a fabricated multi-page exploration for work that doesn't need it.
The previous calibration advice "an honest 'straightforward' beats a fabricated multi-page exploration" still holds — but the bar for declaring straightforward is high. If in doubt, go deep. Comparable Systems is MANDATORY for deep and targeted; only light research may omit it (and only with an explicit reason).
## Steps
@ -40,7 +40,7 @@ Research the codebase and relevant technologies. Narrate key findings and surpri
3. Explore relevant code. Use native `lsp` first for symbol lookup, references, and cross-file navigation. For small/familiar codebases, use `rg`, `find`, and targeted reads. For large or unfamiliar codebases, use `scout` to build a broad map efficiently before diving in.
3a. Use research swarms when the questions fan out cleanly. If the milestone spans 2-3 independent subsystems, dispatch parallel `scout`/`researcher` subagents with separate lenses, then synthesize their findings into one research artifact. Do not swarm one tightly-coupled question; do it inline.
4. **Documentation lookup — prefer DeepWiki first.** Use `ask_question` / `read_wiki_structure` / `read_wiki_contents` (DeepWiki) as the default for any GitHub-hosted library or framework — AI-indexed, no free-tier cap. Fall back to `resolve_library` → `get_library_docs` (Context7) for npm/pypi/crates packages DeepWiki doesn't have. **Context7 free tier is capped at 1000 requests/month — spend those on cases DeepWiki can't cover.** Skip both for libraries already used in this codebase.
5. **Web search budget:** You have a limited budget of web searches (max ~15 per session). Use them strategically — try DeepWiki → Context7 → web search in that order. Do NOT repeat the same or similar queries. If a search didn't find what you need, rephrase once or move on. Target 3-5 total web searches for a typical research unit.
5. **Web search budget:** You have a budget of up to ~25 web searches per session. Use them strategically — try DeepWiki → Context7 → web search in that order. For deep research, target 8-12 web searches (comparable systems, prior art, library tradeoffs, common pitfalls); for targeted research, target 4-6; for light research, 0-2 is fine. Do NOT repeat the same or similar queries. If a search didn't find what you need, rephrase once or move on. Spend the budget on real questions, not safety nets.
6. Use the **Research** output template from the inlined context above — include only sections that have real content
7. If `.sf/REQUIREMENTS.md` exists, research against it. Identify which Active requirements are table stakes, likely omissions, overbuilt risks, or domain-standard behaviors the user may or may not want.
8. Call `save_summary` with `milestone_id: {{milestoneId}}`, `artifact_type: "RESEARCH"`, and the full research markdown as `content` — the tool computes the file path and persists to both DB and disk.


@ -64,6 +64,22 @@
- {{riskThatCouldSurfaceDuringExecution}}
## Comparable Systems
<!-- MANDATORY for deep and targeted research. Document 2-3 real systems
(open-source projects, papers, or published architectures) that solved a
similar problem. Look them up via DeepWiki/Context7/web search. For each:
what approach did they take, what tradeoffs did they accept, and what
should we steal vs. avoid. Light research may skip this section ONLY if
the milestone is trivial repetition of an existing in-repo pattern. Do
not fabricate — if you can't find genuine comparables, state that
explicitly and explain why none apply. -->
| System | Approach | Tradeoffs | What to steal / avoid |
|--------|----------|-----------|------------------------|
| {{system1}} | {{approach1}} | {{tradeoffs1}} | {{stealOrAvoid1}} |
| {{system2}} | {{approach2}} | {{tradeoffs2}} | {{stealOrAvoid2}} |
## Skills Discovered
<!-- Include when skill discovery found relevant skills. -->


@ -147,17 +147,23 @@ export const UNIT_MANIFESTS = {
preferences: "active-only",
tools: TOOLS_PLANNING,
artifacts: {
// Phase 3 migration (#4782): matches today's actual
// buildResearchMilestonePrompt inlining order.
// #M005-remediation: knowledge resolved as an inline artifact so its
// position (between decisions and templates) is preserved by the
// composer's declared-order traversal. Graph stays as a computed
// artifact (always appended after templates), matching the prior
// builder's behavior. Eliminates the manual splice that previously
// existed in buildResearchMilestonePrompt.
inline: [
"milestone-context",
"project",
"requirements",
"decisions",
"knowledge",
"templates",
],
excerpt: [],
onDemand: [],
computed: ["graph"],
},
maxSystemPromptChars: COMMON_BUDGET_MEDIUM,
},


@ -223,6 +223,47 @@ export function evaluateRunawayGuard(
) {
return { action: "none" };
}
// Silent-worker-failure detection (#sf-mp8c0arc-vgw8io):
// When the final warning has already been sent and the unit has produced
// zero tool calls past the elapsed threshold, the worker is not stuck in a
// busy loop — its LLM session never produced an assistant message, so the
// session JSONL was never written (session-manager.ts:1086-1096 skips
// _persist when !hasAssistant). hasMeaningfulGrowth is false because no
// tokens are flowing, so the existing pause branch never fires. Escalate
// to fail so the dispatcher can retry or transition to blocked instead of
// staying in runaway-final-warning-sent indefinitely.
if (
s.finalWarningSent &&
(unitMetrics.toolCalls ?? 0) === 0 &&
unitMetrics.elapsedMs > config.elapsedMs
) {
const reason =
`Runaway guard fail ${unitType} ${unitId}: zero tool calls in ` +
`${Math.round(unitMetrics.elapsedMs / 1000)}s after final warning — ` +
`silent worker failure suspected (LLM session never produced an assistant message).`;
return {
action: "fail",
reason,
metadata: {
reason,
failedAt: now,
unitType,
unitId,
diagnosticTurns: config.diagnosticTurns,
warningsSent: s.warningsSent,
thresholdReasons: reasons,
metrics: unitMetrics,
silentFailure: true,
thresholds: {
toolCallWarning: config.toolCallWarning,
tokenWarning: config.tokenWarning,
elapsedMs: config.elapsedMs,
changedFilesWarning: config.changedFilesWarning,
minIntervalMs: config.minIntervalMs,
},
},
};
}
if (
config.hardPause &&
s.finalWarningSent &&