diff --git a/.sf/REQUIREMENTS.md b/.sf/REQUIREMENTS.md
index edc5fb307..346c15613 100644
--- a/.sf/REQUIREMENTS.md
+++ b/.sf/REQUIREMENTS.md
@@ -136,6 +136,39 @@ This file is the explicit capability and coverage contract for the project.
 - Validation: unmapped
 - Notes: Requires (1) new prompt template `prompts/fill-milestone-vision.md`, (2) new dispatchable unit wired in `auto-dispatch.js` + `state-transition-matrix.js`, (3) an exception in `buildRegistryAndFindActive` for one-shot `status=complete && vision=""` repair, (4) inline-fixer handler that converts the R011 self-feedback entry into a dispatch. Must satisfy R006 (fail-open) — recovery-unit failure halts with notification, never crashes the loop.
 
+### R013 — Unified Dispatch v2: `inline` Scope for `full` Isolation
+- Class: core-capability
+- Status: active
+- Description: Implement the `inline` scope row of `UNIFIED_DISPATCH_V2_PLAN.md`'s parameter matrix (line 152: `full | managed | inline | single`) so the autonomous loop can execute units in-process without spawning a subprocess/worktree. A new `src/resources/extensions/sf/dispatch-layer.js` exposes `DispatchLayer.dispatch(opts)` per the plan's API spec (lines 51-138). When `scope: 'inline'` and `isolation: 'full'`, the unit's executor runs in the calling process against the project DB directly — no `child_process.spawn`, no session-status-io files, no worktree.
+- Why it matters: The current spawn-based path silently fails on `validate-milestone` and likely other unit types (self-feedback `sf-mp8bhp5s-cmgt8d`, critical, blocking) — worker session IDs are issued and tracked in `.sf/runtime/units/*.json` but the worker never writes its session JSONL and `recoveryAttempts` stays at 0 across runaway-final-warning phases. Universal across providers (kimi-k2.6 and minimax both produce 0 tool calls with heartbeats only). Adding an inline path naturally retires this whole class of bug for units that don't need worktree isolation. Also reduces process-start latency and removes the file-based-IPC pressure point that has accumulated multiple historical issues.
+- Source: spec
+- Primary owning slice: unmapped
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Aligned with `docs/plans/UNIFIED_DISPATCH_V2_PLAN.md` (Qwen Plan, 2026-05-08). Scope of R013 is the **minimum slice** of that plan: just `full + managed + inline + single`. Other rows of the matrix (parallel/debate/chain inline, slice/milestone scope with worktrees) are out of scope for R013 and stay on their current implementations. Resolves `sf-mp8bhp5s-cmgt8d` and likely the 56+ historical `runaway-loop:idle-halt` entries on M005.
+
+### R014 — Inline Worker Bootstrap Without Spawned `sf` CLI
+- Class: core-capability
+- Status: active
+- Description: Extract the unit-execution code path that `sf headless autonomous` currently invokes after spawn into a callable function (`runUnitInline(unitType, unitId, ctx)`) usable from the same process. UOK kernel calls it directly when dispatching with `scope: 'inline'`. Must respect the single-writer invariant on `.sf/sf.db` (`sf-db.js`); the in-process call shares the kernel's existing WAL connection rather than opening a new one.
+- Why it matters: Today the unit executor is reachable only via subprocess argv parsing in the headless CLI surface. Without this extraction, R013's inline scope cannot wire a real executor — the dispatcher would have nothing to call. This is the prerequisite for R013.
+- Source: spec
+- Primary owning slice: unmapped
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Reuses existing unit-context-manifest, prompt builders, and tool registries. The only change is execution surface: function call instead of process boundary. Session JSONL is still written for audit but to a path keyed off the in-process session ID, not a worker subprocess.
+
+### R015 — Spawn-Failure Loud Failure (Defensive)
+- Class: failure-visibility
+- Status: active
+- Description: Until R013/R014 land for every unit type, the existing spawn path must fail loudly. If a dispatched worker fails to write its session JSONL within a configurable timeout (default 30s) AND has zero `progressCount`, the runtime must (a) transition the unit to `status: failed`, (b) capture any stderr from the spawn into `lineage.events`, (c) emit a doctor-visible signal, and (d) trigger the retry path up to `maxRetries`. Today the runaway watchdog only fires a warning and never retries — `recoveryAttempts` stays at 0.
+- Why it matters: Even after inline scope retires the spawn path for the common cases, spawn-based dispatch will persist for milestone/slice-scope workers and parallel modes. Silent failure is the worst possible behavior — operator sees a "running" unit that's a ghost. This requirement keeps the spawn path observable for as long as it exists.
+- Source: spec
+- Primary owning slice: unmapped
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Touches the runaway-recovery / unit-ownership / parallel-orchestrator surfaces. Distinct from R013 — R013 removes the bug for inline scope; R015 contains the bug for non-inline scope.
+
 ## Traceability
 
 | ID | Class | Status | Primary owner | Supporting | Proof |
@@ -152,10 +185,13 @@ This file is the explicit capability and coverage contract for the project.
 | R010 | quality-attribute | active | M005/S02 | none | unmapped |
 | R011 | failure-visibility | active | unmapped | none | unmapped |
 | R012 | differentiator | active | unmapped | none | unmapped |
+| R013 | core-capability | active | unmapped | none | unmapped |
+| R014 | core-capability | active | unmapped | none | unmapped |
+| R015 | failure-visibility | active | unmapped | none | unmapped |
 
 ## Coverage Summary
 
-- Active requirements: 12
+- Active requirements: 15
 - Mapped to slices: 10
 - Validated: 0
-- Unmapped active requirements: 2 (R011, R012 — pending planning into a new self-heal extension slice or M003 follow-on)
+- Unmapped active requirements: 5 (R011, R012 — self-heal extension; R013, R014, R015 — UNIFIED_DISPATCH_V2 inline scope, anchored to docs/plans/UNIFIED_DISPATCH_V2_PLAN.md)
diff --git a/packages/coding-agent/src/core/session-manager.ts b/packages/coding-agent/src/core/session-manager.ts
index 2541a6bec..05e9f679f 100644
--- a/packages/coding-agent/src/core/session-manager.ts
+++ b/packages/coding-agent/src/core/session-manager.ts
@@ -1086,15 +1086,17 @@ export class SessionManager {
 	_persist(entry: SessionEntry): void {
 		if (!this.persist || !this.sessionFile) return;
 
-		const hasAssistant = this.fileEntries.some(
-			(e) => e.type === "message" && e.message.role === "assistant",
-		);
-		if (!hasAssistant) {
-			// Mark as not flushed so when assistant arrives, all entries get written
-			this.flushed = false;
-			return;
-		}
-
+		// #R015-remediation (sf-mp8c0arc-vgw8io): previously this method
+		// deferred file creation until the first assistant message arrived
+		// (silent return on !hasAssistant). The intent was to avoid empty
+		// files for cancelled/never-started sessions, but the cost was
+		// silent invisibility when the LLM never produced an assistant
+		// message — failed sessions left zero forensic trail and the SF
+		// autonomous loop's watchdog couldn't tell a live session from a
+		// dead one. The eventual cost (debugging M005's chronic stuck
+		// state) far exceeded the saved disk space. We now write entries
+		// as soon as they're added, so the session JSONL exists with at
+		// least the session header + user prompt from the very first turn.
 		let release: (() => void) | undefined;
 		try {
 			release = tryAcquireLockSync(this.sessionFile);
diff --git a/src/resources/extensions/sf/auto-prompts.js b/src/resources/extensions/sf/auto-prompts.js
index a88bc925b..84b263117 100644
--- a/src/resources/extensions/sf/auto-prompts.js
+++ b/src/resources/extensions/sf/auto-prompts.js
@@ -1150,10 +1150,9 @@ export async function buildDiscussProjectPrompt(
 	});
 	const parts = [];
 	if (composed) parts.push(composed);
-	const knowledgeBlockDP = await inlineKnowledgeScoped(base, []);
-	if (knowledgeBlockDP) parts.push(knowledgeBlockDP);
-	const graphBlockDP = await inlineGraphSubgraph(base, "project setup", { budget: 3000 });
-	if (graphBlockDP) parts.push(graphBlockDP);
+	// #M005-remediation: knowledge/graph are computed artifacts already
+	// included in `composed` via the computed registry above. Manual
+	// re-injection here caused duplicate sections in the prompt output.
 	const inlinedContext = capPreamble(
 		`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 	);
@@ -1208,12 +1207,7 @@ export async function buildDiscussRequirementsPrompt(
 	});
 	const parts = [];
 	if (composed) parts.push(composed);
-	const knowledgeBlockDR = await inlineKnowledgeScoped(base, []);
-	if (knowledgeBlockDR) parts.push(knowledgeBlockDR);
-	const graphBlockDR = await inlineGraphSubgraph(base, "project requirements", {
-		budget: 3000,
-	});
-	if (graphBlockDR) parts.push(graphBlockDR);
+	// #M005-remediation: knowledge/graph included via composed (computed registry).
 	const inlinedContext = capPreamble(
 		`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 	);
@@ -1276,12 +1270,7 @@ export async function buildResearchProjectPrompt(
 	});
 	const parts = [];
 	if (composed) parts.push(composed);
-	const knowledgeBlockRP = await inlineKnowledgeScoped(base, []);
-	if (knowledgeBlockRP) parts.push(knowledgeBlockRP);
-	const graphBlockRP = await inlineGraphSubgraph(base, "project research", {
-		budget: 3000,
-	});
-	if (graphBlockRP) parts.push(graphBlockRP);
+	// #M005-remediation: knowledge/graph included via composed (computed registry).
 	const inlinedContext = capPreamble(
 		`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 	);
@@ -1346,12 +1335,7 @@ export async function buildDiscussMilestonePrompt(
 	});
 	const parts = [];
 	if (composed) parts.push(composed);
-	const knowledgeBlockDM = await inlineKnowledgeScoped(base, []);
-	if (knowledgeBlockDM) parts.push(knowledgeBlockDM);
-	const graphBlockDM = await inlineGraphSubgraph(base, `${mid} ${midTitle}`, {
-		budget: 3000,
-	});
-	if (graphBlockDM) parts.push(graphBlockDM);
+	// #M005-remediation: knowledge/graph included via composed (computed registry).
 	const inlinedContext = capPreamble(
 		`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 	);
@@ -1373,11 +1357,12 @@ export async function buildDiscussMilestonePrompt(
 	return basePrompt;
 }
 export async function buildResearchMilestonePrompt(mid, midTitle, base) {
-	// #4782 phase 3: research-milestone migrated through the composer.
-	// Declared inline order: milestone-context, project, requirements,
-	// decisions, templates. Knowledge stays outside the composer
-	// (budget-driven, scoped by keyword extraction — future phase folds
-	// policy-driven blocks in).
+	// #M005-remediation: research-milestone now fully delegates ordering
+	// to the v2 composer. The manifest declares knowledge as an inline
+	// artifact (positioned between decisions and templates) so its
+	// keyword-budgeted resolver runs in the correct slot. Graph is a
+	// computed artifact appended after templates. Eliminates the manual
+	// splice + duplicate-injection pattern previously inlined here.
 	const resolveArtifact = async (key) => {
 		switch (key) {
 			case "milestone-context": {
@@ -1391,6 +1376,8 @@ export async function buildResearchMilestonePrompt(mid, midTitle, base) {
 				return await inlineRequirementsFromDb(base, mid);
 			case "decisions":
 				return await inlineDecisionsFromDb(base, mid);
+			case "knowledge":
+				return await inlineKnowledgeBudgeted(base, extractKeywords(midTitle));
 			case "templates":
 				return inlineTemplate("research", "Research");
 			default:
@@ -1398,37 +1385,18 @@ export async function buildResearchMilestonePrompt(mid, midTitle, base) {
 		}
 	};
 	const { inline: composed } = await composeUnitContext("research-milestone", {
-		resolveArtifact,
-	});
-	// Knowledge block stays outside the composer — budgeted, scoped via
-	// keyword extraction (#4719). Inserted between decisions and the
-	// templates block to match the pre-migration output order. We split
-	// the composer output around the templates section to preserve that
-	// ordering.
-	const knowledgeInlineRM = await inlineKnowledgeBudgeted(
 		base,
-		extractKeywords(midTitle),
-	);
-	const graphBlockRM = await inlineGraphSubgraph(base, `${mid} ${midTitle}`, {
-		budget: 3000,
+		resolveArtifact,
+		computed: {
+			graph: {
+				build: async (_, b) =>
+					inlineGraphSubgraph(b, `${mid} ${midTitle}`, { budget: 3000 }),
+				inputs: {},
+			},
+		},
 	});
 	const parts = [];
-	if (knowledgeInlineRM && composed) {
-		// Insert knowledge before the template block so the overall order is:
-		// milestone-context → project → requirements → decisions → KNOWLEDGE → research template
-		const idx = composed.lastIndexOf("### Output Template:");
-		if (idx > 0) {
-			const before = composed.slice(0, idx).replace(/\n\n---\n\n$/, "");
-			const after = composed.slice(idx);
-			parts.push(before, knowledgeInlineRM, after);
-		} else {
-			parts.push(composed, knowledgeInlineRM);
-		}
-	} else if (composed) {
-		parts.push(composed);
-		if (knowledgeInlineRM) parts.push(knowledgeInlineRM);
-	}
-	if (graphBlockRM) parts.push(graphBlockRM);
+	if (composed) parts.push(composed);
 	const inlinedContext = capPreamble(
 		`## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 	);
diff --git a/src/resources/extensions/sf/auto-timers.js b/src/resources/extensions/sf/auto-timers.js
index 83247914e..93a2baa39 100644
--- a/src/resources/extensions/sf/auto-timers.js
+++ b/src/resources/extensions/sf/auto-timers.js
@@ -271,6 +271,51 @@ export function startUnitSupervision(sctx) {
 				);
 				return;
 			}
+			if (decision.action === "fail") {
+				if (getInFlightToolCount() > 0) return;
+				await closeoutUnit(
+					ctx,
+					s.basePath,
+					s.currentUnit.type,
+					s.currentUnit.id,
+					s.currentUnit.startedAt,
+					buildSnapshotOpts(),
+				);
+				writeUnitRuntimeRecord(
+					s.basePath,
+					unitType,
+					unitId,
+					s.currentUnit.startedAt,
+					{
+						phase: "failed-silent-worker",
+						status: "failed",
+						lastProgressAt: Date.now(),
+						lastProgressKind: "runaway-guard-fail",
+						runawayGuardFail: decision.metadata,
+					},
+				);
+				const unitParts = unitId.split("/");
+				recordSelfFeedback(
+					{
+						kind: "runaway-loop:silent-worker-failure",
+						severity: "high",
+						summary: decision.reason,
+						evidence: JSON.stringify(decision.metadata, null, 2),
+						suggestedFix:
+							"LLM session never produced an assistant message — check session-manager.ts:1086-1096 (silent _persist skip) and verify the model/provider is responding. The dispatcher will attempt retry within maxRetries; if persistent, transitions to blocked.",
+						occurredIn: {
+							unitType,
+							milestone: unitParts[0],
+							slice: unitParts[1],
+							task: unitParts.slice(2).join("/") || undefined,
+						},
+						source: "detector",
+					},
+					s.basePath,
+				);
+				ctx.ui.notify(decision.reason, "error");
+				return;
+			}
 			if (decision.action === "pause") {
 				if (getInFlightToolCount() > 0) return;
 				await closeoutUnit(
diff --git a/src/resources/extensions/sf/prompts/research-milestone.md b/src/resources/extensions/sf/prompts/research-milestone.md
index 609ac84c8..69f1ddf61 100644
--- a/src/resources/extensions/sf/prompts/research-milestone.md
+++ b/src/resources/extensions/sf/prompts/research-milestone.md
@@ -24,13 +24,13 @@ Write for the roadmap planner. It needs to understand: what exists in the codeba
 
 ## Calibrate Depth
 
-Read the milestone title, the user's stated intent, and any inlined context above. Ask: does this milestone introduce new technology, span multiple unfamiliar subsystems, or have ambiguous scope? Or is it a focused feature in well-understood territory?
+**Default to deep research.** Read the milestone title, the user's stated intent, and any inlined context above. Use deep mode unless you can give a concrete one-line justification for downscoping. The cost of light research on a genuinely uncertain milestone (wrong slice boundaries, missed pitfalls, fabricated risk story by the planner downstream) is far greater than the cost of a thorough exploration on a milestone that turned out simple.
 
-- **Deep research** — new technology, novel architecture, multiple risky integrations, or genuinely ambiguous scope. Explore broadly, look up docs, investigate alternatives. Write the full strategic frame including risks, boundaries, and slice-ordering rationale. This is the default when the milestone is genuinely uncertain.
-- **Targeted research** — known technology but new to this codebase, or moderate complexity. Explore the relevant areas, check one or two libraries, identify constraints. Skip Comparable Systems if nothing applies.
-- **Light research** — well-scoped milestone using established patterns already in the codebase. Read the relevant files to confirm the pattern, note constraints, write Summary + Recommendation + Implementation Landscape. A light milestone-research doc can be 30-50 lines. Don't manufacture risks or comparable-systems analysis for work that doesn't have them.
+- **Deep research (DEFAULT)** — explore broadly, look up docs via DeepWiki/Context7, run multiple web searches for comparable systems, investigate alternatives. Write the full strategic frame including risks, boundaries, comparable systems, and slice-ordering rationale. This is the assumption.
+- **Targeted research** — choose this only when the milestone is a well-defined feature in a familiar subsystem AND no novel technology is involved. Explore the relevant areas, check 1-2 libraries, identify constraints. Comparable Systems section is still required.
+- **Light research** — choose this ONLY when the milestone is trivial repetition of a pattern already in the codebase, with no external dependencies and no architectural decisions. State the explicit downscope reason in the Summary section so the planner sees why depth was reduced. A light milestone-research doc can be 30-50 lines.
 
-An honest "this milestone is straightforward, here's the pattern and slice boundaries" beats a fabricated multi-page exploration for work that doesn't need it.
+The previous calibration advice "an honest 'straightforward' beats a fabricated multi-page exploration" still holds — but the bar for declaring straightforward is high. If in doubt, go deep. Comparable Systems is MANDATORY for deep and targeted; only light research may omit it (and only with an explicit reason).
 
 ## Steps
 
@@ -40,7 +40,7 @@ Research the codebase and relevant technologies. Narrate key findings and surpri
 3. Explore relevant code. Use native `lsp` first for symbol lookup, references, and cross-file navigation. For small/familiar codebases, use `rg`, `find`, and targeted reads. For large or unfamiliar codebases, use `scout` to build a broad map efficiently before diving in.
 3a. Use research swarms when the questions fan out cleanly. If the milestone spans 2-3 independent subsystems, dispatch parallel `scout`/`researcher` subagents with separate lenses, then synthesize their findings into one research artifact. Do not swarm one tightly-coupled question; do it inline.
 4. **Documentation lookup — prefer DeepWiki first.** Use `ask_question` / `read_wiki_structure` / `read_wiki_contents` (DeepWiki) as the default for any GitHub-hosted library or framework — AI-indexed, no free-tier cap. Fall back to `resolve_library` → `get_library_docs` (Context7) for npm/pypi/crates packages DeepWiki doesn't have. **Context7 free tier is capped at 1000 requests/month — spend those on cases DeepWiki can't cover.** Skip both for libraries already used in this codebase.
-5. **Web search budget:** You have a limited budget of web searches (max ~15 per session). Use them strategically — try DeepWiki → Context7 → web search in that order. Do NOT repeat the same or similar queries. If a search didn't find what you need, rephrase once or move on. Target 3-5 total web searches for a typical research unit.
+5. **Web search budget:** You have a budget of up to ~25 web searches per session. Use them strategically — try DeepWiki → Context7 → web search in that order. For deep research, target 8-12 web searches (comparable systems, prior art, library tradeoffs, common pitfalls); for targeted research, target 4-6; for light research, 0-2 is fine. Do NOT repeat the same or similar queries. If a search didn't find what you need, rephrase once or move on. Spend the budget on real questions, not safety nets.
 6. Use the **Research** output template from the inlined context above — include only sections that have real content
 7. If `.sf/REQUIREMENTS.md` exists, research against it. Identify which Active requirements are table stakes, likely omissions, overbuilt risks, or domain-standard behaviors the user may or may not want.
 8. Call `save_summary` with `milestone_id: {{milestoneId}}`, `artifact_type: "RESEARCH"`, and the full research markdown as `content` — the tool computes the file path and persists to both DB and disk.
diff --git a/src/resources/extensions/sf/templates/research.md b/src/resources/extensions/sf/templates/research.md
index fb63e757e..5b2e47c5a 100644
--- a/src/resources/extensions/sf/templates/research.md
+++ b/src/resources/extensions/sf/templates/research.md
@@ -64,6 +64,22 @@
 - {{riskThatCouldSurfaceDuringExecution}}
 
+## Comparable Systems
+
+
+
+| System | Approach | Tradeoffs | What to steal / avoid |
+|--------|----------|-----------|------------------------|
+| {{system1}} | {{approach1}} | {{tradeoffs1}} | {{stealOrAvoid1}} |
+| {{system2}} | {{approach2}} | {{tradeoffs2}} | {{stealOrAvoid2}} |
+
 ## Skills Discovered
diff --git a/src/resources/extensions/sf/unit-context-manifest.js b/src/resources/extensions/sf/unit-context-manifest.js
index ea2ca2cdc..455b455db 100644
--- a/src/resources/extensions/sf/unit-context-manifest.js
+++ b/src/resources/extensions/sf/unit-context-manifest.js
@@ -147,17 +147,23 @@ export const UNIT_MANIFESTS = {
 		preferences: "active-only",
 		tools: TOOLS_PLANNING,
 		artifacts: {
-			// Phase 3 migration (#4782): matches today's actual
-			// buildResearchMilestonePrompt inlining order.
+			// #M005-remediation: knowledge resolved as an inline artifact so its
+			// position (between decisions and templates) is preserved by the
+			// composer's declared-order traversal. Graph stays as a computed
+			// artifact (always appended after templates), matching the prior
+			// builder's behavior. Eliminates the manual splice that previously
+			// existed in buildResearchMilestonePrompt.
 			inline: [
 				"milestone-context",
 				"project",
 				"requirements",
 				"decisions",
+				"knowledge",
 				"templates",
 			],
 			excerpt: [],
 			onDemand: [],
+			computed: ["graph"],
 		},
 		maxSystemPromptChars: COMMON_BUDGET_MEDIUM,
 	},
diff --git a/src/resources/extensions/sf/uok/auto-runaway-guard.js b/src/resources/extensions/sf/uok/auto-runaway-guard.js
index 93179b4f8..93baf2630 100644
--- a/src/resources/extensions/sf/uok/auto-runaway-guard.js
+++ b/src/resources/extensions/sf/uok/auto-runaway-guard.js
@@ -223,6 +223,47 @@ export function evaluateRunawayGuard(
 	) {
 		return { action: "none" };
 	}
+	// Silent-worker-failure detection (#sf-mp8c0arc-vgw8io):
+	// When the final warning has already been sent and the unit has produced
+	// zero tool calls past the elapsed threshold, the worker is not stuck in a
+	// busy loop — its LLM session never produced an assistant message, so the
+	// session JSONL was never written (session-manager.ts:1086-1096 skips
+	// _persist when !hasAssistant). hasMeaningfulGrowth is false because no
+	// tokens are flowing, so the existing pause branch never fires. Escalate
+	// to fail so the dispatcher can retry or transition to blocked instead of
+	// staying in runaway-final-warning-sent indefinitely.
+	if (
+		s.finalWarningSent &&
+		(unitMetrics.toolCalls ?? 0) === 0 &&
+		unitMetrics.elapsedMs > config.elapsedMs
+	) {
+		const reason =
+			`Runaway guard fail ${unitType} ${unitId}: zero tool calls in ` +
+			`${Math.round(unitMetrics.elapsedMs / 1000)}s after final warning — ` +
+			`silent worker failure suspected (LLM session never produced an assistant message).`;
+		return {
+			action: "fail",
+			reason,
+			metadata: {
+				reason,
+				failedAt: now,
+				unitType,
+				unitId,
+				diagnosticTurns: config.diagnosticTurns,
+				warningsSent: s.warningsSent,
+				thresholdReasons: reasons,
+				metrics: unitMetrics,
+				silentFailure: true,
+				thresholds: {
+					toolCallWarning: config.toolCallWarning,
+					tokenWarning: config.tokenWarning,
+					elapsedMs: config.elapsedMs,
+					changedFilesWarning: config.changedFilesWarning,
+					minIntervalMs: config.minIntervalMs,
+				},
+			},
+		};
+	}
 	if (
 		config.hardPause &&
 		s.finalWarningSent &&