sf snapshot: uncommitted changes after 56m inactivity

parent 6071a9207c
commit da0c41d375

8 changed files with 188 additions and 74 deletions
```diff
@@ -136,6 +136,39 @@ This file is the explicit capability and coverage contract for the project.
 - Validation: unmapped
 - Notes: Requires (1) new prompt template `prompts/fill-milestone-vision.md`, (2) new dispatchable unit wired in `auto-dispatch.js` + `state-transition-matrix.js`, (3) an exception in `buildRegistryAndFindActive` for one-shot `status=complete && vision=""` repair, (4) inline-fixer handler that converts the R011 self-feedback entry into a dispatch. Must satisfy R006 (fail-open) — recovery-unit failure halts with notification, never crashes the loop.

+### R013 — Unified Dispatch v2: `inline` Scope for `full` Isolation
+- Class: core-capability
+- Status: active
+- Description: Implement the `inline` scope row of `UNIFIED_DISPATCH_V2_PLAN.md`'s parameter matrix (line 152: `full | managed | inline | single`) so the autonomous loop can execute units in-process without spawning a subprocess/worktree. A new `src/resources/extensions/sf/dispatch-layer.js` exposes `DispatchLayer.dispatch(opts)` per the plan's API spec (lines 51-138). When `scope: 'inline'` and `isolation: 'full'`, the unit's executor runs in the calling process against the project DB directly — no `child_process.spawn`, no session-status-io files, no worktree.
+- Why it matters: The current spawn-based path silently fails on `validate-milestone` and likely other unit types (self-feedback `sf-mp8bhp5s-cmgt8d`, critical, blocking) — worker session IDs are issued and tracked in `.sf/runtime/units/*.json` but the worker never writes its session JSONL and `recoveryAttempts` stays at 0 across runaway-final-warning phases. Universal across providers (kimi-k2.6 and minimax both produce 0 tool calls with heartbeats only). Adding an inline path naturally retires this whole class of bug for units that don't need worktree isolation. Also reduces process-start latency and removes the file-based-IPC pressure point that has accumulated multiple historical issues.
+- Source: spec
+- Primary owning slice: unmapped
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Aligned with `docs/plans/UNIFIED_DISPATCH_V2_PLAN.md` (Qwen Plan, 2026-05-08). Scope of R013 is the **minimum slice** of that plan: just `full + managed + inline + single`. Other rows of the matrix (parallel/debate/chain inline, slice/milestone scope with worktrees) are out of scope for R013 and stay on their current implementations. Resolves `sf-mp8bhp5s-cmgt8d` and likely the 56+ historical `runaway-loop:idle-halt` entries on M005.
+
+### R014 — Inline Worker Bootstrap Without Spawned `sf` CLI
+- Class: core-capability
+- Status: active
+- Description: Extract the unit-execution code path that `sf headless autonomous` currently invokes after spawn into a callable function (`runUnitInline(unitType, unitId, ctx)`) usable from the same process. UOK kernel calls it directly when dispatching with `scope: 'inline'`. Must respect the single-writer invariant on `.sf/sf.db` (`sf-db.js`); the in-process call shares the kernel's existing WAL connection rather than opening a new one.
+- Why it matters: Today the unit executor is reachable only via subprocess argv parsing in the headless CLI surface. Without this extraction, R013's inline scope cannot wire a real executor — the dispatcher would have nothing to call. This is the prerequisite for R013.
+- Source: spec
+- Primary owning slice: unmapped
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Reuses existing unit-context-manifest, prompt builders, and tool registries. The only change is execution surface: function call instead of process boundary. Session JSONL is still written for audit but to a path keyed off the in-process session ID, not a worker subprocess.
+
+### R015 — Spawn-Failure Loud Failure (Defensive)
+- Class: failure-visibility
+- Status: active
+- Description: Until R013/R014 land for every unit type, the existing spawn path must fail loudly. If a dispatched worker fails to write its session JSONL within a configurable timeout (default 30s) AND has zero `progressCount`, the runtime must (a) transition the unit to `status: failed`, (b) capture any stderr from the spawn into `lineage.events`, (c) emit a doctor-visible signal, and (d) trigger the retry path up to `maxRetries`. Today the runaway watchdog only fires a warning and never retries — `recoveryAttempts` stays at 0.
+- Why it matters: Even after inline scope retires the spawn path for the common cases, spawn-based dispatch will persist for milestone/slice-scope workers and parallel modes. Silent failure is the worst possible behavior — operator sees a "running" unit that's a ghost. This requirement keeps the spawn path observable for as long as it exists.
+- Source: spec
+- Primary owning slice: unmapped
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Touches the runaway-recovery / unit-ownership / parallel-orchestrator surfaces. Distinct from R013 — R013 removes the bug for inline scope; R015 contains the bug for non-inline scope.
+
 ## Traceability

 | ID | Class | Status | Primary owner | Supporting | Proof |
```
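R013's routing rule can be sketched as a small dispatcher. This is a hypothetical illustration, not the plan's `DispatchLayer` API: the constructor arguments `runInline` and `spawnWorker` are assumed injection points, and the executors are kept synchronous for brevity where the real ones are presumably async. Only the combination the requirement names (`scope: 'inline'` with `isolation: 'full'`) takes the in-process path; every other row falls through to spawning, which matches R013's minimum-slice scope.

```javascript
// Hypothetical sketch of R013's routing rule; not the real dispatch-layer.js.
class DispatchLayer {
  constructor({ runInline, spawnWorker }) {
    this.runInline = runInline; // in-process executor (assumed injection point)
    this.spawnWorker = spawnWorker; // existing subprocess + worktree path (assumed)
  }

  dispatch(opts) {
    const { scope, isolation, unit } = opts;
    // R013 minimum slice: only `inline` scope with `full` isolation runs in the calling process.
    if (scope === "inline" && isolation === "full") {
      return { path: "inline", result: this.runInline(unit) };
    }
    // Every other matrix row keeps its current spawn-based implementation.
    return { path: "spawn", result: this.spawnWorker(unit) };
  }
}

// Usage with stub executors standing in for the real ones.
const layer = new DispatchLayer({
  runInline: (unit) => `ran ${unit.type} in-process`,
  spawnWorker: (unit) => `spawned worker for ${unit.type}`,
});
const viaInline = layer.dispatch({
  scope: "inline",
  isolation: "full",
  unit: { type: "validate-milestone", id: "M005" },
});
const viaSpawn = layer.dispatch({
  scope: "slice",
  isolation: "full",
  unit: { type: "execute-task", id: "M005/S02/T01" },
});
console.log(viaInline.path, viaSpawn.path); // inline spawn
```

Keeping the decision in one place means widening R013 later (for example to other matrix rows) changes a single condition rather than every call site.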
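A sketch of the contract R014 asks for, assuming the kernel passes its already-open DB handle in `ctx.db`. Only the `runUnitInline(unitType, unitId, ctx)` signature comes from the requirement; `ctx.execute`, the `write` method and the `.sf/sessions/` path are hypothetical. What it illustrates is the single-writer rule: the function refuses to run without the shared handle instead of opening its own connection, and it keys the audit JSONL path off an in-process session ID.

```javascript
// Hypothetical sketch of R014's contract. Only the runUnitInline(unitType, unitId, ctx)
// signature comes from the requirement; ctx.db, ctx.execute and the session path are assumed.
let nextSessionNumber = 0;

function runUnitInline(unitType, unitId, ctx) {
  // Single-writer invariant: reuse the kernel's existing connection; never open a second one.
  if (!ctx || !ctx.db) {
    throw new Error("runUnitInline needs the kernel's shared DB handle in ctx.db");
  }
  // The session ID is minted in-process, and the audit JSONL path is keyed off it.
  nextSessionNumber += 1;
  const sessionId = `inline-${nextSessionNumber}`;
  const sessionPath = `.sf/sessions/${sessionId}.jsonl`; // path layout is an assumption
  // Stand-in for the real executor (manifest, prompt builders, tool loop).
  const output = ctx.execute(unitType, unitId);
  ctx.db.write({ unitType, unitId, sessionId, status: "complete" });
  return { sessionId, sessionPath, output };
}

// Usage: an in-memory object stands in for the kernel's WAL connection.
const writes = [];
const kernelCtx = {
  db: { write: (row) => writes.push(row) },
  execute: (type, id) => `${type} ${id} done`,
};
const run = runUnitInline("validate-milestone", "M005", kernelCtx);
console.log(run.sessionPath); // .sf/sessions/inline-1.jsonl
```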
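R015's check reduces to a pure function over a unit's runtime record. In this hypothetical sketch, `sessionJsonlWritten`, `startedAt`, `stderr` and the `maxRetries` default of 2 are assumptions; the 30 s default and steps (a) to (d) come from the requirement, and the final `blocked` outcome follows the self-feedback text elsewhere in this commit ("if persistent, transitions to blocked").

```javascript
// Hypothetical sketch of R015's loud-failure check for spawned workers.
const DEFAULT_JSONL_TIMEOUT_MS = 30_000; // "default 30s" from the requirement

function checkSpawnedWorker(unit, nowMs, { timeoutMs = DEFAULT_JSONL_TIMEOUT_MS, maxRetries = 2 } = {}) {
  const elapsedMs = nowMs - unit.startedAt;
  const isGhost = !unit.sessionJsonlWritten && unit.progressCount === 0 && elapsedMs > timeoutMs;
  if (!isGhost) return { action: "none", unit };

  const recoveryAttempts = (unit.recoveryAttempts ?? 0) + 1;
  const failed = {
    ...unit,
    status: "failed", // (a) no more "running" ghosts
    recoveryAttempts,
    lineage: {
      ...unit.lineage,
      // (b) keep whatever the spawn wrote to stderr
      events: [...(unit.lineage?.events ?? []), { kind: "spawn-stderr", text: unit.stderr ?? "" }],
    },
  };
  // (c) a signal a doctor-style check could surface
  const signal = { kind: "spawn-silent-failure", unitId: unit.id, elapsedMs };
  // (d) retry within maxRetries; past that, hand the unit to the blocked path
  const action = recoveryAttempts <= maxRetries ? "retry" : "blocked";
  return { action, unit: failed, signal };
}

// Usage: a worker that never wrote its JSONL and made no progress.
const ghost = {
  id: "M005/validate-milestone",
  startedAt: 0,
  progressCount: 0,
  sessionJsonlWritten: false,
  stderr: "",
  lineage: { events: [] },
  recoveryAttempts: 0,
};
console.log(checkSpawnedWorker(ghost, 45_000).action); // retry
console.log(checkSpawnedWorker(ghost, 10_000).action); // none
```

Incrementing `recoveryAttempts` on every detection is the piece the requirement says is missing today, where the counter stays at 0.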
```diff
@@ -152,10 +185,13 @@ This file is the explicit capability and coverage contract for the project.
 | R010 | quality-attribute | active | M005/S02 | none | unmapped |
 | R011 | failure-visibility | active | unmapped | none | unmapped |
 | R012 | differentiator | active | unmapped | none | unmapped |
+| R013 | core-capability | active | unmapped | none | unmapped |
+| R014 | core-capability | active | unmapped | none | unmapped |
+| R015 | failure-visibility | active | unmapped | none | unmapped |

 ## Coverage Summary

-- Active requirements: 12
+- Active requirements: 15
 - Mapped to slices: 10
 - Validated: 0
-- Unmapped active requirements: 2 (R011, R012 — pending planning into a new self-heal extension slice or M003 follow-on)
+- Unmapped active requirements: 5 (R011, R012 — self-heal extension; R013, R014, R015 — UNIFIED_DISPATCH_V2 inline scope, anchored to docs/plans/UNIFIED_DISPATCH_V2_PLAN.md)
```
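The Coverage Summary is arithmetic over the Traceability table: unmapped equals active minus mapped (15 = 10 + 5 after this change). A hypothetical helper that recomputes it, assuming the six-column row format shown above, would keep the two sections from drifting. The usage feeds it only the six rows visible in this hunk, so it reports 6 active rather than the file's 15; rows R001 to R009 are not part of this diff.

```javascript
// Hypothetical helper: recompute the Coverage Summary from Traceability rows.
// Assumes rows look like "| ID | Class | Status | Primary owner | Supporting | Proof |".
function parseTraceabilityRow(line) {
  const [id, klass, status, primaryOwner] = line
    .split("|")
    .slice(1, -1) // drop the empty strings outside the leading and trailing pipes
    .map((cell) => cell.trim());
  return { id, klass, status, primaryOwner };
}

function coverageSummary(rows) {
  const active = rows.filter((r) => r.status === "active");
  const unmapped = active.filter((r) => r.primaryOwner === "unmapped");
  return {
    active: active.length,
    mapped: active.length - unmapped.length,
    unmapped: unmapped.map((r) => r.id),
  };
}

const visibleRows = [
  "| R010 | quality-attribute | active | M005/S02 | none | unmapped |",
  "| R011 | failure-visibility | active | unmapped | none | unmapped |",
  "| R012 | differentiator | active | unmapped | none | unmapped |",
  "| R013 | core-capability | active | unmapped | none | unmapped |",
  "| R014 | core-capability | active | unmapped | none | unmapped |",
  "| R015 | failure-visibility | active | unmapped | none | unmapped |",
].map(parseTraceabilityRow);
const summary = coverageSummary(visibleRows);
console.log(summary.unmapped.join(", ")); // R011, R012, R013, R014, R015
```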
```diff
@@ -1086,15 +1086,17 @@ export class SessionManager {
 _persist(entry: SessionEntry): void {
 if (!this.persist || !this.sessionFile) return;

-const hasAssistant = this.fileEntries.some(
-(e) => e.type === "message" && e.message.role === "assistant",
-);
-if (!hasAssistant) {
-// Mark as not flushed so when assistant arrives, all entries get written
-this.flushed = false;
-return;
-}
-
+// #R015-remediation (sf-mp8c0arc-vgw8io): previously this method
+// deferred file creation until the first assistant message arrived
+// (silent return on !hasAssistant). The intent was to avoid empty
+// files for cancelled/never-started sessions, but the cost was
+// silent invisibility when the LLM never produced an assistant
+// message — failed sessions left zero forensic trail and the SF
+// autonomous loop's watchdog couldn't tell a live session from a
+// dead one. The eventual cost (debugging M005's chronic stuck
+// state) far exceeded the saved disk space. We now write entries
+// as soon as they're added, so the session JSONL exists with at
+// least the session header + user prompt from the very first turn.
 let release: (() => void) | undefined;
 try {
 release = tryAcquireLockSync(this.sessionFile);
```
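The behavioral difference in `_persist` can be modeled without touching the filesystem. In this sketch an array stands in for the session JSONL and locking (`tryAcquireLockSync`) is left out entirely; `requireAssistant: true` mimics the removed `hasAssistant` gate, `false` mimics the new write-immediately path. It is an illustration of the two policies, not the `SessionManager` code.

```javascript
// Minimal model of the _persist change; an in-memory array replaces the JSONL file.
function makeSessionWriter({ requireAssistant }) {
  const written = [];
  const pending = [];
  let seenAssistant = false;
  return {
    written,
    persist(entry) {
      if (entry.type === "message" && entry.message.role === "assistant") seenAssistant = true;
      pending.push(entry);
      // Old behavior: nothing reaches the file until an assistant message exists.
      if (requireAssistant && !seenAssistant) return;
      // Otherwise flush everything queued so far, in order.
      for (const e of pending.splice(0)) written.push(JSON.stringify(e));
    },
  };
}

// A session whose model never answers: header + user prompt, then silence.
const header = { type: "session", id: "s1" };
const prompt = { type: "message", message: { role: "user", content: "validate M005" } };
const gated = makeSessionWriter({ requireAssistant: true });
const eager = makeSessionWriter({ requireAssistant: false });
for (const entry of [header, prompt]) {
  gated.persist(entry);
  eager.persist(entry);
}
console.log(gated.written.length, eager.written.length); // 0 2
```

The gated writer leaves no trace for a session that stalls before its first assistant turn, which is exactly the forensic gap the new comment describes.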
```diff
@@ -1150,10 +1150,9 @@ export async function buildDiscussProjectPrompt(
 });
 const parts = [];
 if (composed) parts.push(composed);
-const knowledgeBlockDP = await inlineKnowledgeScoped(base, []);
-if (knowledgeBlockDP) parts.push(knowledgeBlockDP);
-const graphBlockDP = await inlineGraphSubgraph(base, "project setup", { budget: 3000 });
-if (graphBlockDP) parts.push(graphBlockDP);
+// #M005-remediation: knowledge/graph are computed artifacts already
+// included in `composed` via the computed registry above. Manual
+// re-injection here caused duplicate sections in the prompt output.
 const inlinedContext = capPreamble(
 `## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 );
```
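Why the manual re-injection produced duplicate sections, in miniature. The section text below is invented, and the sketch assumes (as the new comment states) that `composed` already contains the knowledge and graph blocks; only the `\n\n---\n\n` separator is taken from the builders.

```javascript
const SEP = "\n\n---\n\n"; // separator used by parts.join(...) in the builders
const knowledgeBlock = "### Knowledge\n- example entry";
const graphBlock = "### Graph\n- example edge";
// Assumption (per the new comment): the composer already emits both blocks.
const composed = ["### Project\n- example", knowledgeBlock, graphBlock].join(SEP);

function countHeading(body, heading) {
  return body.split("\n").filter((line) => line === heading).length;
}

// Pre-fix: composed, then knowledge and graph pushed again by hand.
const before = [composed, knowledgeBlock, graphBlock].join(SEP);
// Post-fix: composed only.
const after = [composed].join(SEP);
console.log(countHeading(before, "### Knowledge"), countHeading(after, "### Knowledge")); // 2 1
```

The same pattern applies to the three sibling builders changed in the following hunks.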
```diff
@@ -1208,12 +1207,7 @@ export async function buildDiscussRequirementsPrompt(
 });
 const parts = [];
 if (composed) parts.push(composed);
-const knowledgeBlockDR = await inlineKnowledgeScoped(base, []);
-if (knowledgeBlockDR) parts.push(knowledgeBlockDR);
-const graphBlockDR = await inlineGraphSubgraph(base, "project requirements", {
-budget: 3000,
-});
-if (graphBlockDR) parts.push(graphBlockDR);
+// #M005-remediation: knowledge/graph included via composed (computed registry).
 const inlinedContext = capPreamble(
 `## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 );
```
```diff
@@ -1276,12 +1270,7 @@ export async function buildResearchProjectPrompt(
 });
 const parts = [];
 if (composed) parts.push(composed);
-const knowledgeBlockRP = await inlineKnowledgeScoped(base, []);
-if (knowledgeBlockRP) parts.push(knowledgeBlockRP);
-const graphBlockRP = await inlineGraphSubgraph(base, "project research", {
-budget: 3000,
-});
-if (graphBlockRP) parts.push(graphBlockRP);
+// #M005-remediation: knowledge/graph included via composed (computed registry).
 const inlinedContext = capPreamble(
 `## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 );
```
```diff
@@ -1346,12 +1335,7 @@ export async function buildDiscussMilestonePrompt(
 });
 const parts = [];
 if (composed) parts.push(composed);
-const knowledgeBlockDM = await inlineKnowledgeScoped(base, []);
-if (knowledgeBlockDM) parts.push(knowledgeBlockDM);
-const graphBlockDM = await inlineGraphSubgraph(base, `${mid} ${midTitle}`, {
-budget: 3000,
-});
-if (graphBlockDM) parts.push(graphBlockDM);
+// #M005-remediation: knowledge/graph included via composed (computed registry).
 const inlinedContext = capPreamble(
 `## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 );
```
```diff
@@ -1373,11 +1357,12 @@ export async function buildDiscussMilestonePrompt(
 return basePrompt;
 }
 export async function buildResearchMilestonePrompt(mid, midTitle, base) {
-// #4782 phase 3: research-milestone migrated through the composer.
-// Declared inline order: milestone-context, project, requirements,
-// decisions, templates. Knowledge stays outside the composer
-// (budget-driven, scoped by keyword extraction — future phase folds
-// policy-driven blocks in).
+// #M005-remediation: research-milestone now fully delegates ordering
+// to the v2 composer. The manifest declares knowledge as an inline
+// artifact (positioned between decisions and templates) so its
+// keyword-budgeted resolver runs in the correct slot. Graph is a
+// computed artifact appended after templates. Eliminates the manual
+// splice + duplicate-injection pattern previously inlined here.
 const resolveArtifact = async (key) => {
 switch (key) {
 case "milestone-context": {
```
```diff
@@ -1391,6 +1376,8 @@ export async function buildResearchMilestonePrompt(mid, midTitle, base) {
 return await inlineRequirementsFromDb(base, mid);
 case "decisions":
 return await inlineDecisionsFromDb(base, mid);
+case "knowledge":
+return await inlineKnowledgeBudgeted(base, extractKeywords(midTitle));
 case "templates":
 return inlineTemplate("research", "Research");
 default:
```
```diff
@@ -1398,37 +1385,18 @@ export async function buildResearchMilestonePrompt(mid, midTitle, base) {
 }
 };
 const { inline: composed } = await composeUnitContext("research-milestone", {
 resolveArtifact,
-});
-// Knowledge block stays outside the composer — budgeted, scoped via
-// keyword extraction (#4719). Inserted between decisions and the
-// templates block to match the pre-migration output order. We split
-// the composer output around the templates section to preserve that
-// ordering.
-const knowledgeInlineRM = await inlineKnowledgeBudgeted(
-base,
-extractKeywords(midTitle),
-);
-const graphBlockRM = await inlineGraphSubgraph(base, `${mid} ${midTitle}`, {
-budget: 3000,
+resolveArtifact,
+computed: {
+graph: {
+build: async (_, b) =>
+inlineGraphSubgraph(b, `${mid} ${midTitle}`, { budget: 3000 }),
+inputs: {},
+},
+},
 });
 const parts = [];
-if (knowledgeInlineRM && composed) {
-// Insert knowledge before the template block so the overall order is:
-// milestone-context → project → requirements → decisions → KNOWLEDGE → research template
-const idx = composed.lastIndexOf("### Output Template:");
-if (idx > 0) {
-const before = composed.slice(0, idx).replace(/\n\n---\n\n$/, "");
-const after = composed.slice(idx);
-parts.push(before, knowledgeInlineRM, after);
-} else {
-parts.push(composed, knowledgeInlineRM);
-}
-} else if (composed) {
-parts.push(composed);
-if (knowledgeInlineRM) parts.push(knowledgeInlineRM);
-}
-if (graphBlockRM) parts.push(graphBlockRM);
+if (composed) parts.push(composed);
 const inlinedContext = capPreamble(
 `## Inlined Context (preloaded — do not re-read these files)\n\n${parts.join("\n\n---\n\n")}`,
 );
```
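With knowledge declared in the manifest, ordering becomes a property of traversal rather than string surgery on `"### Output Template:"`. Below is a synchronous, hypothetical model of declared-order composition; the real `composeUnitContext` is async and its internals are not part of this diff. Inline artifacts resolve in manifest order, empty blocks are dropped, and computed artifacts such as `graph` are appended last.

```javascript
// Hypothetical model of declared-order composition (not the real composer).
function composeInDeclaredOrder(manifest, resolveArtifact, computed = {}) {
  const blocks = [];
  // 1) Inline artifacts, strictly in the order the manifest declares them.
  for (const key of manifest.inline) {
    const block = resolveArtifact(key);
    if (block) blocks.push(block); // skip artifacts that resolved to nothing
  }
  // 2) Computed artifacts (e.g. graph) are appended after every inline artifact.
  for (const key of manifest.computed ?? []) {
    const builder = computed[key];
    const block = builder ? builder.build() : null;
    if (block) blocks.push(block);
  }
  return blocks.join("\n\n---\n\n");
}

const manifest = {
  inline: ["milestone-context", "project", "requirements", "decisions", "knowledge", "templates"],
  computed: ["graph"],
};
const out = composeInDeclaredOrder(
  manifest,
  (key) => (key === "requirements" ? "" : `[${key}]`), // pretend requirements is empty
  { graph: { build: () => "[graph]" } },
);
console.log(out.split("\n\n---\n\n").join(" "));
// [milestone-context] [project] [decisions] [knowledge] [templates] [graph]
```

Because knowledge has a declared slot, the pre-change fallback branch (appending knowledge after templates when the marker string was missing) no longer has an equivalent.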
```diff
@@ -271,6 +271,51 @@ export function startUnitSupervision(sctx) {
 );
 return;
 }
+if (decision.action === "fail") {
+if (getInFlightToolCount() > 0) return;
+await closeoutUnit(
+ctx,
+s.basePath,
+s.currentUnit.type,
+s.currentUnit.id,
+s.currentUnit.startedAt,
+buildSnapshotOpts(),
+);
+writeUnitRuntimeRecord(
+s.basePath,
+unitType,
+unitId,
+s.currentUnit.startedAt,
+{
+phase: "failed-silent-worker",
+status: "failed",
+lastProgressAt: Date.now(),
+lastProgressKind: "runaway-guard-fail",
+runawayGuardFail: decision.metadata,
+},
+);
+const unitParts = unitId.split("/");
+recordSelfFeedback(
+{
+kind: "runaway-loop:silent-worker-failure",
+severity: "high",
+summary: decision.reason,
+evidence: JSON.stringify(decision.metadata, null, 2),
+suggestedFix:
+"LLM session never produced an assistant message — check session-manager.ts:1086-1096 (silent _persist skip) and verify the model/provider is responding. The dispatcher will attempt retry within maxRetries; if persistent, transitions to blocked.",
+occurredIn: {
+unitType,
+milestone: unitParts[0],
+slice: unitParts[1],
+task: unitParts.slice(2).join("/") || undefined,
+},
+source: "detector",
+},
+s.basePath,
+);
+ctx.ui.notify(decision.reason, "error");
+return;
+}
 if (decision.action === "pause") {
 if (getInFlightToolCount() > 0) return;
 await closeoutUnit(
```
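The `occurredIn` mapping in the new `fail` branch splits the unit ID on `/`. Pulled out as a standalone function (a hypothetical refactor; the milestone/slice/task layout is inferred from that code and the sample IDs are made up), it shows how shorter IDs degrade: missing segments become `undefined` rather than empty strings, and any segments past the slice are joined back into the task path.

```javascript
// Mirrors the occurredIn construction in the new "fail" branch.
function occurredInFor(unitType, unitId) {
  const unitParts = unitId.split("/");
  return {
    unitType,
    milestone: unitParts[0],
    slice: unitParts[1], // undefined for milestone-level units
    // Everything after the slice is treated as the task path; "" becomes undefined.
    task: unitParts.slice(2).join("/") || undefined,
  };
}

const taskUnit = occurredInFor("execute-task", "M005/S02/T03");
const milestoneUnit = occurredInFor("validate-milestone", "M005");
console.log(taskUnit.task, milestoneUnit.slice, milestoneUnit.task); // T03 undefined undefined
```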
```diff
@@ -24,13 +24,13 @@ Write for the roadmap planner. It needs to understand: what exists in the codeba

 ## Calibrate Depth

-Read the milestone title, the user's stated intent, and any inlined context above. Ask: does this milestone introduce new technology, span multiple unfamiliar subsystems, or have ambiguous scope? Or is it a focused feature in well-understood territory?
+**Default to deep research.** Read the milestone title, the user's stated intent, and any inlined context above. Use deep mode unless you can give a concrete one-line justification for downscoping. The cost of light research on a genuinely uncertain milestone (wrong slice boundaries, missed pitfalls, fabricated risk story by the planner downstream) is far greater than the cost of a thorough exploration on a milestone that turned out simple.

-- **Deep research** — new technology, novel architecture, multiple risky integrations, or genuinely ambiguous scope. Explore broadly, look up docs, investigate alternatives. Write the full strategic frame including risks, boundaries, and slice-ordering rationale. This is the default when the milestone is genuinely uncertain.
-- **Targeted research** — known technology but new to this codebase, or moderate complexity. Explore the relevant areas, check one or two libraries, identify constraints. Skip Comparable Systems if nothing applies.
-- **Light research** — well-scoped milestone using established patterns already in the codebase. Read the relevant files to confirm the pattern, note constraints, write Summary + Recommendation + Implementation Landscape. A light milestone-research doc can be 30-50 lines. Don't manufacture risks or comparable-systems analysis for work that doesn't have them.
+- **Deep research (DEFAULT)** — explore broadly, look up docs via DeepWiki/Context7, run multiple web searches for comparable systems, investigate alternatives. Write the full strategic frame including risks, boundaries, comparable systems, and slice-ordering rationale. This is the assumption.
+- **Targeted research** — choose this only when the milestone is a well-defined feature in a familiar subsystem AND no novel technology is involved. Explore the relevant areas, check 1-2 libraries, identify constraints. Comparable Systems section is still required.
+- **Light research** — choose this ONLY when the milestone is trivial repetition of a pattern already in the codebase, with no external dependencies and no architectural decisions. State the explicit downscope reason in the Summary section so the planner sees why depth was reduced. A light milestone-research doc can be 30-50 lines.

-An honest "this milestone is straightforward, here's the pattern and slice boundaries" beats a fabricated multi-page exploration for work that doesn't need it.
+The previous calibration advice "an honest 'straightforward' beats a fabricated multi-page exploration" still holds — but the bar for declaring straightforward is high. If in doubt, go deep. Comparable Systems is MANDATORY for deep and targeted; only light research may omit it (and only with an explicit reason).

 ## Steps

```
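The new calibration text is effectively a decision procedure with deep research as the fallback. A hypothetical encoding follows; the boolean inputs are assumptions distilled from the prose (the agent receives text, not flags), and it only shows how the three bars nest: no justification means deep, and a justification that clears neither downscope bar still means deep.

```javascript
// Hypothetical encoding of the "default to deep" calibration rule.
function chooseResearchDepth(m) {
  // Downscoping always needs a concrete one-line justification; without one, go deep.
  if (!m.downscopeReason) return { depth: "deep", comparableSystems: "required" };
  if (m.trivialRepetition && !m.externalDependencies && !m.architecturalDecisions) {
    // Light research must state its downscope reason in the Summary section.
    return { depth: "light", comparableSystems: "optional", summaryNote: m.downscopeReason };
  }
  if (m.wellDefinedFeature && m.familiarSubsystem && !m.novelTechnology) {
    return { depth: "targeted", comparableSystems: "required" };
  }
  // A reason was offered, but the milestone meets neither downscope bar.
  return { depth: "deep", comparableSystems: "required" };
}

const d = chooseResearchDepth({ downscopeReason: "same CRUD pattern as S01", trivialRepetition: true });
console.log(d.depth); // light
```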
```diff
@@ -40,7 +40,7 @@ Research the codebase and relevant technologies. Narrate key findings and surpri
 3. Explore relevant code. Use native `lsp` first for symbol lookup, references, and cross-file navigation. For small/familiar codebases, use `rg`, `find`, and targeted reads. For large or unfamiliar codebases, use `scout` to build a broad map efficiently before diving in.
 3a. Use research swarms when the questions fan out cleanly. If the milestone spans 2-3 independent subsystems, dispatch parallel `scout`/`researcher` subagents with separate lenses, then synthesize their findings into one research artifact. Do not swarm one tightly-coupled question; do it inline.
 4. **Documentation lookup — prefer DeepWiki first.** Use `ask_question` / `read_wiki_structure` / `read_wiki_contents` (DeepWiki) as the default for any GitHub-hosted library or framework — AI-indexed, no free-tier cap. Fall back to `resolve_library` → `get_library_docs` (Context7) for npm/pypi/crates packages DeepWiki doesn't have. **Context7 free tier is capped at 1000 requests/month — spend those on cases DeepWiki can't cover.** Skip both for libraries already used in this codebase.
-5. **Web search budget:** You have a limited budget of web searches (max ~15 per session). Use them strategically — try DeepWiki → Context7 → web search in that order. Do NOT repeat the same or similar queries. If a search didn't find what you need, rephrase once or move on. Target 3-5 total web searches for a typical research unit.
+5. **Web search budget:** You have a budget of up to ~25 web searches per session. Use them strategically — try DeepWiki → Context7 → web search in that order. For deep research, target 8-12 web searches (comparable systems, prior art, library tradeoffs, common pitfalls); for targeted research, target 4-6; for light research, 0-2 is fine. Do NOT repeat the same or similar queries. If a search didn't find what you need, rephrase once or move on. Spend the budget on real questions, not safety nets.
 6. Use the **Research** output template from the inlined context above — include only sections that have real content
 7. If `.sf/REQUIREMENTS.md` exists, research against it. Identify which Active requirements are table stakes, likely omissions, overbuilt risks, or domain-standard behaviors the user may or may not want.
 8. Call `save_summary` with `milestone_id: {{milestoneId}}`, `artifact_type: "RESEARCH"`, and the full research markdown as `content` — the tool computes the file path and persists to both DB and disk.
```
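Steps 4 and 5 together describe a lookup order plus a per-depth budget. This hypothetical sketch treats the top of each depth's target range as a hard cap, which is a stricter reading than the prompt's "target"; the `deepwiki`, `context7` and `webSearch` functions are stand-ins, not the real tool APIs, and "similar" queries are approximated by case- and whitespace-insensitive equality.

```javascript
// Hypothetical sketch: DeepWiki, then Context7, then web search, with a per-depth cap.
const WEB_SEARCH_TARGET_MAX = { deep: 12, targeted: 6, light: 2 }; // all under the ~25 ceiling

function makeLookup({ deepwiki, context7, webSearch }, depth) {
  const cap = WEB_SEARCH_TARGET_MAX[depth];
  if (cap === undefined) throw new Error(`unknown depth: ${depth}`);
  const asked = new Set();
  let webSearchesUsed = 0;
  return {
    lookup(question) {
      const wiki = deepwiki(question);
      if (wiki) return { source: "deepwiki", answer: wiki };
      const docs = context7(question);
      if (docs) return { source: "context7", answer: docs };
      const key = question.trim().toLowerCase();
      // Skip repeated queries and stop once this depth's web-search budget is spent.
      if (asked.has(key) || webSearchesUsed >= cap) return { source: "none", answer: null };
      asked.add(key);
      webSearchesUsed += 1;
      return { source: "web", answer: webSearch(question) };
    },
    webSearchesUsed: () => webSearchesUsed,
  };
}

// Usage: light research with no DeepWiki/Context7 hits.
const light = makeLookup(
  { deepwiki: () => null, context7: () => null, webSearch: (q) => `results for ${q}` },
  "light",
);
console.log(light.lookup("sqlite wal single writer").source); // web
console.log(light.lookup("SQLite WAL single writer").source); // none
```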
```diff
@@ -64,6 +64,22 @@

 - {{riskThatCouldSurfaceDuringExecution}}

+## Comparable Systems
+
+<!-- MANDATORY for deep and targeted research. Document 2-3 real systems
+(open-source projects, papers, or published architectures) that solved a
+similar problem. Look them up via DeepWiki/Context7/web search. For each:
+what approach did they take, what tradeoffs did they accept, and what
+should we steal vs. avoid. Light research may skip this section ONLY if
+the milestone is trivial repetition of an existing in-repo pattern. Do
+not fabricate — if you can't find genuine comparables, state that
+explicitly and explain why none apply. -->
+
+| System | Approach | Tradeoffs | What to steal / avoid |
+|--------|----------|-----------|------------------------|
+| {{system1}} | {{approach1}} | {{tradeoffs1}} | {{stealOrAvoid1}} |
+| {{system2}} | {{approach2}} | {{tradeoffs2}} | {{stealOrAvoid2}} |
+
 ## Skills Discovered

 <!-- Include when skill discovery found relevant skills. -->
```
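If findings are gathered as structured records, the Comparable Systems table can be rendered mechanically. This hypothetical helper escapes `|` so free-text cells cannot break the table, and it refuses to emit an empty table without an explicit reason, mirroring the template's no-fabrication rule. The system in the usage is a placeholder, not a real comparable.

```javascript
// Hypothetical helper that renders the Comparable Systems table from findings.
function renderComparableSystems(rows, noneReason) {
  if (rows.length === 0) {
    // No genuine comparables: the template requires saying so, with a reason.
    if (!noneReason) throw new Error("no comparables: an explicit reason is required");
    return `No genuine comparable systems found: ${noneReason}`;
  }
  // Escape pipes so free-text cells cannot split a table column.
  const cell = (value) => String(value).replace(/\|/g, "\\|");
  const header = "| System | Approach | Tradeoffs | What to steal / avoid |";
  const divider = "|--------|----------|-----------|------------------------|";
  const body = rows.map(
    (r) => `| ${cell(r.system)} | ${cell(r.approach)} | ${cell(r.tradeoffs)} | ${cell(r.stealOrAvoid)} |`,
  );
  return [header, divider, ...body].join("\n");
}

// Usage with placeholder content (not a real system).
const table = renderComparableSystems([
  {
    system: "Example System A",
    approach: "in-process workers",
    tradeoffs: "no crash isolation",
    stealOrAvoid: "steal: direct calls | avoid: shared globals",
  },
]);
console.log(table.split("\n").length); // 3
```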
```diff
@@ -147,17 +147,23 @@ export const UNIT_MANIFESTS = {
 preferences: "active-only",
 tools: TOOLS_PLANNING,
 artifacts: {
-// Phase 3 migration (#4782): matches today's actual
-// buildResearchMilestonePrompt inlining order.
+// #M005-remediation: knowledge resolved as an inline artifact so its
+// position (between decisions and templates) is preserved by the
+// composer's declared-order traversal. Graph stays as a computed
+// artifact (always appended after templates), matching the prior
+// builder's behavior. Eliminates the manual splice that previously
+// existed in buildResearchMilestonePrompt.
 inline: [
 "milestone-context",
 "project",
 "requirements",
 "decisions",
+"knowledge",
 "templates",
 ],
 excerpt: [],
 onDemand: [],
+computed: ["graph"],
 },
 maxSystemPromptChars: COMMON_BUDGET_MEDIUM,
 },
```
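The manifest comment relies on an ordering invariant that is easy to check mechanically. A hypothetical guard (not part of this commit) that a unit test could run against the research-milestone manifest's `artifacts`; the second object is simply a manifest without the `knowledge` and `graph` entries, for comparison.

```javascript
// Hypothetical guard: knowledge after decisions and before templates, graph computed.
function checkResearchMilestoneManifest(artifacts) {
  const order = artifacts.inline ?? [];
  const at = (key) => order.indexOf(key);
  const problems = [];
  if (at("knowledge") === -1) {
    problems.push("knowledge is not declared as an inline artifact");
  } else if (!(at("decisions") < at("knowledge") && at("knowledge") < at("templates"))) {
    problems.push("knowledge must sit between decisions and templates");
  }
  if (!(artifacts.computed ?? []).includes("graph")) {
    problems.push("graph must be a computed artifact");
  }
  return problems;
}

const newArtifacts = {
  inline: ["milestone-context", "project", "requirements", "decisions", "knowledge", "templates"],
  excerpt: [],
  onDemand: [],
  computed: ["graph"],
};
const withoutKnowledgeOrGraph = {
  inline: ["milestone-context", "project", "requirements", "decisions", "templates"],
  excerpt: [],
  onDemand: [],
};
console.log(checkResearchMilestoneManifest(newArtifacts).length); // 0
console.log(checkResearchMilestoneManifest(withoutKnowledgeOrGraph).length); // 2
```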
```diff
@@ -223,6 +223,47 @@ export function evaluateRunawayGuard(
 ) {
 return { action: "none" };
 }
+// Silent-worker-failure detection (#sf-mp8c0arc-vgw8io):
+// When the final warning has already been sent and the unit has produced
+// zero tool calls past the elapsed threshold, the worker is not stuck in a
+// busy loop — its LLM session never produced an assistant message, so the
+// session JSONL was never written (session-manager.ts:1086-1096 skips
+// _persist when !hasAssistant). hasMeaningfulGrowth is false because no
+// tokens are flowing, so the existing pause branch never fires. Escalate
+// to fail so the dispatcher can retry or transition to blocked instead of
+// staying in runaway-final-warning-sent indefinitely.
+if (
+s.finalWarningSent &&
+(unitMetrics.toolCalls ?? 0) === 0 &&
+unitMetrics.elapsedMs > config.elapsedMs
+) {
+const reason =
+`Runaway guard fail ${unitType} ${unitId}: zero tool calls in ` +
+`${Math.round(unitMetrics.elapsedMs / 1000)}s after final warning — ` +
+`silent worker failure suspected (LLM session never produced an assistant message).`;
+return {
+action: "fail",
+reason,
+metadata: {
+reason,
+failedAt: now,
+unitType,
+unitId,
+diagnosticTurns: config.diagnosticTurns,
+warningsSent: s.warningsSent,
+thresholdReasons: reasons,
+metrics: unitMetrics,
+silentFailure: true,
+thresholds: {
+toolCallWarning: config.toolCallWarning,
+tokenWarning: config.tokenWarning,
+elapsedMs: config.elapsedMs,
+changedFilesWarning: config.changedFilesWarning,
+minIntervalMs: config.minIntervalMs,
+},
+},
+};
+}
 if (
 config.hardPause &&
 s.finalWarningSent &&
```
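The new escalation condition, isolated as a pure predicate so its edge cases are visible. Field names follow the diff; the 10-minute threshold and the metrics are made up for illustration. A missing `toolCalls` counts as zero, so a worker that never reported any tool activity is treated as silent.

```javascript
// The condition from the new evaluateRunawayGuard branch, as a standalone predicate.
function isSilentWorkerFailure(s, unitMetrics, config) {
  return Boolean(
    s.finalWarningSent &&
      (unitMetrics.toolCalls ?? 0) === 0 && // a missing count is treated as zero calls
      unitMetrics.elapsedMs > config.elapsedMs,
  );
}

const config = { elapsedMs: 10 * 60_000 }; // hypothetical 10-minute threshold
const cases = [
  // Warned, silent and past the threshold: escalate to "fail".
  [{ finalWarningSent: true }, { toolCalls: 0, elapsedMs: 12 * 60_000 }],
  // Same, but toolCalls was never recorded: still counts as silent.
  [{ finalWarningSent: true }, { elapsedMs: 12 * 60_000 }],
  // A busy worker is left to the existing pause/hardPause branches.
  [{ finalWarningSent: true }, { toolCalls: 40, elapsedMs: 12 * 60_000 }],
  // No final warning yet: not this branch's concern.
  [{ finalWarningSent: false }, { toolCalls: 0, elapsedMs: 12 * 60_000 }],
];
console.log(cases.map(([s, m]) => isSilentWorkerFailure(s, m, config)));
// [ true, true, false, false ]
```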