fix(headless-triage): 60s no-output watchdog to cap session.prompt hang

When session.prompt() hits the deadlock seam sf-mp8e02m1-zpk903 (Promise never resolves pre-LLM-dispatch, 0 syscall activity, blocks until outer abort), the previous triage call had noOutputTimeoutMs=0 — meaning no fast-fail path. The full 8-minute timeoutMs would burn before the parent abort fired, wasting 8 minutes of subscription window per stuck triage attempt. This adds a 60s no-output watchdog: if no meaningful subagent event fires for 60s, abort the prompt. Combined with the diagnostic logs in subagent-runner.ts (commit 67e5ac9db) the operator gets: [subagent:triage-decider] phase=session.prompt-entered ... [subagent:triage-decider] STUCK phase=session.prompt 10001ms ... [forge] triage] apply blocked: triage-decider produced no output for 60000ms ↑ 60s, not 480s Triage failure stays non-fatal (per the existing handleTriage error catch in headless.ts:auto-triage path) — the autonomous loop continues to its main milestone dispatch. Net effect: SF moves forward 8× faster when the triage deadlock fires. Doesn't fix the underlying Promise deadlock (still tracked in sf-mp8e02m1-zpk903 and the new sf-mpmpXXX-... follow-up). This is a "unblock the autonomous loop now, fix the deadlock later" patch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:58:07 +02:00 · 2026-05-16 20:58:07 +02:00 · d80060fec5
commit d80060fec5
parent 67e5ac9db1
1 changed files with 6 additions and 1 deletions
--- a/src/headless-triage.ts
+++ b/src/headless-triage.ts
@ -443,6 +443,11 @@ async function defaultAgentRunner(
 			tools: options.tools ?? agent.tools,
 		}) ?? agent.systemPrompt;
 	const appendedPrompt = `${composed}\n\n## Task Input\n\n${task}`;
+	// noOutputTimeoutMs = 60000 (60s): without this, a hung session.prompt
+	// (sf-mp8e02m1-zpk903 — Promise deadlock pre-LLM-dispatch with 0 syscall
+	// activity) blocks for the full timeoutMs (8min) before the parent
+	// abort fires. 60s no-output captures the hang fast enough to keep the
+	// autonomous loop moving while the root-cause investigation continues.
 	const result = await runSubagent(
 		{
 			systemPrompt: appendedPrompt,
@ -452,7 +457,7 @@ async function defaultAgentRunner(
 			name: agent.name,
 		},
 		task,
-		{ timeoutMs: DEFAULT_AGENT_TIMEOUT_MS },
+		{ timeoutMs: DEFAULT_AGENT_TIMEOUT_MS, noOutputTimeoutMs: 60_000 },
 	);
 	return {
 		ok: result.ok,