fix(headless-triage): 60s no-output watchdog to cap session.prompt hang

When session.prompt() hits the deadlock seam sf-mp8e02m1-zpk903 (Promise
never resolves pre-LLM-dispatch, 0 syscall activity, blocks until outer
abort), the previous triage call had noOutputTimeoutMs=0, so there was
no fast-fail path. The full 8-minute timeoutMs had to elapse before the
parent abort fired, wasting 8 minutes of subscription window per stuck
triage attempt.

This adds a 60s no-output watchdog: if no meaningful subagent event
fires for 60s, abort the prompt. Combined with the diagnostic logs in
subagent-runner.ts (commit 67e5ac9db) the operator gets:

  [subagent:triage-decider] phase=session.prompt-entered ...
  [subagent:triage-decider] STUCK phase=session.prompt 10001ms ...
  [forge] triage] apply blocked: triage-decider produced no output for 60000ms
                                                                       ↑ 60s, not 480s
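
For readers unfamiliar with the pattern, a no-output watchdog can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the actual runSubagent implementation: the names runWithWatchdog, WatchdogOptions, and the onEvent callback are hypothetical, and the error messages are illustrative rather than real log output. The idea is that every meaningful event re-arms an idle timer, while a separate hard timer enforces the overall timeoutMs.

```typescript
// Hypothetical sketch of a no-output watchdog; not the real runSubagent code.
type WatchdogOptions = { timeoutMs: number; noOutputTimeoutMs: number };

function runWithWatchdog<T>(
  work: (signal: AbortSignal, onEvent: () => void) => Promise<T>,
  { timeoutMs, noOutputTimeoutMs }: WatchdogOptions,
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    let idleTimer: ReturnType<typeof setTimeout> | undefined;

    // Clear both timers so a finished (or failed) run leaves nothing pending.
    const cleanup = () => {
      clearTimeout(idleTimer);
      clearTimeout(hardTimer);
    };

    // Abort the work and reject, even if the work itself never settles.
    const fail = (message: string) => {
      cleanup();
      controller.abort();
      reject(new Error(message));
    };

    // Hard cap on the whole run (the 8-minute timeoutMs in the commit).
    const hardTimer = setTimeout(
      () => fail(`timed out after ${timeoutMs}ms`),
      timeoutMs,
    );

    // Re-armed on every event; a value of 0 disables the idle check.
    const armIdle = () => {
      if (noOutputTimeoutMs <= 0) return;
      clearTimeout(idleTimer);
      idleTimer = setTimeout(
        () => fail(`no output for ${noOutputTimeoutMs}ms`),
        noOutputTimeoutMs,
      );
    };

    armIdle();
    work(controller.signal, armIdle).then(
      (value) => {
        cleanup();
        resolve(value);
      },
      (error) => {
        cleanup();
        reject(error);
      },
    );
  });
}
```

With noOutputTimeoutMs set to 0 only the hard timer is active, which matches the pre-patch behaviour described above. The rejection does not depend on the stuck Promise ever settling, which is what lets the caller move on.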

Triage failure stays non-fatal (per the existing handleTriage error
catch in headless.ts:auto-triage path), so the autonomous loop continues
to its main milestone dispatch. Net effect: when the triage deadlock
fires, SF moves on after 60s instead of 480s (8× faster).
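
The non-fatal behaviour can be pictured with a small sketch. The real handleTriage in headless.ts is not shown in this commit, so everything here is an assumption for illustration: runTriageNonFatal, TriageOutcome, and the log text are hypothetical names, not the project's API.

```typescript
// Hypothetical sketch of non-fatal triage handling; not the real handleTriage.
type TriageOutcome = { ok: true } | { ok: false; reason: string };

async function runTriageNonFatal(
  triage: () => Promise<void>,
  log: (line: string) => void,
): Promise<TriageOutcome> {
  try {
    await triage();
    return { ok: true };
  } catch (error) {
    // Swallow the failure: record why triage was skipped and let the caller
    // carry on with its main dispatch instead of propagating the error.
    const reason = error instanceof Error ? error.message : String(error);
    log(`triage failed (non-fatal): ${reason}`);
    return { ok: false, reason };
  }
}
```

A caller would await runTriageNonFatal(...) and then proceed to the milestone dispatch regardless of the outcome, so a watchdog abort costs only the 60s idle window.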

This does not fix the underlying Promise deadlock (still tracked in
sf-mp8e02m1-zpk903 and the new sf-mpmpXXX-... follow-up). It is an
"unblock the autonomous loop now, fix the deadlock later" patch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mikael Hugo 2026-05-16 20:58:07 +02:00
parent 67e5ac9db1
commit d80060fec5

@ -443,6 +443,11 @@ async function defaultAgentRunner(
tools: options.tools ?? agent.tools,
}) ?? agent.systemPrompt;
const appendedPrompt = `${composed}\n\n## Task Input\n\n${task}`;
// noOutputTimeoutMs = 60000 (60s): without this, a hung session.prompt
// (sf-mp8e02m1-zpk903 — Promise deadlock pre-LLM-dispatch with 0 syscall
// activity) blocks for the full timeoutMs (8min) before the parent
// abort fires. 60s no-output captures the hang fast enough to keep the
// autonomous loop moving while the root-cause investigation continues.
const result = await runSubagent(
{
systemPrompt: appendedPrompt,
@ -452,7 +457,7 @@ async function defaultAgentRunner(
name: agent.name,
},
task,
- { timeoutMs: DEFAULT_AGENT_TIMEOUT_MS },
+ { timeoutMs: DEFAULT_AGENT_TIMEOUT_MS, noOutputTimeoutMs: 60_000 },
);
return {
ok: result.ok,