singularity-forge/packages
Mikael Hugo f8e53840da fix(rpc, web): integrate drain into forceShutdown + healthz-503 on shutdown
Three fixes addressing Codex's adversarial review of the earlier
orphan-recovery / graceful-shutdown landing:

(1) Codex point B — single shutdown path. Removed the parallel
    installGracefulShutdown() handler in rpc-mode.ts that was adding
    a second SIGTERM listener and racing forceShutdown()'s teardown.
    The drain is now the FIRST step inside forceShutdown() (before
    killTrackedDetachedChildren / extension session_shutdown / etc.)
    so DB writes complete cleanly while child processes are still
    alive to flush. This keeps the drain race-free against the
    existing shutdown ordering.

(2) Codex point D — recovery-before-each-drain. Cloud-volume mtime
    visibility can lag between containers, so an orphan `.draining`
    file from a previous container may be missing from the startup
    scan and appear moments later. drainQueuedSfFeedbackCommands()
    now runs recoverOrphanedFeedbackDrains() as its first step, so
    each dispatch's drain sees the latest filesystem state.

(3) Codex point E — healthz returns 503 during shutdown. New module
    src/web/shutdown-state.ts holds a per-process flag, auto-registers
    SIGTERM/SIGINT/SIGHUP handlers on first read, and exposes a
    snapshot (signal, startedAt, elapsedMs) for diagnostics. The
    healthz route imports isShuttingDown() and returns 503 when set,
    so k8s readinessProbe / Forgejo blue-green probes drain traffic
    BEFORE we actually stop responding.
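
Sketch for (1): a minimal illustration of a single, ordered shutdown path
with the drain as the first step. This is not the code in rpc-mode.ts;
createForceShutdown, the step labels, and the trace array are hypothetical
names used only to show the ordering and the "second signal joins the
first teardown" behavior.

```typescript
// Illustrative only: step labels mirror the commit message, bodies are stand-ins.
type ShutdownStep = { label: string; run: () => Promise<void> | void };

function createForceShutdown(
  steps: ShutdownStep[],
  trace: string[],
): () => Promise<void> {
  let inFlight: Promise<void> | null = null;

  return function forceShutdown(): Promise<void> {
    // A repeated signal joins the teardown already in progress
    // instead of starting a second, racing one.
    if (inFlight === null) {
      inFlight = (async () => {
        for (const step of steps) {
          try {
            await step.run();
            trace.push(step.label);
          } catch {
            // A failing step must not block the remaining teardown.
            trace.push(step.label + ":failed");
          }
        }
      })();
    }
    return inFlight;
  };
}

const shutdownTrace: string[] = [];
const forceShutdown = createForceShutdown(
  [
    // Drain first, while child processes are still alive to flush.
    { label: "drain", run: async () => { /* flush queued writes (stand-in) */ } },
    { label: "killTrackedDetachedChildren", run: () => {} },
    { label: "session_shutdown", run: () => {} },
  ],
  shutdownTrace,
);
```

In this sketch every signal handler would call the same forceShutdown(),
so there is exactly one teardown sequence no matter how many signals arrive.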
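
Sketch for (2): a minimal illustration of running orphan recovery at the
start of every drain rather than only at startup. The real
drainQueuedSfFeedbackCommands() and recoverOrphanedFeedbackDrains() are not
shown in this commit message, so everything below is an assumption: the
FeedbackQueueDir interface, the rename-to-`.draining` claim protocol, the
in-memory directory, and the premise that only one drainer works on a
directory at a time.

```typescript
const DRAINING_SUFFIX = ".draining";

// Hypothetical abstraction over the queue directory (not the real API).
interface FeedbackQueueDir {
  list(): string[];
  rename(from: string, to: string): void;
  read(file: string): string;
  remove(file: string): void;
}

function isDraining(file: string): boolean {
  return file.slice(-DRAINING_SUFFIX.length) === DRAINING_SUFFIX;
}

// Put files left mid-drain by a dead process back into the queue.
function recoverOrphanedDrains(dir: FeedbackQueueDir): number {
  let recovered = 0;
  for (const file of dir.list()) {
    if (!isDraining(file)) continue;
    dir.rename(file, file.slice(0, -DRAINING_SUFFIX.length));
    recovered++;
  }
  return recovered;
}

function drainQueuedCommands(
  dir: FeedbackQueueDir,
  handle: (payload: string) => void,
): number {
  // Re-scan on every drain so late-appearing orphans are picked up.
  recoverOrphanedDrains(dir);
  let drained = 0;
  for (const file of dir.list()) {
    if (isDraining(file)) continue;
    const claimed = file + DRAINING_SUFFIX;
    dir.rename(file, claimed); // claim the file before processing it
    handle(dir.read(claimed));
    dir.remove(claimed);
    drained++;
  }
  return drained;
}

// In-memory stand-in for a directory, for demonstration only.
function createMemoryDir(files: Record<string, string>): FeedbackQueueDir {
  return {
    list: () => Object.keys(files).sort(),
    rename: (from, to) => {
      files[to] = files[from];
      delete files[from];
    },
    read: (file) => files[file],
    remove: (file) => {
      delete files[file];
    },
  };
}
```

With a directory holding one queued file and one orphaned `.draining` file,
a single drainQueuedCommands() call in this sketch handles both, because
the orphan is renamed back into the queue before the scan.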
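
Sketch for (3): a minimal illustration of a per-process shutdown flag and a
health check that reports 503 once it is set. This is not the contents of
src/web/shutdown-state.ts; only isShuttingDown() and the snapshot fields
(signal, startedAt, elapsedMs) come from the description above. The names
markShuttingDown, getShutdownSnapshot, and healthzResponse, and the
injectable clock argument, are assumptions. Signal-handler registration is
left out to keep the block free of runtime dependencies.

```typescript
type ShutdownSnapshot = {
  signal: string | null; // null when marked manually rather than by a signal
  startedAt: number;     // epoch milliseconds when shutdown began
  elapsedMs: number;
};

let shutdownFlag = false;
let shutdownSignal: string | null = null;
let shutdownStartedAt: number | null = null;

function markShuttingDown(
  sig: string | null = null,
  now: number = Date.now(),
): void {
  if (shutdownFlag) return; // idempotent: the first mark wins
  shutdownFlag = true;
  shutdownSignal = sig;
  shutdownStartedAt = now;
}

function isShuttingDown(): boolean {
  return shutdownFlag;
}

function getShutdownSnapshot(now: number = Date.now()): ShutdownSnapshot | null {
  const started = shutdownStartedAt;
  if (started === null) return null;
  return { signal: shutdownSignal, startedAt: started, elapsedMs: now - started };
}

// Health check: report 503 while shutting down so probes stop routing traffic.
function healthzResponse(): { statusCode: number; body: string } {
  return isShuttingDown()
    ? { statusCode: 503, body: "shutting down" }
    : { statusCode: 200, body: "ok" };
}
```

In a Node.js process the flag would typically be set from a handler such as
process.on("SIGTERM", () => markShuttingDown("SIGTERM")), so the probe sees
503 while in-flight work is still being drained.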

Tests:
  - rpc-mode-orphan-recovery.test.ts: 8/8 still green
  - web-shutdown-state.test.ts: 5/5 new — default false, mark sets
    flag, idempotent, signal exposed via snapshot, null signal for
    manual mark

Deferred to a follow-up commit (Codex didn't flag it, but it is noted
for completeness): a SIGTERM-drain child-process integration test that
spawns rpc-mode and sends it a real signal. The 5 unit tests cover the
flag logic; the integration test would cover the full process tree
and is bulkier than the current commit warrants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:35:50 +02:00
agent-core
ai
coding-agent fix(rpc, web): integrate drain into forceShutdown + healthz-503 on shutdown 2026-05-17 22:35:50 +02:00
daemon
google-gemini-cli-provider
native
openai-codex-provider
rpc-client
tui