singularity-forge/src
Mikael Hugo f8e53840da fix(rpc, web): integrate drain into forceShutdown + healthz-503 on shutdown
Three fixes addressing codex's adversarial review of the earlier orphan-
recovery / graceful-shutdown landing:

(1) Codex point B — single shutdown path. Removed the parallel
    installGracefulShutdown() handler in rpc-mode.ts that was adding
    a second SIGTERM listener and racing forceShutdown()'s teardown.
    The drain is now the FIRST step inside forceShutdown() (before
    killTrackedDetachedChildren / extension session_shutdown / etc.)
    so DB writes complete cleanly while child processes are still
    alive to flush. Race-free against the existing shutdown ordering.

(2) Codex point D — recovery-before-each-drain. Cloud-volume mtime
    visibility can lag between containers, so an orphan `.draining`
    file from a previous container may be invisible during the startup
    scan and appear moments later. drainQueuedSfFeedbackCommands()
    now runs recoverOrphanedFeedbackDrains() as its first step, so
    each dispatch's drain sees the latest filesystem state.

(3) Codex point E — healthz returns 503 during shutdown. New module
    src/web/shutdown-state.ts holds a per-process flag, auto-registers
    SIGTERM/SIGINT/SIGHUP handlers on first read, and exposes a
    snapshot (signal, startedAt, elapsedMs) for diagnostics. The
    healthz route imports isShuttingDown() and returns 503 when set,
    so k8s readinessProbe / Forgejo blue-green probes drain traffic
    BEFORE we actually stop responding.
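
The ordering described in (1) and (2) can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in rpc-mode.ts: the `FakeQueue` shape, the `log` array, and the function parameters are assumptions made for the example, and the real functions are presumably asynchronous and work against the filesystem and DB. Only the function names are taken from the commit message.

```typescript
// Hypothetical in-memory stand-in for the feedback queue (assumption).
interface FakeQueue {
  pending: string[]; // commands queued by this process
  orphanedDraining: string[]; // commands left in a `.draining` file by a dead process
  written: string[]; // commands that reached the DB
}

// Re-queue orphaned work ahead of new work so it is retried first.
function recoverOrphanedFeedbackDrains(q: FakeQueue): void {
  q.pending.unshift(...q.orphanedDraining);
  q.orphanedDraining = [];
}

// Fix (2): recovery is the first step of every drain, so an orphan that
// became visible after the startup scan is still picked up.
function drainQueuedSfFeedbackCommands(q: FakeQueue): void {
  recoverOrphanedFeedbackDrains(q);
  q.written.push(...q.pending);
  q.pending = [];
}

// Fix (1): a single shutdown path with the drain as its first step,
// before child processes and extensions are torn down.
function forceShutdown(q: FakeQueue, log: string[]): void {
  drainQueuedSfFeedbackCommands(q);
  log.push("drain");
  log.push("killTrackedDetachedChildren"); // placeholder for the real teardown
  log.push("session_shutdown"); // placeholder for the extension hook
}

const demoQueue: FakeQueue = {
  pending: ["new"],
  orphanedDraining: ["orphan"],
  written: [],
};
const demoLog: string[] = [];
forceShutdown(demoQueue, demoLog);
console.log(demoQueue.written); // [ 'orphan', 'new' ]
console.log(demoLog[0]); // drain
```

Keeping one entry point means there is no second SIGTERM listener to race against, which is the property the commit is after.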

Tests:
  - rpc-mode-orphan-recovery.test.ts: 8/8 still green
  - web-shutdown-state.test.ts: 5/5 new — default false, mark sets
    flag, idempotent, signal exposed via snapshot, null signal for
    manual mark
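
The flag behaviour these tests describe, together with the healthz change in (3), might look something like the sketch below. It is an assumption-laden illustration rather than the contents of src/web/shutdown-state.ts: the factory `createShutdownState()`, the `mark()` and `snapshot()` names, and `healthzStatus()` are hypothetical, and the real module reportedly holds a single per-process flag and registers signal handlers itself, which is omitted here.

```typescript
interface ShutdownSnapshot {
  shuttingDown: boolean;
  signal: string | null;
  startedAt: number | null;
  elapsedMs: number | null;
}

interface ShutdownState {
  mark(signal?: string | null, now?: number): void;
  isShuttingDown(): boolean;
  snapshot(now?: number): ShutdownSnapshot;
}

// Hypothetical factory; a per-process module would keep one instance.
function createShutdownState(): ShutdownState {
  let marked: { signal: string | null; startedAt: number } | null = null;
  return {
    // Idempotent: the first mark wins, later calls are ignored.
    // A signal handler would pass the signal name; a manual mark passes none.
    mark(signal: string | null = null, now: number = Date.now()): void {
      if (marked !== null) return;
      marked = { signal, startedAt: now };
    },
    isShuttingDown(): boolean {
      return marked !== null;
    },
    snapshot(now: number = Date.now()): ShutdownSnapshot {
      if (marked === null) {
        return { shuttingDown: false, signal: null, startedAt: null, elapsedMs: null };
      }
      return {
        shuttingDown: true,
        signal: marked.signal,
        startedAt: marked.startedAt,
        elapsedMs: now - marked.startedAt,
      };
    },
  };
}

// What a healthz route could return: 503 once shutdown has begun, so
// readiness probes stop routing traffic before the process exits.
function healthzStatus(state: ShutdownState): number {
  return state.isShuttingDown() ? 503 : 200;
}

const demo = createShutdownState();
console.log(healthzStatus(demo)); // 200
demo.mark("SIGTERM");
console.log(healthzStatus(demo)); // 503
```

Passing `now` explicitly keeps the elapsed-time arithmetic deterministic in unit tests.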

Deferred to a follow-up commit (codex didn't flag, but noted for
completeness): a SIGTERM-drain child-process integration test that
spawns rpc-mode + sends a real signal. The 5 unit tests cover the
flag logic; the integration test would cover the full process tree
and is bulkier than the current commit warrants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:35:50 +02:00
resources fix(sf-db): share open-DB state across module instances via globalThis 2026-05-17 21:47:01 +02:00
tests fix(rpc, web): integrate drain into forceShutdown + healthz-503 on shutdown 2026-05-17 22:35:50 +02:00
web fix(rpc, web): integrate drain into forceShutdown + healthz-503 on shutdown 2026-05-17 22:35:50 +02:00
app-paths.ts feat: make sf server the operator entrypoint 2026-05-17 17:23:46 +02:00
bundled-extension-paths.ts
bundled-resource-path.ts
claude-cli-check.ts
cli-key.ts chore: formatter / linter touch-up (230 files) 2026-05-16 21:19:53 +02:00
cli-logs.ts sf snapshot: uncommitted changes after 93m inactivity 2026-05-06 11:37:27 +02:00
cli-stats.ts style: format repository with biome 2026-05-05 14:31:16 +02:00
cli-status.ts feat(notifications): NOTICE_KIND enum, schema v2 dedup, sf-db cleanup 2026-05-10 20:13:58 +02:00
cli-web-branch.ts fix: harden sf server control loop 2026-05-17 21:13:12 +02:00
cli.ts feat: make sf server the operator entrypoint 2026-05-17 17:23:46 +02:00
env.ts fix(env): align SF_PERMISSION_LEVEL enum with permission-profile values 2026-05-14 21:11:36 +02:00
errors.ts
extension-discovery.ts fix: consolidate extensions into sf, migrate kernel.ts, fix test suite 2026-05-11 02:40:52 +02:00
extension-registry.ts
headless-answers.ts remove A2A; swarm enrollment + status projection + web swarms view; headless refactor 2026-05-17 16:04:06 +02:00
headless-context.ts refactor: align agent resource overlays 2026-05-14 19:32:41 +02:00
headless-events.ts fix(headless): do not restart graceful child exits 2026-05-15 07:25:06 +02:00
headless-feedback.ts fix: auto-version-bump swallowed operator-direction; ptrmap + lock guards 2026-05-17 15:51:36 +02:00
headless-import-backlog.ts chore: formatter / linter touch-up (230 files) 2026-05-16 21:19:53 +02:00
headless-mark-state.ts chore: formatter / linter touch-up (230 files) 2026-05-16 21:19:53 +02:00
headless-query.ts remove A2A; swarm enrollment + status projection + web swarms view; headless refactor 2026-05-17 16:04:06 +02:00
headless-reflect.ts chore: formatter / linter touch-up (230 files) 2026-05-16 21:19:53 +02:00
headless-server-forward.ts fix: harden sf server control loop 2026-05-17 21:13:12 +02:00
headless-status.ts fix(headless): bypass rpc for status 2026-05-15 17:32:21 +02:00
headless-triage.ts fix: harden sf server control loop 2026-05-17 21:13:12 +02:00
headless-types.ts feat(notifications): NOTICE_KIND enum, schema v2 dedup, sf-db cleanup 2026-05-10 20:13:58 +02:00
headless-ui.ts fold: hashline_edit + hashline_read → Edit({match}) + Read({format}) modes 2026-05-17 17:39:59 +02:00
headless-uok-status.ts chore: formatter / linter touch-up (230 files) 2026-05-16 21:19:53 +02:00
headless-usage.ts chore: formatter / linter touch-up (230 files) 2026-05-16 21:19:53 +02:00
headless.ts fix: harden sf server control loop 2026-05-17 21:13:12 +02:00
help-text.ts refactor(sf): separate daemon from server identity 2026-05-17 19:18:33 +02:00
interactive-session-lock.ts fix: enforce one interactive sf per repo 2026-05-05 20:55:53 +02:00
loader.ts fix(lint): reformat 6 files touched during web dep upgrade 2026-05-10 12:10:10 +02:00
logger.ts refactor: rename pi-* packages to forge-native names (Phase 1) 2026-05-10 11:28:01 +02:00
logo.ts
models-resolver.ts
onboarding.ts remove: SF voice IVR / ElevenLabs paging — migrated to centralcloud 2026-05-17 17:42:16 +02:00
pi-migration.ts refactor: rename pi-* packages to forge-native names (Phase 1) 2026-05-10 11:28:01 +02:00
project-sessions.ts
provider-migrations.ts refactor: rename pi-* packages to forge-native names (Phase 1) 2026-05-10 11:28:01 +02:00
remote-questions-config.ts
resource-loader.ts fix: repair headless runtime self-healing 2026-05-15 03:33:29 +02:00
rtk.ts feat: replace launchd with systemd user-unit install path 2026-05-17 17:33:34 +02:00
security-overrides.ts refactor: rename pi-* packages to forge-native names (Phase 1) 2026-05-10 11:28:01 +02:00
startup-model-validation.ts chore: commit current workspace state 2026-05-05 14:46:18 +02:00
startup-timings.ts
status-projection.ts remove A2A; swarm enrollment + status projection + web swarms view; headless refactor 2026-05-17 16:04:06 +02:00
tool-bootstrap.ts
traces.ts sf snapshot: uncommitted changes after 49m inactivity 2026-05-08 01:07:24 +02:00
update-check.ts fix: clean provider surfaces and core build 2026-05-05 16:31:53 +02:00
update-cmd.ts fix: clean provider surfaces and core build 2026-05-05 16:31:53 +02:00
web-mode.ts fix: harden sf server control loop 2026-05-17 21:13:12 +02:00
welcome-screen.ts fix: update test snapshots for queryInstruction and complete /sf prefix Phase 2 deprecation 2026-05-09 00:17:47 +02:00
wizard.ts refactor: rename pi-* packages to forge-native names (Phase 1) 2026-05-10 11:28:01 +02:00
worktree-cli.ts sf snapshot: uncommitted changes after 43m inactivity 2026-05-05 21:39:56 +02:00
worktree-name-gen.ts