singularity-forge/packages/pi-coding-agent/src
Jeremy McSpadden 0e0f47ef9f fix: failure recovery & resume safeguards (all 4 waves) (#956)
* fix: prevent data loss on crash with atomic writes, file locking, and error handling

Wave 1 of failure recovery safeguards:

1. Atomic session file rewrites (tmp+rename) — _rewriteFile() and forkFrom()
   now use atomicWriteFileSync to prevent session file corruption on crash
2. Atomic auto.lock writes — crash-recovery.ts writeLock() uses tmp+rename
   so the crash detection system itself can't be corrupted
3. unhandledRejection handler — catches unhandled promise rejections in
   OAuth, extensions, LSP, or MCP connections that would otherwise kill
   the process silently
4. try/catch in emitToolCall — matches the pattern used by emitUserBash,
   emitContext, and emitToolResult so that a crashing extension handler
   cannot kill the entire agent turn
5. File locking on session appends — prevents concurrent pi instances from
   interleaving partial JSON lines in session JSONL files using the same
   proper-lockfile pattern established in auth-storage.ts and settings-manager.ts
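The tmp+rename technique behind items 1 and 2 can be sketched as follows. This is a minimal sketch, not the project's atomicWriteFileSync: the name writeFileAtomicSketch, the temp-file naming scheme, and the fsync step are assumptions. It relies on rename() replacing the destination in a single step when source and target live on the same POSIX filesystem, so a crash leaves either the old file or the new one.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomBytes } from "node:crypto";

// Hypothetical sketch of a tmp+rename atomic write; not the project's
// atomicWriteFileSync. The temp-file naming and the fsync step are assumptions.
function writeFileAtomicSketch(filePath: string, data: string): void {
  // Keep the temp file in the same directory so rename() stays on one filesystem.
  const tmpPath = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.${randomBytes(6).toString("hex")}.tmp`,
  );
  try {
    const fd = fs.openSync(tmpPath, "w");
    try {
      fs.writeFileSync(fd, data, "utf8");
      fs.fsyncSync(fd); // push the bytes to disk before the file becomes visible
    } finally {
      fs.closeSync(fd);
    }
    // On POSIX, rename() swaps the target in one step: readers see the old
    // file or the new one, never a partially written one.
    fs.renameSync(tmpPath, filePath);
  } catch (err) {
    // Best-effort cleanup; the original target file has not been touched.
    try {
      fs.unlinkSync(tmpPath);
    } catch {
      // the temp file may not exist if openSync itself failed
    }
    throw err;
  }
}

// Usage: rewrite a session file inside a scratch directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "atomic-demo-"));
const demoFile = path.join(demoDir, "session.jsonl");
writeFileAtomicSketch(demoFile, '{"type":"message"}\n');
```

Writing the temp file next to the target, rather than in os.tmpdir(), is what keeps the final rename on a single filesystem; a cross-device rename would fail instead of being atomic.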

* fix: add OAuth timeouts, RPC exit detection, and command context guards

Wave 2 of failure recovery safeguards:

1. OAuth fetch timeouts — every fetch() call in the OAuth providers
   (Anthropic, OpenAI Codex, Google Antigravity, Google Gemini CLI,
   GitHub Copilot) now passes a 30-second AbortSignal.timeout() signal
   to prevent indefinite hangs when an OAuth server is unresponsive
2. RPC subprocess exit detection — pending requests are now rejected
   when the agent subprocess exits unexpectedly, preventing indefinite
   hangs in the RPC client
3. Extension command context guards — default handlers for newSession,
   fork, navigateTree, switchSession, and reload now throw explicit
   errors instead of silently returning success when called before
   bindCommandContext()
4. OAuth error detail preservation — token refresh errors now preserve
   the original error as `cause` for better diagnostics
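The RPC exit detection in item 2 amounts to remembering every in-flight request and rejecting all of them when the subprocess exits. A minimal sketch, assuming numeric request ids; PendingRequestTracker and its register/settle/handleExit methods are hypothetical names, not the project's RPC client API.

```typescript
// Hypothetical sketch: PendingRequestTracker and its methods are illustrative
// names, not the project's RPC client API.
type PendingRequest = {
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
};

class PendingRequestTracker {
  private readonly pending = new Map<number, PendingRequest>();
  private nextId = 1;
  private exitError: Error | null = null;

  /** Registers an outgoing request; the promise settles on a response or on exit. */
  register(): { id: number; response: Promise<unknown> } {
    const id = this.nextId++;
    const response = new Promise<unknown>((resolve, reject) => {
      if (this.exitError) {
        reject(this.exitError); // subprocess already gone: fail fast
        return;
      }
      this.pending.set(id, { resolve, reject });
    });
    return { id, response };
  }

  /** Resolves the request matching a response line read from the subprocess. */
  settle(id: number, value: unknown): void {
    const entry = this.pending.get(id);
    if (!entry) return; // unknown or already-settled id
    this.pending.delete(id);
    entry.resolve(value);
  }

  /** Hook for the child's "exit" event: rejects every request still waiting. */
  handleExit(code: number | null, signal: string | null): void {
    const err = new Error(
      `agent subprocess exited unexpectedly (code=${code}, signal=${signal})`,
    );
    this.exitError = err;
    this.pending.forEach((entry) => entry.reject(err));
    this.pending.clear();
  }

  get size(): number {
    return this.pending.size;
  }
}

// Usage; with a real ChildProcess you would wire:
//   child.on("exit", (code, signal) => tracker.handleExit(code, signal));
const tracker = new PendingRequestTracker();
const first = tracker.register();
const second = tracker.register();
tracker.settle(first.id, { ok: true });
tracker.handleExit(1, null);
// Attach a handler so the expected rejection is not reported as unhandled.
const secondOutcome = second.response.then(
  () => "resolved",
  (err: Error) => err.message,
);
```

The fail-fast branch in register() covers callers that race with the exit event; without it, a request sent just after the exit would wait forever.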

* fix: resource cleanup, LSP retry, and crash detection on session resume

Wave 3 of failure recovery safeguards:

1. Atomic completed-units.json cleanup — milestone completion writes
   now use the tmp+rename pattern for consistency with auto-recovery.ts
2. Bash temp file cleanup — track temp files created for large output
   and register a process exit handler to clean them up
3. Settings write queue flush on shutdown — call settingsManager.flush()
   during interactive mode shutdown so queued writes aren't lost
4. LSP initialization retry — wrap getOrCreateClient in up to 2 retries
   using exponential backoff (1s, 2s) for transient spawn failures
5. Crash detection on session resume — wasInterrupted() checks whether the
   last assistant turn had tool calls without results and shows a warning on resume
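Items 4 and 5 can be illustrated with two small helpers. withRetry is a hypothetical retry wrapper whose delay doubles on each attempt (baseDelayMs * 2^attempt, so 1s then 2s with the defaults), and wasInterruptedSketch assumes a simplified message shape (assistant turns holding toolCall blocks, followed by toolResult entries keyed by toolCallId) rather than the project's real session format.

```typescript
// Hypothetical sketch: withRetry, wasInterruptedSketch, and the message shapes
// below are assumptions, not the project's actual API or session format.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/** Runs fn, retrying up to `retries` times; waits baseDelayMs * 2^attempt between tries. */
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 2,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Defaults give 1000 ms after the first failure, 2000 ms after the second.
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError; // every attempt failed: surface the last error
}

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; id: string; name: string };

type SessionMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: ContentBlock[] }
  | { role: "toolResult"; toolCallId: string };

/** True if the most recent assistant turn issued a tool call that never got a result. */
function wasInterruptedSketch(messages: SessionMessage[]): boolean {
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg.role !== "assistant") continue;
    // Collect the tool results recorded after this assistant turn.
    const answered = new Set<string>();
    for (const later of messages.slice(i + 1)) {
      if (later.role === "toolResult") answered.add(later.toolCallId);
    }
    // Any tool call without a matching result means the turn was cut off.
    return msg.content.some((b) => b.type === "toolCall" && !answered.has(b.id));
  }
  return false; // no assistant turn yet
}
```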

* fix: blob garbage collection and LSP debug logging

Wave 4 of failure recovery safeguards:

1. Blob garbage collection — BlobStore.gc(referencedHashes) removes
   orphaned blobs not referenced by any session file, plus totalSize()
   for monitoring blob directory growth
2. LSP JSON parse error logging — malformed LSP messages are now logged
   at debug level (when DEBUG env is set) instead of being silently dropped
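Blob garbage collection as described in item 1 is a sweep over the blob directory: the caller collects the live hashes by scanning session files, and gc() deletes everything else. The sketch assumes each blob is stored as a flat file named by its hash; BlobStoreSketch and put() are hypothetical, and only the gc(referencedHashes) and totalSize() names come from the commit message.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch: assumes each blob is a flat file <dir>/<hash> with no
// subdirectories. Not the project's BlobStore implementation.
class BlobStoreSketch {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
    fs.mkdirSync(dir, { recursive: true });
  }

  put(hash: string, data: string): void {
    fs.writeFileSync(path.join(this.dir, hash), data);
  }

  /** Deletes every blob whose hash is not in referencedHashes; returns the number removed. */
  gc(referencedHashes: ReadonlySet<string>): number {
    let removed = 0;
    for (const name of fs.readdirSync(this.dir)) {
      if (referencedHashes.has(name)) continue; // still referenced by a session
      fs.unlinkSync(path.join(this.dir, name));
      removed++;
    }
    return removed;
  }

  /** Sum of all blob file sizes in bytes, for monitoring directory growth. */
  totalSize(): number {
    let bytes = 0;
    for (const name of fs.readdirSync(this.dir)) {
      bytes += fs.statSync(path.join(this.dir, name)).size;
    }
    return bytes;
  }
}

// Usage: in practice the referenced set would come from scanning every session file.
const store = new BlobStoreSketch(fs.mkdtempSync(path.join(os.tmpdir(), "blobs-")));
store.put("aaa", "keep me"); // 7 bytes, referenced
store.put("bbb", "orphan"); // 6 bytes, orphaned
const removed = store.gc(new Set(["aaa"]));
```

The caller is responsible for building referencedHashes from every session file; a blob referenced only by a session that was not scanned would be deleted.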
2026-03-17 16:03:49 -06:00
cli feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
core fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
modes fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
resources/extensions/memory fix: Phase 1 quick wins — bug fixes, security hardening, and performance 2026-03-16 13:18:02 -05:00
tests fix: normalize Windows paths in LLM-visible text to prevent bash failures (#874) (#884) 2026-03-17 09:02:23 -06:00
utils fix: normalize Windows paths in LLM-visible text to prevent bash failures (#874) (#884) 2026-03-17 09:02:23 -06:00
cli.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
config.ts perf: fix synchronous I/O in hot paths (#540) 2026-03-15 16:57:22 -06:00
index.ts fix: normalize Windows paths in LLM-visible text to prevent bash failures (#874) (#884) 2026-03-17 09:02:23 -06:00
main.ts fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
migrations.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00