singularity-forge/packages/pi-coding-agent/src/core
Jeremy McSpadden 0e0f47ef9f fix: failure recovery & resume safeguards (all 4 waves) (#956)
* fix: prevent data loss on crash with atomic writes, file locking, and error handling

Wave 1 of failure recovery safeguards:

1. Atomic session file rewrites (tmp+rename) — _rewriteFile() and forkFrom()
   now use atomicWriteFileSync to prevent session file corruption on crash
2. Atomic auto.lock writes — crash-recovery.ts writeLock() uses tmp+rename
   so the crash detection system itself can't be corrupted
3. unhandledRejection handler — catches silent process death from unhandled
   promise rejections in OAuth, extensions, LSP, or MCP connections
4. try/catch in emitToolCall — matches pattern used by emitUserBash,
   emitContext, and emitToolResult to prevent extension handler crashes
   from killing the entire agent turn
5. File locking on session appends — prevents concurrent pi instances from
   interleaving partial JSON lines in session JSONL files using the same
   proper-lockfile pattern established in auth-storage.ts and settings-manager.ts
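
The tmp+rename pattern from items 1 and 2 can be sketched as follows. This is a minimal illustration, not the actual atomicWriteFileSync in fs-utils.ts: the signature, the temp-file naming, and the demo paths are assumptions, and the sketch does not fsync before renaming.

```typescript
import { mkdtempSync, readFileSync, readdirSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, dirname, join } from "node:path";

// Hypothetical sketch; the real helper may use a different signature and naming.
function atomicWriteFileSync(filePath: string, data: string): void {
  // Keep the temp file in the target's directory so the rename stays on one filesystem.
  const tmpPath = join(dirname(filePath), `.${basename(filePath)}.${process.pid}.tmp`);
  try {
    writeFileSync(tmpPath, data, "utf8");
    // rename replaces the target in one step, so readers see the old or the new file.
    renameSync(tmpPath, filePath);
  } catch (err) {
    // Best-effort cleanup of the temp file, then surface the original error.
    rmSync(tmpPath, { force: true });
    throw err;
  }
}

// Usage: rewrite the same file twice inside a scratch directory.
const demoDir = mkdtempSync(join(tmpdir(), "atomic-demo-"));
const demoFile = join(demoDir, "session.jsonl");
atomicWriteFileSync(demoFile, '{"type":"session"}\n');
atomicWriteFileSync(demoFile, '{"type":"session","v":2}\n');
console.log(readFileSync(demoFile, "utf8").trim(), readdirSync(demoDir));
```

If the process dies during the write, only the temp file is left partial, and the session file keeps its previous contents.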

* fix: add OAuth timeouts, RPC exit detection, and command context guards

Wave 2 of failure recovery safeguards:

1. OAuth fetch timeouts — all fetch() calls across all OAuth providers
   (Anthropic, OpenAI Codex, Google Antigravity, Google Gemini CLI,
   GitHub Copilot) now have a 30-second AbortSignal.timeout() to prevent
   indefinite hangs when OAuth servers are unresponsive
2. RPC subprocess exit detection — pending requests are now rejected
   when the agent subprocess exits unexpectedly, preventing indefinite
   hangs in the RPC client
3. Extension command context guards — default handlers for newSession,
   fork, navigateTree, switchSession, and reload now throw explicit
   errors instead of silently returning success when called before
   bindCommandContext()
4. OAuth error detail preservation — token refresh errors now preserve
   the original error as `cause` for better diagnostics
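
Items 2 and 4 can be illustrated with a small sketch. Every name here (PendingRequests, rejectAll, wrapRefreshError) is hypothetical and not taken from the RPC client or the OAuth providers. It only shows the idea of rejecting outstanding requests on subprocess exit and attaching the original error as `cause`.

```typescript
// Hypothetical bookkeeping for in-flight RPC requests, keyed by request id.
type Pending = { resolve: (value: unknown) => void; reject: (reason: Error) => void };

class PendingRequests {
  private readonly pending = new Map<number, Pending>();
  private nextId = 1;

  // Register a request and return its id plus the promise the caller awaits.
  create(): { id: number; promise: Promise<unknown> } {
    const id = this.nextId++;
    const promise = new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    return { id, promise };
  }

  // Settle one request when its response arrives; false if the id is unknown.
  resolve(id: number, value: unknown): boolean {
    const entry = this.pending.get(id);
    if (!entry) return false;
    this.pending.delete(id);
    entry.resolve(value);
    return true;
  }

  // Intended to be called from the subprocess "exit" handler so no caller hangs.
  rejectAll(reason: Error): number {
    const count = this.pending.size;
    this.pending.forEach((entry) => entry.reject(reason));
    this.pending.clear();
    return count;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Wrap a low-level failure while keeping the original error reachable as `cause`.
function wrapRefreshError(provider: string, original: unknown): Error & { cause?: unknown } {
  const wrapped: Error & { cause?: unknown } = new Error(`Token refresh failed for ${provider}`);
  wrapped.cause = original;
  return wrapped;
}

// Usage: one request is in flight when the subprocess exits.
const requests = new PendingRequests();
const first = requests.create();
first.promise.catch((err: Error) => console.log("rejected:", err.message));
const rejectedCount = requests.rejectAll(new Error("agent subprocess exited unexpectedly"));
const refreshError = wrapRefreshError("anthropic", new Error("socket hang up"));
console.log(rejectedCount, requests.size, refreshError.cause);
```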

* fix: resource cleanup, LSP retry, and crash detection on session resume

Wave 3 of failure recovery safeguards:

1. Atomic completed-units.json cleanup — milestone completion writes
   now use the tmp+rename pattern for consistency with auto-recovery.ts
2. Bash temp file cleanup — track temp files created for large output
   and register a process exit handler to clean them up
3. Settings write queue flush on shutdown — call settingsManager.flush()
   during interactive mode shutdown so queued writes aren't lost
4. LSP initialization retry — wrap getOrCreateClient with up to 2 retries
   and exponential backoff (1s, 2s) for transient spawn failures
5. Crash detection on session resume — wasInterrupted() checks if last
   assistant turn had tool calls without results and shows a warning on resume
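
The check described in item 5 might look roughly like this. The message shape (SessionMessage, toolCalls, toolCallId) is an assumption made for illustration. The real wasInterrupted() in session-manager.ts works on the project's own entry types and may differ.

```typescript
// Hypothetical, simplified session message shapes.
type ToolCall = { id: string; name: string };
type SessionMessage =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCalls: ToolCall[] }
  | { role: "toolResult"; toolCallId: string; output: string };

// True when the last assistant turn requested tool calls that never got a result.
function wasInterrupted(messages: SessionMessage[]): boolean {
  // Walk backwards to find the most recent assistant message.
  let lastAssistant = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message && message.role === "assistant") {
      lastAssistant = i;
      break;
    }
  }
  if (lastAssistant === -1) return false;

  const assistant = messages[lastAssistant];
  if (!assistant || assistant.role !== "assistant" || assistant.toolCalls.length === 0) {
    return false;
  }

  // Collect the ids of tool results recorded after that assistant message.
  const answered = new Set<string>();
  for (let i = lastAssistant + 1; i < messages.length; i++) {
    const message = messages[i];
    if (message && message.role === "toolResult") answered.add(message.toolCallId);
  }

  // Any tool call without a matching result suggests the turn was cut short.
  return assistant.toolCalls.some((call) => !answered.has(call.id));
}

// Usage: a session that stopped mid-turn versus one that completed the tool call.
const crashed: SessionMessage[] = [
  { role: "user", text: "run the tests" },
  { role: "assistant", text: "", toolCalls: [{ id: "call_1", name: "bash" }] },
];
const completed: SessionMessage[] = crashed.concat([
  { role: "toolResult", toolCallId: "call_1", output: "ok" },
]);
console.log(wasInterrupted(crashed), wasInterrupted(completed));
```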

* fix: blob garbage collection and LSP debug logging

Wave 4 of failure recovery safeguards:

1. Blob garbage collection — BlobStore.gc(referencedHashes) removes
   orphaned blobs not referenced by any session file, plus totalSize()
   for monitoring blob directory growth
2. LSP JSON parse error logging — malformed LSP messages are now logged
   at debug level (when the DEBUG environment variable is set) instead of being silently dropped
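
A rough sketch of the garbage collection in item 1, assuming a flat directory with one file per blob named by its hash. The class body, the return type of gc(), and the demo hashes are assumptions. Only the method names gc(referencedHashes) and totalSize() come from the commit message.

```typescript
import { mkdtempSync, readdirSync, rmSync, statSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical layout: every file in `dir` is a blob whose file name is its hash.
class BlobStore {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
  }

  // Delete blobs that no session references; returns the hashes that were removed.
  gc(referencedHashes: Set<string>): string[] {
    const removed: string[] = [];
    for (const name of readdirSync(this.dir)) {
      if (referencedHashes.has(name)) continue;
      unlinkSync(join(this.dir, name));
      removed.push(name);
    }
    return removed;
  }

  // Sum of blob sizes in bytes, useful for watching directory growth.
  totalSize(): number {
    return readdirSync(this.dir).reduce(
      (sum, name) => sum + statSync(join(this.dir, name)).size,
      0,
    );
  }
}

// Usage: one referenced blob and one orphan in a scratch directory.
const blobDir = mkdtempSync(join(tmpdir(), "blobs-"));
writeFileSync(join(blobDir, "aaa111"), "kept");
writeFileSync(join(blobDir, "bbb222"), "orphaned");
const store = new BlobStore(blobDir);
const removedHashes = store.gc(new Set(["aaa111"]));
console.log(removedHashes, store.totalSize());
```

The caller is expected to build referencedHashes by scanning every session file first. A blob referenced by a session that was not scanned would be deleted.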
2026-03-17 16:03:49 -06:00
compaction feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
export-html fix: add missing export-html vendor files 2026-03-13 10:38:13 +01:00
extensions fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
lsp fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
tools fix(bash): rewrite background commands to prevent pipe-open hang 2026-03-16 18:03:01 -05:00
agent-session.ts feat: integrate hashline edit mode into active workflow (#870) (#872) 2026-03-17 08:23:53 -06:00
artifact-manager.ts feat: TTSR + blob/artifact storage (ported from oh-my-pi) 2026-03-13 08:43:56 -06:00
auth-storage.test.ts fix: prevent credential backoff on transport errors and handle quota exhaustion gracefully (#353) 2026-03-14 07:15:00 -06:00
auth-storage.ts feat: add cross-provider fallback when rate/quota limits are hit (#125) 2026-03-14 15:45:44 -05:00
bash-executor.ts fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
blob-store.ts fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
defaults.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
diagnostics.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
discovery-cache.test.ts feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
discovery-cache.ts feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
event-bus.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
exec.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
fallback-resolver.test.ts feat: add cross-provider fallback when rate/quota limits are hit (#125) 2026-03-14 15:45:44 -05:00
fallback-resolver.ts feat: add cross-provider fallback when rate/quota limits are hit (#125) 2026-03-14 15:45:44 -05:00
footer-data-provider.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
fs-utils.test.ts fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
fs-utils.ts fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
index.ts feat: add cross-provider fallback when rate/quota limits are hit (#125) 2026-03-14 15:45:44 -05:00
keybindings.ts fix: add Alt+V as clipboard image paste shortcut on macOS and document it (#852) (#854) 2026-03-17 08:19:13 -06:00
messages.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
model-discovery.test.ts feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
model-discovery.ts feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
model-registry-discovery.test.ts feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
model-registry.ts feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
model-resolver.ts feat: default to Opus 4.6 1M context variant (#565) 2026-03-15 18:57:46 -06:00
models-json-writer.test.ts feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
models-json-writer.ts feat: dynamic model discovery & provider management UX (#581) 2026-03-16 06:23:18 -06:00
package-manager.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
prompt-templates.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
resolve-config-value.test.ts fix: Phase 1 quick wins — bug fixes, security hardening, and performance 2026-03-16 13:18:02 -05:00
resolve-config-value.ts fix: Phase 1 quick wins — bug fixes, security hardening, and performance 2026-03-16 13:18:02 -05:00
resource-loader.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
sdk.ts feat: integrate hashline edit mode into active workflow (#870) (#872) 2026-03-17 08:23:53 -06:00
session-manager.test.ts perf: optimize discovery and interactive hot paths 2026-03-14 16:03:44 -05:00
session-manager.ts fix: failure recovery & resume safeguards (all 4 waves) (#956) 2026-03-17 16:03:49 -06:00
settings-manager.ts feat: integrate hashline edit mode into active workflow (#870) (#872) 2026-03-17 08:23:53 -06:00
skills.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00
slash-commands.ts feat: integrate hashline edit mode into active workflow (#870) (#872) 2026-03-17 08:23:53 -06:00
system-prompt.ts fix: normalize Windows paths in LLM-visible text to prevent bash failures (#874) (#884) 2026-03-17 09:02:23 -06:00
timings.ts feat: vendor Pi source into workspace monorepo 2026-03-12 21:55:17 -06:00