singularity-forge

Jeremy McSpadden 0e0f47ef9f fix: failure recovery & resume safeguards (all 4 waves) (#956)

* fix: prevent data loss on crash with atomic writes, file locking, and error handling

  Wave 1 of failure recovery safeguards:

  1. Atomic session file rewrites (tmp+rename): _rewriteFile() and forkFrom() now use atomicWriteFileSync to prevent session file corruption on crash
  2. Atomic auto.lock writes: crash-recovery.ts writeLock() uses tmp+rename so the crash detection system itself can't be corrupted
  3. unhandledRejection handler: catches silent process death from unhandled promise rejections in OAuth, extensions, LSP, or MCP connections
  4. try/catch in emitToolCall: matches pattern used by emitUserBash, emitContext, and emitToolResult to prevent extension handler crashes from killing the entire agent turn
  5. File locking on session appends: prevents concurrent pi instances from interleaving partial JSON lines in session JSONL files using the same proper-lockfile pattern established in auth-storage.ts and settings-manager.ts

* fix: add OAuth timeouts, RPC exit detection, and command context guards

  Wave 2 of failure recovery safeguards:

  1. OAuth fetch timeouts: all fetch() calls across all OAuth providers (Anthropic, OpenAI Codex, Google Antigravity, Google Gemini CLI, GitHub Copilot) now have 30-second AbortSignal.timeout() to prevent indefinite hangs when OAuth servers are unresponsive
  2. RPC subprocess exit detection: pending requests are now rejected when the agent subprocess exits unexpectedly, preventing indefinite hangs in the RPC client
  3. Extension command context guards: default handlers for newSession, fork, navigateTree, switchSession, and reload now throw explicit errors instead of silently returning success when called before bindCommandContext()
  4. OAuth error detail preservation: token refresh errors now preserve the original error as `cause` for better diagnostics

* fix: resource cleanup, LSP retry, and crash detection on session resume

  Wave 3 of failure recovery safeguards:

  1. Atomic completed-units.json cleanup: milestone completion writes now use tmp+rename pattern for consistency with auto-recovery.ts
  2. Bash temp file cleanup: track temp files created for large output and register a process exit handler to clean them up
  3. Settings write queue flush on shutdown: call settingsManager.flush() during interactive mode shutdown so queued writes aren't lost
  4. LSP initialization retry: wrap getOrCreateClient with up to 2 retries with exponential backoff (1s, 2s) for transient spawn failures
  5. Crash detection on session resume: wasInterrupted() checks if last assistant turn had tool calls without results, shows warning on resume

* fix: blob garbage collection and LSP debug logging

  Wave 4 of failure recovery safeguards:

  1. Blob garbage collection: BlobStore.gc(referencedHashes) removes orphaned blobs not referenced by any session file, plus totalSize() for monitoring blob directory growth
  2. LSP JSON parse error logging: malformed LSP messages are now logged at debug level (when DEBUG env is set) instead of being silently dropped

2026-03-17 16:03:49 -06:00
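
The tmp+rename pattern that Waves 1 and 3 rely on can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's atomicWriteFileSync: the helper name atomicWriteSketch, the temp file naming, and the demo paths are all hypothetical, and the sketch skips fsync, so it protects against torn writes but makes no durability guarantee across power loss.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper for illustration; NOT the repository's atomicWriteFileSync.
function atomicWriteSketch(targetPath: string, data: string): void {
  // Keep the temp file in the target's directory so the rename never
  // crosses a filesystem boundary.
  const tmpPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${process.pid}.${Date.now()}.tmp`,
  );
  try {
    // A crash during this write leaves only a stray temp file;
    // the target still holds its previous contents.
    fs.writeFileSync(tmpPath, data, "utf8");
    // The rename swaps the new contents into place in a single step.
    fs.renameSync(tmpPath, targetPath);
  } catch (err) {
    // Best-effort cleanup of the temp file, then surface the original error.
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
}

// Usage: write twice to a scratch directory; the second write replaces the first.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "atomic-sketch-"));
const demoFile = path.join(demoDir, "session.jsonl");
atomicWriteSketch(demoFile, '{"turn":1}\n');
atomicWriteSketch(demoFile, '{"turn":2}\n');
console.log(fs.readFileSync(demoFile, "utf8").trim()); // prints {"turn":2}
```

Readers of the target file see either the old contents or the new contents, never a partial write, which is the property the commit message describes for session files, auto.lock, and completed-units.json.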
..
scripts	refactor: extract inline build scripts from package.json to files	2026-03-16 13:34:05 -05:00
src	fix: failure recovery & resume safeguards (all 4 waves) (#956)	2026-03-17 16:03:49 -06:00
package.json	refactor: extract inline build scripts from package.json to files	2026-03-16 13:34:05 -05:00
pnpm-lock.yaml	fix: type errors in claude-import.ts and marketplace-discovery.ts	2026-03-16 14:46:31 -04:00
tsconfig.json	feat: vendor Pi source into workspace monorepo	2026-03-12 21:55:17 -06:00