Address adversarial review findings:
1. Timed-out pre/post verification continues running in background and
can mutate s.currentUnit for the wrong unit. Fix: null out
s.currentUnit on timeout so late async completions are harmless
(all side effects in postUnitPreVerification guard on s.currentUnit).
2. Finalize timeouts were treated as successful iterations, resetting
consecutiveErrors and enabling silent infinite churn. Fix: add
consecutiveFinalizeTimeouts counter to LoopState, increment on each
timeout, hard-stop auto-mode after MAX_FINALIZE_TIMEOUTS (3)
consecutive timeouts. Reset to 0 on successful finalize.
Both fixes apply symmetrically to pre and post verification timeouts.
Refs #3757
postUnitPostVerification already has a 60s timeout guard (#2344) but
postUnitPreVerification was called with bare await — if any async
operation inside it never resolves (browser teardown, worktree sync,
safety harness validation), the auto-loop freezes permanently with no
error, notification, or recovery.
Wrap postUnitPreVerification in the same withTimeout() pattern with a
dedicated FINALIZE_PRE_TIMEOUT_MS constant. On timeout, log a warning
and force-continue to the next iteration.
Closes#3757
The tool_result handler called markDepthVerified() whenever
ask_user_questions returned any response with a depth_verification
question ID — without checking what the user actually selected.
Selecting "Not quite", "None of the above", or garbage input all
unlocked the gate.
- Extract isDepthConfirmationAnswer() into write-gate.ts with structural
validation: cross-references selected answer against the question's
defined options, only accepting an exact match of the first option
(confirmation by convention). Rejects free-form "Other" text and
decouples from any specific label substring.
- Harden block message with explicit anti-bypass language
- Add anti-bypass instructions to all three discuss prompts
- Add 8 new tests covering: structural validation, free-form bypass
rejection, label-drift resilience, fallback behavior, edge cases
Closes#3749
The function signature changed from boolean to "present" | "absent" |
"unknown" but three test assertions still compared against true/false.
Update assertions to match the new return type.
Blocking on "unknown" from hasImplementationArtifacts broke real-world
auto-mode in projects without clean git merge-bases (single-branch,
fresh repos, detached HEAD). The auto-loop silently stopped at
completing-milestone with no visible error.
Reverted to warn-and-proceed for "unknown" — only "absent" (confirmed
no implementation files) blocks completion. This matches the original
fail-open behavior for inconclusive git checks.
1. hasImplementationArtifacts "unknown" now blocks completion instead of
warn-and-proceed. Both auto-dispatch.ts and auto-recovery.ts updated
to treat "unknown" as a stop condition, preventing milestone completion
when git status cannot be verified.
2. Audit log SAFE_KEYS allowlist expanded to include "id", "error", and
"count" fields. SPLIT BRAIN logError entries now persist the entity ID
and rollback error details to audit-log.jsonl for triage/repair.
Adds a pre-write guard in reconcileWorktreeLogs: re-reads the event
log before overwriting and retries if it grew since the initial read.
Prevents appendEvent calls between read and rewrite from being silently
dropped by the atomic overwrite.
1. Paused session file deletion deferred until after lock acquisition.
Previously the file was deleted before acquireSessionLock — if the
lock failed, the pause metadata was lost on disk and in memory,
making the session unresumable. Now the file path is stored in
s.pausedSessionFile and only deleted after successful lock.
2. Lock failure path preserves pause file for retry.
1. plan_task and plan_slice replay now use strict INSERT OR IGNORE
instead of calling insertTask/insertSlice which use ON CONFLICT
DO UPDATE. Prevents replay of older plan events from downgrading
progressed task/slice status back to pending.
2. Type guard on cmd normalization: non-string cmd values are skipped
with a warning instead of throwing.
3. Type guard on extractEntityKey for consistency.
1. Type guard on cmd normalization: non-string cmd values are now
skipped with a warning instead of throwing, preventing replay
from crashing on malformed event lines.
2. complete_milestone replay now validates all slices are closed
before marking milestone complete. Prevents a reordered/partial
event stream from closing a milestone with incomplete work.
3. Type guard on extractEntityKey cmd normalization for consistency.
Addresses Codex adversarial review findings:
1. Migration backup now flushes WAL via PRAGMA wal_checkpoint(TRUNCATE)
before copyFileSync. Without this, the backup could miss committed
data that only exists in the -wal file. Backup failure is now logged
via logWarning instead of silently swallowed.
2. Wave 5 regression tests strengthened:
- Added behavior-level test for skipped/blocked/pending status mapping
to checkbox rendering (not just isClosedStatus helper)
- Added extractEntityKey round-trip tests for underscored cmd formats
- Added unknown cmd → null safety test
Tests isClosedStatus coverage for projections, upsertDecision seq
preservation (ON CONFLICT DO UPDATE vs INSERT OR REPLACE), and
event schema versioning (v:2 field in new events).
Adds tests for plan event entity key extraction and unknown cmd handling.
Fixes empty catch blocks in auto-recovery.ts appendEvent calls that failed
the "no empty catch blocks" CI lint.
Covers event log cmd format normalization (hyphens + underscores),
extractEntityKey for complete-milestone, and isClosedStatus
including skipped status.
Five consistency fixes to eliminate divergence sources:
1. workflow-projections.ts: Direct string comparisons for task/slice status
replaced with isClosedStatus() from status-guards.ts. Skipped tasks now
correctly show checked checkboxes in PLAN.md and ROADMAP.md.
2. gsd-db.ts upsertDecision: INSERT OR REPLACE changed to INSERT ... ON
CONFLICT(id) DO UPDATE SET. Preserves the seq column so decision ordering
in DECISIONS.md is stable after reconcile replay.
3. state.ts: Duplicate private isStatusDone() removed, replaced with alias
to isClosedStatus from status-guards.ts. Single source of truth for
"what counts as closed."
4. gsd-db.ts migrateSchema: Database is now backed up to
gsd.db.backup-v{currentVersion} before running migration steps. A mid-
migration crash no longer leaves a partially-migrated DB with no recovery.
5. workflow-events.ts: WorkflowEvent interface now includes optional v field
(schema version). New events are written with v:2. Legacy events (no v
field) are still accepted. Prevents future cmd-format drift from requiring
another dual-read fix.
Three write-safety fixes:
1. json-persistence.ts: Fixed .tmp suffix replaced with randomized suffix
using crypto.randomBytes(4). Prevents concurrent-write data loss when two
callers write the same JSON file simultaneously (metrics ledger at risk
during parallel slice execution).
2. undo.ts: Raw writeFileSync on PLAN.md replaced with atomicWriteSync.
Prevents crash mid-write from corrupting PLAN.md permanently.
3. triage-resolution.ts: All 6 writeFileSync calls replaced with
atomicWriteSync. Covers PLAN.md inject, REPLAN-TRIGGER.md, REGRESSION.md,
and CONTEXT-DRAFT.md writes.
Five fixes for session lifecycle and recovery reliability:
1. hasImplementationArtifacts now returns tri-state ("present"|"absent"|"unknown")
instead of boolean. "unknown" on git errors lets callers warn+proceed instead
of either silently blocking or silently allowing. Both callers updated.
2. DB-ahead-of-disk split-brain: rollback DELETE in db-writer.ts saveDecisionToDb
and saveRequirementToDb now wrapped in try/catch with logError. A failed
rollback is explicitly logged as SPLIT BRAIN so the orphaned row is auditable.
3. _consecutiveCompleteBootstraps moved from module-level in auto-start.ts into
AutoSession class. Now properly reset by s.reset(), preventing cross-session
counter bleed in long-running processes (VS Code extension).
4. s.paused sticky on lock failure: when acquireSessionLock fails during resume,
s.paused is now set back to false so isAutoPaused() doesn't return true
permanently.
5. nativeCommit empty message replaced with "chore(gsd): reconcile merge state"
to avoid rejection by strict git configurations.
Five fixes for event log integrity and worktree reconciliation:
1. writeBlockerPlaceholder now appends event log entries after DB writes
so recovery-path completions are visible to worktree reconciliation.
2. appendEvent failure is no longer silently swallowed in completion tools.
Each post-mutation step (projections, manifest, event log) now has its
own try/catch so a projection failure cannot prevent the event log entry.
Event log failures use logError (persisted to audit-log.jsonl) instead
of logWarning.
3. verification_evidence dedup confirmed already in place — INSERT OR IGNORE
with unique index on (task_id, slice_id, milestone_id, command, verdict).
4. New entity replay handlers added to replayEvents: plan_milestone (creates
milestone via INSERT OR IGNORE), plan_slice (creates slice), plan_task
(creates task), replan_slice (informational no-op). Also added to
extractEntityKey for conflict detection.
5. Post-reconcile cache invalidation added — targeted invalidation
(invalidateStateCache + clearPathCache + clearParseCache) at the end of
reconcileWorktreeLogs so deriveState() sees post-reconcile DB state.
Four critical fixes for the GSD state machine:
1. Event log cmd format mismatch — completion tools write hyphenated cmds
("complete-task") but replayEvents handled only underscored ("complete_task").
Worktree reconciliation replay was completely broken for modern completions.
Fix: normalize cmd via replace(/-/g, "_") in both replayEvents and
extractEntityKey. Also adds complete_milestone replay handler and warns
on unknown commands instead of silently skipping.
2. Dead if-block at state.ts:434-440 — empty block with misleading comments
wasted getMilestoneSlices() + every() computation. Removed and replaced
with clear comment explaining why all-slices-done milestones without
SUMMARY are intentionally not added to completeMilestoneIds.
3. getActiveMilestoneId missing "skipped" status — checked complete/done/parked
but not skipped. isStatusDone() includes skipped, creating divergence where
a skipped milestone could become permanently "active". Fix: use
isClosedStatus() || parked check.
4. executeReplan disk-file fallback — triage-resolution.ts writes replan
trigger to disk and DB (best-effort). If DB write fails, deriveStateFromDb
only checked the DB column, making the trigger invisible. Fix: fall back
to checking the disk REPLAN-TRIGGER file when DB column is null.
Four critical fixes for the GSD state machine:
1. Event log cmd format mismatch — completion tools write hyphenated cmds
("complete-task") but replayEvents handled only underscored ("complete_task").
Worktree reconciliation replay was completely broken for modern completions.
Fix: normalize cmd via replace(/-/g, "_") in both replayEvents and
extractEntityKey. Also adds complete_milestone replay handler and warns
on unknown commands instead of silently skipping.
2. Dead if-block at state.ts:434-440 — empty block with misleading comments
wasted getMilestoneSlices() + every() computation. Removed and replaced
with clear comment explaining why all-slices-done milestones without
SUMMARY are intentionally not added to completeMilestoneIds.
3. getActiveMilestoneId missing "skipped" status — checked complete/done/parked
but not skipped. isStatusDone() includes skipped, creating divergence where
a skipped milestone could become permanently "active". Fix: use
isClosedStatus() || parked check.
4. executeReplan disk-file fallback — triage-resolution.ts writes replan
trigger to disk and DB (best-effort). If DB write fails, deriveStateFromDb
only checked the DB column, making the trigger invisible. Fix: fall back
to checking the disk REPLAN-TRIGGER file when DB column is null.
Ecosystem research: executeSearchQuery() was a stub returning empty
results. Research now happens during the discussion (between Layer 1
and Layer 2) using whatever web search tools are available — native
Anthropic web search for Claude, search-the-web for other providers.
Preparation phase focuses on mechanical work only.
Adversarial review fixes:
- Clear lastPreparationResult on every discuss entry to prevent
cross-session/project state leaks
- Replace invalid JS regex anchor \z with indexOf-based section
extraction in prompt-validation.ts
- Document consecutive error counter finding as upstream behavior
(agent-loop.ts is part of pi-agent-core, not gsd extension)
The Model [phase] [tier] notification fired on every unit dispatch during
auto-mode, cluttering the notification widget. The dashboard header already
displays the active model, making this redundant. Gate behind verbose flag
consistent with all other model routing notifications in the same function.
task.files ("files likely touched") is a planning hint that includes files
a task will create, not a dependency contract. Including it in ordering
checks caused false "sequence violation" blocking errors when a task listed
files it would create. Only task.inputs (machine-parsed prerequisites)
should trigger ordering violations, matching checkFilePathConsistency (#3626).
Closes#3677
- Changed checkTaskOrdering to check [...task.inputs] instead of
[...task.files, ...task.inputs]
- Updated 4 existing tests to use inputs (were testing buggy behavior)
- Added 8 regression tests: 5 ordering false-positive cases,
3 consistency edge cases
The ghost milestone check (#3645) was eliminating queued shell
milestones before the deferred-shell logic (#3470) could handle them,
causing queued milestones to vanish from the registry entirely.