
# Test Coverage Improvement Plan

**Status**: In progress
**Target**: Increase coverage from 40% (global) to 60%+ for critical paths
**Effort**: 3-4 sessions, ~8 hours total
**Priority**: High (enables confident autonomous dispatch)

## Current Baseline

```
Global thresholds (vitest.config.ts):
  - statements: 40%
  - lines: 40%
  - branches: 20%
  - functions: 20%

Critical paths (already at 60%):
  - src/resources/extensions/sf/auto/**
  - src/resources/extensions/sf/uok/**

Gap: Autonomous dispatch support modules (metrics.js, triage, recovery) are held only to the 40% global threshold
```
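For orientation, the baseline above could be expressed in `vitest.config.ts` roughly as follows. This is a sketch rather than the repository's actual config: it assumes Vitest's per-glob `coverage.thresholds` syntax, and it assumes the 60% figure applies to all four metrics on the critical paths.

```typescript
// vitest.config.ts (illustrative sketch; the real file may differ)
import { defineConfig } from 'vitest/config';

// Assumption: critical paths require 60% on every metric.
const critical = { statements: 60, lines: 60, branches: 60, functions: 60 };

export default defineConfig({
  test: {
    coverage: {
      thresholds: {
        // Global floor, as listed in the baseline
        statements: 40,
        lines: 40,
        branches: 20,
        functions: 20,
        // Stricter per-glob thresholds for critical paths
        'src/resources/extensions/sf/auto/**': critical,
        'src/resources/extensions/sf/uok/**': critical,
      },
    },
  },
});
```

Raising a module to critical status would then amount to adding one more glob entry.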

## Critical Paths Needing Coverage

### Tier 1 (Highest Impact)

1. **Auto-dispatch loop** (`src/resources/extensions/sf/auto/`)
   - Current: 60% (already meeting target)
   - Critical for: Autonomous task execution, dispatch decisions
   - Tests needed: Edge cases (blocked units, timeouts, recovery)

2. **Metrics & learning** (`src/resources/extensions/sf/metrics.js`)
   - Current: ~35% (needs improvement)
   - Critical for: Model performance tracking, failure analysis
   - Tests needed: Async recording, concurrent metrics, data persistence

3. **Triage & feedback** (`src/resources/extensions/sf/triage-self-feedback.js`)
   - Current: ~30% (needs improvement)
   - Critical for: Self-evolution loop, report application
   - Tests needed: Report classification, auto-fix safety, degradation paths

4. **Recovery & resilience** (`src/resources/extensions/sf/recovery/`)
   - Current: ~25% (critically low)
   - Critical for: Crash recovery, forensics, automatic remediation
   - Tests needed: Partial failures, state corruption, recovery guarantees

### Tier 2 (Medium Impact)

5. **Environment & startup** (`src/env.ts`, `src/loader.ts`)
   - Current: env.ts 100% (newly added), loader.ts ~45%
   - Critical for: Configuration, startup safety
   - Tests needed: Env variable validation, default paths

6. **Promise management** (`src/resources/extensions/sf/promises.js`)
   - Current: ~40%
   - Critical for: Timeout safety, memory leaks
   - Tests needed: Cancellation, timeout behavior, cleanup

7. **State machine** (`src/resources/extensions/sf/auto/phases.js`)
   - Current: ~35%
   - Critical for: FSM correctness, transition safety
   - Tests needed: Property-based testing (see gap-9)

## Implementation Strategy

### Phase 1: Metrics & Triage Hardening (This session)

**Goal**: Raise coverage of the dispatch loop's support modules to the Phase 1 targets (`metrics.js` ≥ 60%, `triage-self-feedback.js` ≥ 55%)

1. **Metrics.js coverage:**
   - Add tests for async recordUnitOutcome with model-learner integration
   - Test fire-and-forget error handling (model failures don't block dispatch)
   - Test concurrent metric recording (no race conditions)
   - Verify data persistence (JSON write atomicity)

2. **Triage coverage:**
   - Add tests for auto-fix report classification
   - Test confidence threshold logic (80-95% range)
   - Test graceful degradation (fixes don't break on error)
   - Verify async applyTriageReport doesn't block unit dispatch

**Files to create**:
  - `src/resources/extensions/sf/metrics.test.ts` (create)
  - `src/resources/extensions/sf/triage-self-feedback.test.ts` (create)

**Estimated effort**: 2-3 hours
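The confidence-threshold logic named above is the kind of code where boundary tests pay off. A minimal sketch of what such a check might look like follows; the report types, field names, and threshold values are all hypothetical and only chosen to fall inside the range mentioned in this plan.

```typescript
import assert from 'node:assert/strict';

// Hypothetical report shape; the real triage module may use different fields.
type ReportType = 'lint' | 'config' | 'logic';

interface TriageReport {
  type: ReportType;
  confidence: number; // expected range 0..1
}

// Assumed per-type thresholds (illustrative values only).
const AUTO_FIX_THRESHOLDS: Record<ReportType, number> = {
  lint: 0.8,
  config: 0.9,
  logic: 0.95,
};

/** True only when confidence is a finite number at or above the type's threshold. */
function shouldAutoFix(report: TriageReport): boolean {
  const threshold = AUTO_FIX_THRESHOLDS[report.type];
  return Number.isFinite(report.confidence) && report.confidence >= threshold;
}

// Boundary cases worth covering: exactly at, just below, and invalid input.
assert.equal(shouldAutoFix({ type: 'lint', confidence: 0.8 }), true);
assert.equal(shouldAutoFix({ type: 'logic', confidence: 0.94 }), false);
assert.equal(shouldAutoFix({ type: 'config', confidence: NaN }), false);
```

Treating `NaN` as "do not auto-fix" is a design choice in this sketch: an unparseable confidence value falls back to the safe path.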

### Phase 2: Recovery Path Hardening (Next session)

**Goal**: Ensure crash recovery and forensics work under degradation

1. **Recovery.js coverage:**
   - Test recovery with corrupted state files
   - Test forensics collection under stress
   - Test cleanup operations (branch/snapshot removal)
   - Test partial recovery (recovery fails halfway)

2. **Crash log analysis:**
   - Test crash pattern detection
   - Test recommendation generation
   - Test multi-instance crash correlation

**Estimated effort**: 2-3 hours
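As an illustration of the "corrupted state files" case, the sketch below loads a state file and falls back to a default when the file is missing, truncated, or malformed. The state shape and the `loadStateOrDefault` helper are hypothetical; the actual recovery module may behave differently.

```typescript
import assert from 'node:assert/strict';
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical state shape; the real recovery module may differ.
interface DispatchState {
  phase: string;
  completedUnits: string[];
}

const defaultState = (): DispatchState => ({ phase: 'idle', completedUnits: [] });

function loadStateOrDefault(file: string): { state: DispatchState; recovered: boolean } {
  try {
    const parsed = JSON.parse(readFileSync(file, 'utf8'));
    if (typeof parsed?.phase !== 'string' || !Array.isArray(parsed?.completedUnits)) {
      throw new Error('state file has unexpected shape');
    }
    return { state: parsed, recovered: false };
  } catch {
    // Missing, truncated, or malformed file: fall back instead of crashing.
    return { state: defaultState(), recovered: true };
  }
}

// Simulate a crash that truncated the file mid-write.
const recoveryDir = mkdtempSync(join(tmpdir(), 'sf-recovery-'));
const stateFile = join(recoveryDir, 'state.json');
writeFileSync(stateFile, '{"phase":"running","completedUn');

const result = loadStateOrDefault(stateFile);
assert.equal(result.recovered, true);
assert.equal(result.state.phase, 'idle');
```

The `recovered` flag gives tests something to assert on beyond "it did not throw", and gives forensics a signal that a fallback happened.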

### Phase 3: State Machine & Property-Based Testing (Week after)

**Goal**: Guarantee FSM correctness under arbitrary conditions

1. **Phases.js hardening:**
   - Add property-based tests with fast-check
   - Generate arbitrary state transitions
   - Verify no invalid state combinations
   - Test timeout and failure injection

2. **Auto dispatch FSM:**
   - Generate random unit sequences
   - Verify dispatch always reaches terminal state
   - Test concurrent dispatch (parallel workers)
   - Verify cleanup on failure

**Estimated effort**: 2-3 hours
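To make the "always reaches a terminal state" invariant concrete, here is a dependency-free sketch that random-walks a small FSM from many seeds. The phases and transition table are hypothetical stand-ins for whatever `phases.js` defines, and the hand-rolled seeded generator only approximates what fast-check provides (fast-check adds richer generators and shrinking).

```typescript
import assert from 'node:assert/strict';

// Hypothetical phases and transitions; the real table lives in phases.js.
type Phase = 'pending' | 'running' | 'retrying' | 'done' | 'failed' | 'blocked';
const TERMINAL: ReadonlySet<Phase> = new Set<Phase>(['done', 'failed', 'blocked']);
const TRANSITIONS: Record<Phase, Phase[]> = {
  pending: ['running', 'blocked'],
  running: ['done', 'failed', 'retrying'],
  retrying: ['running', 'failed'],
  done: [],
  failed: [],
  blocked: [],
};

// Small deterministic PRNG (mulberry32 variant) so a failing seed can be replayed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Walks the FSM with random choices; retries are capped so the walk must end. */
function randomWalk(rand: () => number, maxRetries = 3): Phase[] {
  const path: Phase[] = ['pending'];
  let retries = 0;
  while (!TERMINAL.has(path[path.length - 1])) {
    const current = path[path.length - 1];
    let options = TRANSITIONS[current];
    if (retries >= maxRetries) options = options.filter((p) => p !== 'retrying');
    const next = options[Math.floor(rand() * options.length)];
    if (next === 'retrying') retries += 1;
    path.push(next);
  }
  return path;
}

// Property: for 100 seeds, every walk ends in a terminal phase via legal transitions only.
for (let seed = 0; seed < 100; seed++) {
  const path = randomWalk(mulberry32(seed));
  assert.ok(TERMINAL.has(path[path.length - 1]), `seed ${seed} did not terminate`);
  for (let i = 1; i < path.length; i++) {
    assert.ok(TRANSITIONS[path[i - 1]].includes(path[i]), `seed ${seed}: illegal step`);
  }
}
```

Note that termination here depends on the retry cap; without it, a `running`/`retrying` cycle could loop indefinitely, which is the kind of bug this style of test is meant to surface.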

## Testing Approach

### Unit Tests (Primary)

- Test individual functions in isolation
- Mock external dependencies (filesystem, APIs)
- Focus on behavior contracts (what happens, not how)
- Name format: `<what>_<when>_<expected>`

Example:
```typescript
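import { expect, it } from 'vitest';
import * as metrics from './metrics.js'; // assumes recordUnitOutcome is a named export

it('recordUnitOutcome_when_token_count_is_NaN_does_not_throw', async () => {
  // Fire-and-forget: a metric recording failure must not block dispatch.
  // `unitOutcome` is a valid fixture defined elsewhere in the test file.
  const fakeOutcome = { ...unitOutcome, token_count: NaN };
  // The async wrapper turns both sync throws and async rejections into a rejected promise.
  await expect((async () => metrics.recordUnitOutcome(fakeOutcome))())
    .resolves.not.toThrow();
});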
```
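The contract that the unit test above exercises can be sketched as a small wrapper. This is not taken from `model-learner.js` or `self-report-fixer.js`; `fireAndForget` and `failingRecord` are hypothetical names used only to show the shape of the pattern.

```typescript
import assert from 'node:assert/strict';

/**
 * Runs `task` without letting its failure reach the caller.
 * The returned promise always resolves, so tests can await completion
 * while production code can simply ignore it.
 */
function fireAndForget(
  task: () => unknown,
  onError: (err: unknown) => void = () => {},
): Promise<void> {
  return Promise.resolve()
    .then(task) // also converts synchronous throws into rejections
    .then(() => undefined)
    .catch((err) => {
      onError(err);
    });
}

// Hypothetical stand-in for a metrics write that fails.
const failingRecord = () => {
  throw new Error('metrics store unavailable');
};

const errors: unknown[] = [];
const settled = fireAndForget(failingRecord, (e) => errors.push(e));

// The caller continues immediately; the error is only seen by the handler.
settled.then(() => {
  assert.equal(errors.length, 1);
});
```

Returning the settled promise instead of `void` is what keeps such code testable without sleeps: the test awaits it, production callers drop it.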

### Integration Tests (Secondary)

- Test cross-module interactions
- Use real filesystem (temp directories)
- Verify async behavior and race conditions
- Focus on degradation paths

Example:
```typescript
it('dispatch_when_metrics_storage_unavailable_still_completes_unit', async () => {
  // Scenario: .sf directory not writable
  const unit = await dispatch({ /* ... */ });
  expect(unit.status).toBe('done');  // Succeeds despite metrics failure
});
```
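For integration tests that use a real filesystem, a temp directory plus a write-then-rename helper is one way to check persistence atomicity. The sketch below assumes the metrics store is a single JSON file; `writeJsonAtomic` is a hypothetical helper, not the module's actual API.

```typescript
import assert from 'node:assert/strict';
import { mkdtempSync, readFileSync, readdirSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/**
 * Write-then-rename: readers see either the old file or the complete new one,
 * never a partial write (rename is atomic within one filesystem on POSIX).
 */
function writeJsonAtomic(file: string, data: unknown): void {
  const tmp = `${file}.${process.pid}.tmp`;
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  renameSync(tmp, file);
}

// Integration-style check against a real temp directory.
const metricsDir = mkdtempSync(join(tmpdir(), 'sf-metrics-'));
const metricsFile = join(metricsDir, 'metrics.json');
writeJsonAtomic(metricsFile, { units: 1 });
writeJsonAtomic(metricsFile, { units: 2 });

assert.deepEqual(JSON.parse(readFileSync(metricsFile, 'utf8')), { units: 2 });
assert.deepEqual(readdirSync(metricsDir), ['metrics.json']); // no temp files left behind
```

Each test creating its own `mkdtempSync` directory keeps runs isolated, which matters once tests execute in parallel.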

### Property-Based Tests (Tertiary)

- Use fast-check for FSM testing
- Generate arbitrary input sequences
- Verify invariants (e.g., "always terminate")
- Catch edge cases humans miss

Example:
```typescript
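import fc from 'fast-check';

// arbitraryUnits, dispatch, and the status constants are project helpers defined elsewhere.
it('dispatch_maintains_invariant_always_reaches_terminal_state', async () => {
  await fc.assert(
    // dispatch is async, so the property must await it before checking statuses
    fc.asyncProperty(fc.array(arbitraryUnits()), async (units) => {
      const results = await Promise.all(units.map(u => dispatch(u)));
      return results.every(r => [DONE, FAILED, BLOCKED].includes(r.status));
    })
  );
});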
```

## Success Criteria

✅ **Phase 1 complete** when:
- metrics.test.ts and triage-self-feedback.test.ts created
- Both files ≥ 20 tests each
- Coverage for metrics.js ≥ 60%
- Coverage for triage-self-feedback.js ≥ 55%
- All tests passing
- Fire-and-forget behavior verified

✅ **Phase 2 complete** when:
- recovery.test.ts created with ≥ 25 tests
- Crash recovery verified with corrupted state
- Forensics tested under filesystem failure
- Cleanup operations tested atomically

✅ **Phase 3 complete** when:
- Property-based tests added to phases.test.ts
- Each property runs against ≥ 100 generated cases
- Any failure shrinks to a minimal counterexample that is captured as a regression test
- FSM invariants hold across all generated cases

## Files to Create/Modify

```
New files:
  src/resources/extensions/sf/metrics.test.ts        (25 tests, 60% coverage target)
  src/resources/extensions/sf/triage-self-feedback.test.ts (20 tests, 55% coverage target)
  src/resources/extensions/sf/recovery/recovery.test.ts (25 tests, 65% coverage target)
  src/resources/extensions/sf/auto/phases.test.mjs   (property-based tests)

Modified files:
  vitest.config.ts                                    (update thresholds: 50% global, 70% critical)
  .github/workflows/ci.yml                            (enforce coverage in CI)
```

## Risk Mitigation

**Risk**: Coverage tests too slow (current 5-10 min)
- **Mitigation**: Run coverage only in CI, not locally. Use `--no-coverage` for dev.

**Risk**: Fire-and-forget tests flaky (timing-dependent)
- **Mitigation**: Await explicit promises instead of sleeping with `setTimeout`. Where timers are unavoidable, use Vitest fake timers (`vi.useFakeTimers()`).
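
One way to apply the explicit-promise mitigation is a "deferred": a promise whose resolution the code under test triggers and the test awaits. The sketch below is illustrative only; `createDeferred` and `recordInBackground` are hypothetical names.

```typescript
import assert from 'node:assert/strict';

/** A promise whose resolution is controlled from outside, replacing sleep-based waits. */
function createDeferred<T>(): { promise: Promise<T>; resolve: (value: T) => void } {
  let resolve!: (value: T) => void;
  const promise = new Promise<T>((res) => {
    resolve = res;
  });
  return { promise, resolve };
}

// Hypothetical background recorder that signals when its work has finished.
const log: string[] = [];
const recorded = createDeferred<void>();
function recordInBackground(): void {
  queueMicrotask(() => {
    log.push('metric recorded');
    recorded.resolve();
  });
}

recordInBackground();
log.push('dispatch continued'); // runs first: the recorder has not executed yet

// Await the explicit signal rather than guessing a delay with setTimeout.
recorded.promise.then(() => {
  assert.deepEqual(log, ['dispatch continued', 'metric recorded']);
});
```

Because the test waits on the signal itself, it passes or fails for the same reason on a fast laptop and a loaded CI runner.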

**Risk**: Property-based tests generate too many cases
- **Mitigation**: Pin fast-check's `seed` for reproducible runs and bound the work with `numRuns`. Start with 100 runs, then increase once the suite is stable.

## Timeline

- **Today**: Phase 1 (metrics & triage hardening)
- **Next session**: Phase 2 (recovery paths)
- **Week after**: Phase 3 (property-based FSM tests)
- **Final**: CI gating on 60% thresholds for critical paths

## References

- Current coverage config: `vitest.config.ts` lines 52-80
- Quick wins implementation: `QUICK_WINS_INTEGRATION.md`
- Fire-and-forget pattern: `model-learner.js`, `self-report-fixer.js`
- FSM implementation: `src/resources/extensions/sf/auto/phases.js`

test: add comprehensive Phase 1 coverage for dispatch loop (48 tests)

- Add metrics.test.ts: 21 tests for unit outcome recording, model performance tracking, fire-and-forget safety, persistence, error handling
- Add triage-self-feedback.test.ts: 27 tests for report classification, confidence thresholds, auto-fix, deduplication, severity categorization, async safety

Purpose: Increase coverage of critical autonomous dispatch paths from 40% to 60%+. Covers fire-and-forget patterns (metrics recording and auto-fix application must not block dispatch), concurrent recording safety, graceful degradation on error.

Tests validate:

- ✓ Unit outcome recording without blocking
- ✓ Per-task-type model performance tracking
- ✓ Fire-and-forget error handling (metrics/fixes don't break dispatch)
- ✓ Concurrent metric recording race conditions
- ✓ Persistence atomicity
- ✓ Report classification by type/severity
- ✓ Confidence thresholds (0.85-0.95 per type)
- ✓ Auto-fix deduplication and prioritization
- ✓ Async triage without blocking dispatch

Phase 1 complete: 48 tests, all passing.
Phase 2: Recovery path hardening (recovery/forensics)
Phase 3: Property-based FSM testing (fast-check)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-07 00:38:19 +02:00
