# Test Coverage Improvement Plan

**Status**: In progress
**Target**: Increase coverage from 40% (global) to 60%+ for critical paths
**Effort**: 3-4 sessions, ~8 hours total
**Priority**: High (enables confident autonomous dispatch)

## Current Baseline

```
Global thresholds (vitest.config.ts):
  - statements: 40%
  - lines: 40%
  - branches: 20%
  - functions: 20%

Critical paths (already at 60%):
  - src/resources/extensions/sf/auto/**
  - src/resources/extensions/sf/uok/**

Gap: Autonomous dispatch loop (metrics.js, triage, recovery) at 40%
```

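As a rough sketch, the baseline above could be expressed with Vitest's `coverage.thresholds` option, assuming a Vitest version that accepts glob keys for per-path thresholds. Which metrics the 60% critical-path figure applies to is not stated above; statements and lines are an assumption here. The object is shown as a plain constant so it can be read in isolation.

```typescript
// Sketch of the coverage thresholds described in the baseline.
// Assumption: the 60% critical-path figure applies to statements and lines.
const criticalPath = { statements: 60, lines: 60 };

const coverageConfig = {
  test: {
    coverage: {
      thresholds: {
        // Global thresholds (apply to the whole project).
        statements: 40,
        lines: 40,
        branches: 20,
        functions: 20,
        // Per-glob thresholds for the critical paths.
        "src/resources/extensions/sf/auto/**": criticalPath,
        "src/resources/extensions/sf/uok/**": criticalPath,
      },
    },
  },
};

// In vitest.config.ts this object would typically be passed to
// defineConfig(...) from "vitest/config" and exported as the default export.
```
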
## Critical Paths Needing Coverage

### Tier 1 (Highest Impact)

1. **Auto-dispatch loop** (`src/resources/extensions/sf/auto/`)
   - Current: 60% (already meeting target)
   - Critical for: Autonomous task execution, dispatch decisions
   - Tests needed: Edge cases (blocked units, timeouts, recovery)

2. **Metrics & learning** (`src/resources/extensions/sf/metrics.js`)
   - Current: ~35% (needs improvement)
   - Critical for: Model performance tracking, failure analysis
   - Tests needed: Async recording, concurrent metrics, data persistence

3. **Triage & feedback** (`src/resources/extensions/sf/triage-self-feedback.js`)
   - Current: ~30% (needs improvement)
   - Critical for: Self-evolution loop, report application
   - Tests needed: Report classification, auto-fix safety, degradation paths

4. **Recovery & resilience** (`src/resources/extensions/sf/recovery/`)
   - Current: ~25% (critically low)
   - Critical for: Crash recovery, forensics, automatic remediation
   - Tests needed: Partial failures, state corruption, recovery guarantees

### Tier 2 (Medium Impact)

5. **Environment & startup** (`src/env.ts`, `src/loader.ts`)
   - Current: env.ts 100% (newly added), loader.ts ~45%
   - Critical for: Configuration, startup safety
   - Tests needed: Env variable validation, default paths

6. **Promise management** (`src/resources/extensions/sf/promises.js`)
   - Current: ~40%
   - Critical for: Timeout safety, memory leaks
   - Tests needed: Cancellation, timeout behavior, cleanup

7. **State machine** (`src/resources/extensions/sf/auto/phases.js`)
   - Current: ~35%
   - Critical for: FSM correctness, transition safety
   - Tests needed: Property-based testing (see gap-9)

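To make the prioritisation concrete, the sketch below ranks the modules above by how many percentage points they sit below the 60% target. The coverage figures are copied from the tiers (approximate where the plan says "~"). Applying the same 60% target to the Tier 2 modules is an assumption made only for this illustration.

```typescript
// Coverage figures copied from the tiers above (approximate where marked "~").
interface ModuleCoverage {
  name: string;
  current: number; // percent
}

// Critical-path target from the plan header; using it for Tier 2 is an assumption.
const TARGET = 60;

const modules: ModuleCoverage[] = [
  { name: "auto/", current: 60 },
  { name: "metrics.js", current: 35 },
  { name: "triage-self-feedback.js", current: 30 },
  { name: "recovery/", current: 25 },
  { name: "loader.ts", current: 45 },
  { name: "promises.js", current: 40 },
  { name: "auto/phases.js", current: 35 },
];

// Percentage points still missing to reach the target (never negative).
function gapToTarget(m: ModuleCoverage, target: number = TARGET): number {
  return Math.max(0, target - m.current);
}

// Largest gap first; modules already at the target are dropped.
const ranked = modules
  .map((m) => ({ name: m.name, gap: gapToTarget(m) }))
  .filter((m) => m.gap > 0)
  .sort((a, b) => b.gap - a.gap);

for (const { name, gap } of ranked) {
  console.log(`${name}: ${gap} points below ${TARGET}%`);
}
```

With these figures, the recovery code has the largest gap (35 points), followed by triage (30 points), which matches the "critically low" and "needs improvement" labels above.
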
## Implementation Strategy

### Phase 1: Metrics & Triage Hardening (This session)

**Goal**: Raise coverage of the dispatch loop's metrics and triage code toward the 60% critical-path target

1. **Metrics.js coverage:**
   - Add tests for async `recordUnitOutcome` with model-learner integration
   - Test fire-and-forget error handling (model failures don't block dispatch)
   - Test concurrent metric recording (no race conditions)
   - Verify data persistence (JSON write atomicity)

2. **Triage coverage:**
   - Add tests for auto-fix report classification
   - Test confidence threshold logic (80-95% range)
   - Test graceful degradation (fixes don't break on error)
   - Verify async `applyTriageReport` doesn't block unit dispatch

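The "JSON write atomicity" item can be illustrated with the common write-to-temp-then-rename pattern. This is a minimal sketch, not a description of how `metrics.js` persists data (the plan does not show that); `writeJsonAtomic` and the file names are hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write JSON so readers never observe a half-written file:
// write to a temp file in the same directory, then rename over the target.
// A rename within one filesystem replaces the target atomically on POSIX systems.
function writeJsonAtomic(filePath: string, data: unknown): void {
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  try {
    fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2), "utf8");
    fs.renameSync(tmpPath, filePath);
  } catch (err) {
    // Best-effort cleanup so failed writes do not leave temp files behind.
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
}

// Usage: two successive writes; the file always holds one complete document.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "metrics-"));
const target = path.join(dir, "metrics.json");
writeJsonAtomic(target, { units: 1 });
writeJsonAtomic(target, { units: 2 });
const stored = JSON.parse(fs.readFileSync(target, "utf8")) as { units: number };
const leftovers = fs.readdirSync(dir).filter((f) => f.endsWith(".tmp"));
console.log(stored.units, leftovers.length);
fs.rmSync(dir, { recursive: true, force: true });
```

A persistence test along these lines would assert that the stored document parses after every write and that no temp files remain.
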
**Files to modify**:
- `src/resources/extensions/sf/metrics.test.ts` (create)
- `src/resources/extensions/sf/triage-self-feedback.test.ts` (create)

**Estimated effort**: 2-3 hours

### Phase 2: Recovery Path Hardening (Next session)

**Goal**: Ensure crash recovery and forensics work under degradation

1. **Recovery.js coverage:**
   - Test recovery with corrupted state files
   - Test forensics collection under stress
   - Test cleanup operations (branch/snapshot removal)
   - Test partial recovery (recovery fails halfway)

2. **Crash log analysis:**
   - Test crash pattern detection
   - Test recommendation generation
   - Test multi-instance crash correlation

**Estimated effort**: 2-3 hours

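For the corrupted-state cases, it helps if state loading reports corruption as a value instead of throwing. The sketch below shows one possible shape under stated assumptions: `RecoveryState`, `loadState`, and the field names are hypothetical, since the real recovery state format is not shown in this plan.

```typescript
// Hypothetical state shape; the real recovery state format is not shown in this plan.
interface RecoveryState {
  unitId: string;
  phase: string;
}

type LoadResult =
  | { ok: true; state: RecoveryState }
  | { ok: false; reason: "invalid-json" | "invalid-shape" };

// Runtime check that parsed JSON has the fields the recovery code relies on.
function isRecoveryState(value: unknown): value is RecoveryState {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.unitId === "string" && typeof v.phase === "string";
}

// Parse a raw state file without throwing, so callers can choose a fallback.
function loadState(raw: string): LoadResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "invalid-json" };
  }
  return isRecoveryState(parsed)
    ? { ok: true, state: parsed }
    : { ok: false, reason: "invalid-shape" };
}

// Corruption cases a recovery test might enumerate.
const truncated = loadState('{"unitId": "u1", "pha');
const wrongShape = loadState('{"unitId": 42}');
const valid = loadState('{"unitId": "u1", "phase": "running"}');
console.log(truncated, wrongShape, valid);
```

Distinguishing truncated files from well-formed JSON with the wrong shape lets each test assert which fallback path the recovery code took.
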
### Phase 3: State Machine & Property-Based Testing (Week after)

**Goal**: Verify FSM correctness under arbitrary input sequences

1. **Phases.js hardening:**
   - Add property-based tests with fast-check
   - Generate arbitrary state transitions
   - Verify no invalid state combinations
   - Test timeout and failure injection

2. **Auto dispatch FSM:**
   - Generate random unit sequences
   - Verify dispatch always reaches terminal state
   - Test concurrent dispatch (parallel workers)
   - Verify cleanup on failure

**Estimated effort**: 2-3 hours

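The idea behind "generate arbitrary transitions and verify an invariant" can be sketched without any dependency. The code below is a hand-rolled stand-in for illustration only: it is not the fast-check API and does no shrinking. The phase names `PENDING` and `RUNNING` and the transition table are assumptions; only the terminal statuses `DONE`, `FAILED`, and `BLOCKED` appear elsewhere in this plan.

```typescript
// Hypothetical phase names; only DONE, FAILED and BLOCKED appear in this plan.
type Phase = "PENDING" | "RUNNING" | "DONE" | "FAILED" | "BLOCKED";

// Assumed transition table: terminal phases have no outgoing edges.
const TRANSITIONS: Record<Phase, readonly Phase[]> = {
  PENDING: ["RUNNING", "BLOCKED"],
  RUNNING: ["DONE", "FAILED", "BLOCKED"],
  DONE: [],
  FAILED: [],
  BLOCKED: [],
};

const isTerminal = (p: Phase): boolean => TRANSITIONS[p].length === 0;

// Small seeded PRNG (mulberry32) so every run is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Follow random valid transitions from PENDING until a terminal phase
// is reached or the step budget runs out.
function randomWalk(rand: () => number, maxSteps: number): { end: Phase; steps: number } {
  let phase: Phase = "PENDING";
  let steps = 0;
  while (!isTerminal(phase) && steps < maxSteps) {
    const next = TRANSITIONS[phase];
    phase = next[Math.floor(rand() * next.length)];
    steps++;
  }
  return { end: phase, steps };
}

// Invariant: every walk ends in a terminal phase within the step budget.
const rand = mulberry32(42);
const walks = Array.from({ length: 100 }, () => randomWalk(rand, 10));
const allTerminate = walks.every((w) => isTerminal(w.end));
console.log(allTerminate);
```

A real property test would replace the PRNG and loop with fast-check arbitraries, which add shrinking of failing sequences to a minimal counterexample.
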
## Testing Approach

### Unit Tests (Primary)

- Test individual functions in isolation
- Mock external dependencies (filesystem, APIs)
- Focus on behavior contracts (what happens, not how)
- Name format: `<what>_<when>_<expected>`

Example:
```typescript
import { expect, it } from 'vitest';

// Assumes `metrics` (from ./metrics.js) and a valid `unitOutcome` fixture are in scope.
it('recordUnitOutcome_when_model_learner_fails_continues_dispatch', async () => {
  // Fire-and-forget: metric recording failure must not block
  const fakeOutcome = { ...unitOutcome, token_count: NaN };
  // recordUnitOutcome is async: check the returned promise, not just a sync throw
  await expect(metrics.recordUnitOutcome(fakeOutcome)).resolves.not.toThrow();
});
```

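For context, the fire-and-forget behaviour such a test targets might look like the sketch below. This is an illustration under assumptions, not the actual `metrics.js` code: the `UnitOutcome` shape, the `Learner` type, and passing the learner as a parameter are all hypothetical.

```typescript
// Hypothetical shapes; the real metrics.js and model-learner.js APIs are not shown in this plan.
interface UnitOutcome {
  unitId: string;
  token_count: number;
}

type Learner = (outcome: UnitOutcome) => Promise<void>;

const recorded: UnitOutcome[] = [];
const learnerErrors: unknown[] = [];

// Records the outcome, then hands it to the learner without awaiting it.
// A learner failure is captured here and never reaches the caller.
async function recordUnitOutcome(outcome: UnitOutcome, learn: Learner): Promise<void> {
  recorded.push(outcome);
  // Promise.resolve().then(...) also turns a synchronous throw inside learn()
  // into a rejection, so both failure modes reach the catch handler.
  void Promise.resolve()
    .then(() => learn(outcome))
    .catch((err: unknown) => {
      learnerErrors.push(err);
    });
}

// A learner that always fails, standing in for a broken model-learner.
const failingLearner: Learner = async () => {
  throw new Error("model learner unavailable");
};

let threw = false;
try {
  // Not awaited on purpose: the caller continues immediately.
  void recordUnitOutcome({ unitId: "u1", token_count: NaN }, failingLearner);
} catch {
  threw = true;
}
console.log(threw, recorded.length);
```

Injecting the learner keeps the unit test free of module mocking; the failure is observable through `learnerErrors` once pending promises have settled.
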
### Integration Tests (Secondary)

- Test cross-module interactions
- Use real filesystem (temp directories)
- Verify async behavior and race conditions
- Focus on degradation paths

Example:
```typescript
it('dispatch_when_metrics_storage_unavailable_still_completes_unit', async () => {
  // Scenario: .sf directory not writable
  const unit = await dispatch({ ... });
  expect(unit.status).toBe('done'); // Succeeds despite metrics failure
});
```

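The "real filesystem (temp directories)" point usually needs a small helper that creates and reliably removes a scratch directory. The sketch below is one possible shape; `withTempDir` is a hypothetical helper, and the read-only `.sf` setup only approximates the "not writable" scenario (permission bits behave differently on Windows and are not enforced for root).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Run a synchronous callback against a fresh temp directory and always clean it up.
function withTempDir<T>(prefix: string, fn: (dir: string) => T): T {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  try {
    return fn(dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Example: simulate a project whose .sf directory exists but is read-only.
const observed = withTempDir("sf-integration-", (dir) => {
  const sfDir = path.join(dir, ".sf");
  fs.mkdirSync(sfDir);
  fs.chmodSync(sfDir, 0o555); // read + execute only (POSIX semantics)
  const mode = fs.statSync(sfDir).mode & 0o777;
  fs.chmodSync(sfDir, 0o755); // restore so cleanup can remove it
  return { dir, mode };
});
console.log(observed.mode.toString(8), fs.existsSync(observed.dir));
```

For async code under test, the helper would need an async variant that awaits `fn(dir)` inside the `try`, otherwise the directory is removed before the work finishes.
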
### Property-Based Tests (Tertiary)

- Use fast-check for FSM testing
- Generate arbitrary input sequences
- Verify invariants (e.g., "always terminate")
- Catch edge cases humans miss

Example:
```typescript
import fc from 'fast-check';

it('dispatch_maintains_invariant_always_reaches_terminal_state', async () => {
  await fc.assert(
    fc.asyncProperty(fc.array(arbitraryUnits()), async (units) => {
      // dispatch is async, so await every result before checking statuses
      const results = await Promise.all(units.map((u) => dispatch(u)));
      return results.every((r) => [DONE, FAILED, BLOCKED].includes(r.status));
    })
  );
});
```

## Success Criteria

✅ **Phase 1 complete** when:
- metrics.test.ts and triage-self-feedback.test.ts created
- Each file has ≥ 20 tests
- Coverage for metrics.js ≥ 60%
- Coverage for triage-self-feedback.js ≥ 55%
- All tests passing
- Fire-and-forget behavior verified

✅ **Phase 2 complete** when:
- recovery.test.ts created with ≥ 25 tests
- Crash recovery verified with corrupted state
- Forensics tested under filesystem failure
- Cleanup operations tested atomically

✅ **Phase 3 complete** when:
- Property-based tests added to phases.test.ts
- ≥ 100 generated cases per property
- Fast-check shrinking validates edge cases
- FSM invariants hold for all generated cases

## Files to Create/Modify

```
New files:
  src/resources/extensions/sf/metrics.test.ts               (25 tests, 60% coverage target)
  src/resources/extensions/sf/triage-self-feedback.test.ts  (20 tests, 55% coverage target)
  src/resources/extensions/sf/recovery/recovery.test.ts     (25 tests, 65% coverage target)
  src/resources/extensions/sf/auto/phases.test.mjs          (property-based tests)

Modified files:
  vitest.config.ts          (update thresholds: 50% global, 70% critical)
  .github/workflows/ci.yml  (enforce coverage in CI)
```

## Risk Mitigation

**Risk**: Coverage runs are too slow (currently 5-10 min)
- **Mitigation**: Run coverage only in CI, not locally. Use `--no-coverage` for dev.

**Risk**: Fire-and-forget tests are flaky (timing-dependent)
- **Mitigation**: Use explicit promises instead of `setTimeout`. Mock timers with Vitest (`vi.useFakeTimers()`).

**Risk**: Property-based tests generate too many cases
- **Mitigation**: Run fast-check with a fixed seed and a shrink limit. Start with 100 cases, then increase.

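The "explicit promises instead of `setTimeout`" mitigation can be sketched as follows: a deferred lets the test decide when background work finishes, and a small tracker lets it await that work instead of sleeping. `createDeferred` and `BackgroundTasks` are hypothetical helpers for illustration, not existing project code.

```typescript
// A deferred exposes resolve/reject so a test decides exactly when background
// work finishes, instead of sleeping and hoping it is done.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (reason: unknown) => void;
}

function createDeferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (reason: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Tracks in-flight background work so tests can await it explicitly.
class BackgroundTasks {
  private readonly inFlight = new Set<Promise<unknown>>();

  track(task: Promise<unknown>): void {
    // Rejections are swallowed: fire-and-forget work must not surface errors.
    const guarded: Promise<unknown> = task
      .catch(() => undefined)
      .finally(() => {
        this.inFlight.delete(guarded);
      });
    this.inFlight.add(guarded);
  }

  get pending(): number {
    return this.inFlight.size;
  }

  // Resolves once everything tracked so far has settled.
  async settled(): Promise<void> {
    await Promise.all(Array.from(this.inFlight));
  }
}

const tasks = new BackgroundTasks();
const write = createDeferred<void>();
tasks.track(write.promise);
const pendingBefore = tasks.pending; // 1: the write has not finished yet
write.resolve();
// In a test: `await tasks.settled();` and then assert on the recorded metrics.
void tasks.settled().then(() => console.log("pending after settle:", tasks.pending));
```

Because the test controls `write.resolve()`, the order of events is deterministic and no wall-clock delay is involved.
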
## Timeline

- **Today**: Phase 1 (metrics & triage hardening)
- **Next session**: Phase 2 (recovery paths)
- **Week after**: Phase 3 (property-based FSM tests)
- **Final**: CI gating on 60% thresholds for critical paths

## References

- Current coverage config: `vitest.config.ts` lines 52-80
- Quick wins implementation: `QUICK_WINS_INTEGRATION.md`
- Fire-and-forget pattern: `model-learner.js`, `self-report-fixer.js`
- FSM implementation: `src/resources/extensions/sf/auto/phases.js`