# Metrics-Central.js Fixes Applied
**Date**: 2026-05-07
**Scope**: Address 4 gaps identified in RA.Aid comparison review
---
## Fixes Applied
### 1. ✅ Session Scoping
**Problem**: Metrics were global to the process. No session filtering.
**Fix**:
- Added `_sessionId` module-level variable
- `initMetricsCentral(basePath, { sessionId, dbAdapter })` accepts session ID
- `recordCounter()` and `recordGauge()` auto-inject `session_id` label if not present
- `queryMetrics(db, sessionId, name, limit)` for DB queries filtered by session
**Test**: `session_id_auto_injected` — verifies session_id appears in Prometheus output
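
The auto-injection rule above can be sketched as follows. This is a minimal illustration, not the module's actual code: `setSessionId` and `withSessionLabel` are hypothetical helper names, and only `_sessionId` and the `session_id` label come from this record.

```javascript
// Hypothetical sketch: module-level session ID plus label injection.
let _sessionId = null;

// Stand-in for the sessionId option of initMetricsCentral().
function setSessionId(sessionId) {
  _sessionId = sessionId ?? null;
}

// Returns a new labels object. A caller-supplied session_id always wins;
// otherwise the module-level session ID is added when one is set.
function withSessionLabel(labels = {}) {
  if (_sessionId === null || "session_id" in labels) {
    return { ...labels };
  }
  return { ...labels, session_id: _sessionId };
}

setSessionId("sess-123");
const injected = withSessionLabel({ gate_id: "verify" });
const explicit = withSessionLabel({ session_id: "sess-999" });
console.log(injected, explicit);
```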
---
### 2. ✅ Cost/Token Metrics
**Problem**: metrics-central had no cost or token tracking, whereas RA.Aid tracks both per trajectory.
**Fix**:
- Added `recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode)` function
- New metrics in `METRIC_META`:
  - `sf_cost_total` — cumulative cost per unit/model/mode
  - `sf_tokens_input_total` — input tokens per model
  - `sf_tokens_output_total` — output tokens per model
  - `sf_cost_last` — gauge for last recorded cost
**Test**: `cost_metrics_tracked` — verifies all 4 cost metrics are emitted
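
One way `recordCost()` could fan out into the four metrics is sketched below. The in-memory `counters`/`gauges` maps, the `labelKey` helper, the third `amount` argument to `recordCounter`, and the label names `unit_id`, `model_id`, and `work_mode` are all assumptions made for illustration. Only the metric names and the `recordCost` signature come from this record.

```javascript
// Hypothetical in-memory stores standing in for the real registry.
const counters = new Map();
const gauges = new Map();

// Builds a stable key from the metric name and alphabetically sorted labels.
function labelKey(name, labels) {
  const parts = Object.keys(labels).sort().map((k) => `${k}=${labels[k]}`);
  return `${name}{${parts.join(",")}}`;
}

// Assumption: counters accept an increment amount as a third argument.
function recordCounter(name, labels = {}, amount = 1) {
  const key = labelKey(name, labels);
  counters.set(key, (counters.get(key) ?? 0) + amount);
}

function recordGauge(name, value, labels = {}) {
  gauges.set(labelKey(name, labels), value);
}

// One call updates three counters and one gauge.
function recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode) {
  recordCounter("sf_cost_total", { unit_id: unitId, model_id: modelId, work_mode: workMode }, cost);
  recordCounter("sf_tokens_input_total", { model_id: modelId }, inputTokens);
  recordCounter("sf_tokens_output_total", { model_id: modelId }, outputTokens);
  recordGauge("sf_cost_last", cost);
}

recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");
recordCost("unit-43", "claude-sonnet-4", 1500, 800, 0.045, "build");
console.log(counters, gauges);
```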
---
### 3. ✅ DB Persistence
**Problem**: `isDbAvailable` imported but unused. No SQLite persistence.
**Fix**:
- `initMetricsCentral(basePath, { dbAdapter })` accepts DB adapter
- `ensureMetricsTable(db)` creates `metrics` table with indexes
- `persistMetricsToDb(registry, sessionId, db)` flushes counters/gauges/histograms to DB
- `flushMetrics()` now writes to both Prometheus file AND SQLite
- `queryMetrics(db, sessionId, name, limit)` for programmatic queries
**Schema**:
```sql
CREATE TABLE metrics (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  type TEXT NOT NULL CHECK(type IN ('counter', 'gauge', 'histogram')),
  labels TEXT, -- JSON object
  value REAL NOT NULL,
  timestamp TEXT NOT NULL DEFAULT (datetime('now')),
  session_id TEXT
);
CREATE INDEX idx_metrics_name ON metrics(name);
CREATE INDEX idx_metrics_session ON metrics(session_id);
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
```
**Test**: `queryMetrics_returns_empty_without_db` — graceful fallback when no DB
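
To show how registry contents might map onto the schema above, here is a rough sketch that flattens counters and gauges into row objects. The registry shape (maps from metric name to `{ labels, value }` samples) and the `registryToRows` helper are assumptions. Histogram flattening and the real adapter API are not described in this record, so both are left out.

```javascript
// Parameterized insert matching the columns of the metrics table.
// id and timestamp are filled in by the table defaults.
const INSERT_SQL =
  "INSERT INTO metrics (name, type, labels, value, session_id) VALUES (?, ?, ?, ?, ?)";

// Hypothetical registry shape: { counters: Map, gauges: Map }, each mapping
// a metric name to an array of { labels, value } samples.
function registryToRows(registry, sessionId) {
  const rows = [];
  const stores = [
    ["counter", registry.counters],
    ["gauge", registry.gauges],
  ];
  for (const [type, store] of stores) {
    for (const [name, samples] of store) {
      for (const { labels, value } of samples) {
        rows.push({
          name,
          type,
          labels: JSON.stringify(labels), // stored as a JSON object in a TEXT column
          value,
          session_id: sessionId ?? null,
        });
      }
    }
  }
  return rows;
}

const registry = {
  counters: new Map([
    ["sf_gate_runs_total", [{ labels: { gate_id: "verify", outcome: "pass" }, value: 3 }]],
  ]),
  gauges: new Map([["sf_cost_guard_hourly_spend", [{ labels: {}, value: 1.23 }]]]),
};
const rows = registryToRows(registry, "sess-123");
console.log(INSERT_SQL, rows);
```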
---
### 4. ✅ Retry on Flush Failure
**Problem**: `flushMetrics()` caught errors and only logged them with `logWarning()`. It never retried.
**Fix**:
- `FLUSH_RETRY_MAX = 3` retries after a failed flush
- `FLUSH_RETRY_BASE_MS = 1000` with exponential backoff (1s, 2s, 4s)
- `_flushFailures` counter tracks consecutive failures
- After max retries, emits `sf_metrics_flush_failed_total` counter
- `stopMetricsCentral()` attempts final synchronous flush
**Behavior**:
```
Flush fail #1 → retry in 1s
Flush fail #2 → retry in 2s
Flush fail #3 → retry in 4s
Flush fail #4 → emit sf_metrics_flush_failed_total, give up
```
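
The schedule above follows from doubling the base delay on each consecutive failure. A minimal sketch of that decision, assuming a hypothetical `nextFlushAction` helper (the two constants and the metric name are the ones listed in this section):

```javascript
const FLUSH_RETRY_MAX = 3;
const FLUSH_RETRY_BASE_MS = 1000;

// failureCount is the number of consecutive flush failures so far (1-based).
// Returns either a retry with its delay or a give-up decision.
function nextFlushAction(failureCount) {
  if (failureCount > FLUSH_RETRY_MAX) {
    return { action: "give_up", metric: "sf_metrics_flush_failed_total" };
  }
  // Exponential backoff: base * 2^(n-1) gives 1s, 2s, 4s.
  return { action: "retry", delayMs: FLUSH_RETRY_BASE_MS * 2 ** (failureCount - 1) };
}

for (let n = 1; n <= 4; n++) {
  console.log(`Flush fail #${n}`, nextFlushAction(n));
}
```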
---
## Bonus Fixes (Not in Original 4)
### 5. ✅ Label Value Escaping
**Problem**: `=` or `,` in label values broke key parsing.
**Fix**:
- `_escapeLabel(v)` escapes `\` → `\\`, `=` → `\=`, `,` → `\,`
- `_parseLabelKey(key)` uses state machine parser instead of `split(',')`
- Labels sorted alphabetically for stable output
**Test**: `label_escaping_handles_special_chars`: `{ key: "a=b,c" }` round-trips correctly
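
The escaping and state-machine parsing described above can be sketched roughly as below. The function names drop the leading underscore and `buildLabelKey` is a hypothetical helper, so treat this as an illustration of the technique, not the module's code. Values come back as strings after a round trip.

```javascript
// Escape the backslash first so the escapes added for '=' and ',' are not re-escaped.
function escapeLabel(v) {
  return String(v).replace(/\\/g, "\\\\").replace(/=/g, "\\=").replace(/,/g, "\\,");
}

// Hypothetical helper: serializes labels in alphabetical order for stable output.
function buildLabelKey(labels) {
  return Object.keys(labels)
    .sort()
    .map((k) => `${escapeLabel(k)}=${escapeLabel(labels[k])}`)
    .join(",");
}

// State machine: tracks whether we are reading a name or a value, and whether
// the previous character was an escaping backslash.
function parseLabelKey(key) {
  const labels = {};
  if (key === "") return labels;
  let name = "";
  let value = "";
  let inValue = false;
  let escaped = false;
  for (const ch of key) {
    if (escaped) {
      // Escaped character is taken literally.
      if (inValue) value += ch; else name += ch;
      escaped = false;
    } else if (ch === "\\") {
      escaped = true;
    } else if (ch === "=" && !inValue) {
      inValue = true; // unescaped '=' ends the name
    } else if (ch === "," && inValue) {
      labels[name] = value; // unescaped ',' ends the pair
      name = "";
      value = "";
      inValue = false;
    } else if (inValue) {
      value += ch;
    } else {
      name += ch;
    }
  }
  labels[name] = value;
  return labels;
}

const key = buildLabelKey({ key: "a=b,c", path: "C:\\tmp" });
const parsed = parseLabelKey(key);
console.log(key, parsed);
```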
### 6. ✅ Metric Name Validation
**Problem**: Invalid Prometheus names (spaces, leading numbers) passed through.
**Fix**:
- `validateMetricName(name)` enforces `^[a-zA-Z_:][a-zA-Z0-9_:]*$`
- Throws `TypeError` for non-strings, `Error` for invalid patterns
**Test**: `invalid_metric_name_rejected` — spaces and leading numbers rejected
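
A minimal sketch of the validation rule, using the regex and error types listed above. The error messages and the choice to return the name on success are assumptions.

```javascript
// Prometheus metric name pattern as stated in this record.
const METRIC_NAME_RE = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function validateMetricName(name) {
  if (typeof name !== "string") {
    throw new TypeError(`metric name must be a string, got ${typeof name}`);
  }
  if (!METRIC_NAME_RE.test(name)) {
    throw new Error(`invalid metric name: "${name}"`);
  }
  return name; // assumption: returns the validated name for convenience
}

// Collects the outcome of each candidate without aborting on the first error.
const results = ["sf_cost_total", "has space", "9starts_with_digit", 42].map((candidate) => {
  try {
    validateMetricName(candidate);
    return "ok";
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : "Error";
  }
});
console.log(results);
```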
---
## Test Results
```
Test Files 1 passed (1)
Tests 10 passed (10)
```
Full suite: 1029 passed, 5 pre-existing failures (unrelated worktree/staging tests), 1 skipped.
---
## Remaining Gaps vs RA.Aid
| Gap | Status | Notes |
|-----|--------|-------|
| Per-trajectory granularity | ❌ Still gap | Metrics are aggregated; individual events go to audit/trajectory |
| Cost CLI commands | ❌ Still gap | No `sf cost --session` or `sf cost --all` commands yet |
| Repository pattern | ❌ Still gap | Data access is functional, not class-based |
| Pydantic models | ❌ Still gap | No typed model layer |
| Expert model consultation | ❌ Still gap | No reasoning_assist equivalent |
| Token limiter | ❌ Still gap | No context window management |
| Model fallback on 429 | ✅ Already had | SF already switches models on rate-limit |
---
## API Summary
```javascript
// Initialize
initMetricsCentral("/project", {
  sessionId: "sess-123",
  dbAdapter: db,
  flushIntervalMs: 60_000
});
// Record metrics
recordCounter("sf_gate_runs_total", { gate_id: "verify", outcome: "pass" });
recordGauge("sf_cost_guard_hourly_spend", 1.23);
recordHistogram("sf_gate_latency_ms", 150);
recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");
// Query
const rows = queryMetrics(db, "sess-123", "sf_cost_total", 100);
// Shutdown
stopMetricsCentral(); // final flush + cleanup
```