# Metrics-Central.js Fixes Applied

**Date**: 2026-05-07
**Scope**: Address the 4 gaps identified in the RA.Aid comparison review

---

## Fixes Applied

### 1. ✅ Session Scoping

**Problem**: Metrics were global to the process. No session filtering.

**Fix**:
- Added `_sessionId` module-level variable
- `initMetricsCentral(basePath, { sessionId, dbAdapter })` accepts session ID
- `recordCounter()` and `recordGauge()` auto-inject `session_id` label if not present
- `queryMetrics(db, sessionId, name, limit)` for DB queries filtered by session

**Test**: `session_id_auto_injected` — verifies session_id appears in Prometheus output
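
The auto-injection rule above can be sketched as follows. This is a minimal illustration rather than the module's actual code: `setSessionId` and `withSessionLabel` are hypothetical helper names, and only `_sessionId` and the `session_id` label come from this record.

```javascript
// Hypothetical sketch: only `_sessionId` and the `session_id` label are
// named in this record; the helper functions are illustrative.
let _sessionId = null;

function setSessionId(id) {
  _sessionId = id ?? null;
}

// Return a copy of `labels`, adding session_id when a session is active
// and the caller has not already supplied one.
function withSessionLabel(labels = {}) {
  if (_sessionId === null || "session_id" in labels) {
    return { ...labels };
  }
  return { ...labels, session_id: _sessionId };
}

setSessionId("sess-123");
console.log(withSessionLabel({ gate_id: "verify" }));
console.log(withSessionLabel({ session_id: "other" }));
```

Copying the label object instead of mutating it keeps caller-owned objects unchanged, which matters if the same labels are reused across calls.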

---

### 2. ✅ Cost/Token Metrics

**Problem**: No cost/token tracking in metrics-central. RA.Aid tracks per-trajectory.

**Fix**:
- Added `recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode)` function
- New metrics in METRIC_META:
  - `sf_cost_total` — cumulative cost per unit/model/mode
  - `sf_tokens_input_total` — input tokens per model
  - `sf_tokens_output_total` — output tokens per model
  - `sf_cost_last` — gauge for last recorded cost

**Test**: `cost_metrics_tracked` — verifies all 4 cost metrics are emitted
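
As a rough sketch of how one `recordCost()` call could fan out into the four metrics listed above: the in-memory maps, the `seriesKey` helper, and the label names (`unit_id`, `model_id`, `work_mode`) are assumptions made for illustration, not taken from the module.

```javascript
// Hypothetical in-memory stores; the real registry layout is not shown in this record.
const counters = new Map();
const gauges = new Map();

// Build a stable series key from a metric name and its sorted labels.
function seriesKey(name, labels) {
  const parts = Object.keys(labels).sort().map((k) => `${k}=${labels[k]}`);
  return `${name}{${parts.join(",")}}`;
}

function addCounter(name, labels, value) {
  const key = seriesKey(name, labels);
  counters.set(key, (counters.get(key) ?? 0) + value);
}

function setGauge(name, labels, value) {
  gauges.set(seriesKey(name, labels), value);
}

// Label names below are assumed; only the metric names come from the record.
function recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode) {
  addCounter("sf_cost_total", { unit_id: unitId, model_id: modelId, work_mode: workMode }, cost);
  addCounter("sf_tokens_input_total", { model_id: modelId }, inputTokens);
  addCounter("sf_tokens_output_total", { model_id: modelId }, outputTokens);
  setGauge("sf_cost_last", { model_id: modelId }, cost);
}

recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");
recordCost("unit-43", "claude-sonnet-4", 1500, 200, 0.01, "build");
console.log(counters, gauges);
```

The counters accumulate across calls while the gauge keeps only the most recent cost, which matches the "cumulative" versus "last recorded" wording of the metric descriptions.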

---

### 3. ✅ DB Persistence

**Problem**: `isDbAvailable` imported but unused. No SQLite persistence.

**Fix**:
- `initMetricsCentral(basePath, { dbAdapter })` accepts DB adapter
- `ensureMetricsTable(db)` creates `metrics` table with indexes
- `persistMetricsToDb(registry, sessionId, db)` flushes counters/gauges/histograms to DB
- `flushMetrics()` now writes to both the Prometheus file and SQLite
- `queryMetrics(db, sessionId, name, limit)` for programmatic queries

**Schema**:
```sql
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    type TEXT NOT NULL CHECK(type IN ('counter', 'gauge', 'histogram')),
    labels TEXT,           -- JSON object
    value REAL NOT NULL,
    timestamp TEXT NOT NULL DEFAULT (datetime('now')),
    session_id TEXT
);
CREATE INDEX idx_metrics_name ON metrics(name);
CREATE INDEX idx_metrics_session ON metrics(session_id);
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
```

**Test**: `queryMetrics_returns_empty_without_db` — graceful fallback when no DB
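
To show how registry contents might map onto the schema above, here is a small sketch that turns counter and gauge samples into row objects. The registry shape (`{ counter: [...], gauge: [...] }`) and the `toMetricRows` helper are assumptions; how histograms are flattened and how the adapter executes statements are not described in this record, so both are left out.

```javascript
// Hypothetical registry shape: { counter: [{ name, labels, value }], gauge: [...] }.
// Histograms are omitted because their row layout is not documented here.
function toMetricRows(registry, sessionId) {
  const rows = [];
  for (const type of ["counter", "gauge"]) {
    for (const sample of registry[type] ?? []) {
      rows.push({
        name: sample.name,
        type, // must satisfy the CHECK constraint on `type`
        labels: JSON.stringify(sample.labels ?? {}), // stored as a JSON object
        value: Number(sample.value),
        session_id: sessionId ?? null,
      });
    }
  }
  return rows;
}

// `id` and `timestamp` are filled in by the table defaults.
const INSERT_SQL =
  "INSERT INTO metrics (name, type, labels, value, session_id) VALUES (?, ?, ?, ?, ?)";

const rows = toMetricRows(
  {
    counter: [{ name: "sf_gate_runs_total", labels: { gate_id: "verify" }, value: 2 }],
    gauge: [{ name: "sf_cost_last", value: 0.045 }],
  },
  "sess-123"
);
console.log(INSERT_SQL, rows);
```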

---

### 4. ✅ Retry on Flush Failure

**Problem**: `flushMetrics()` caught flush errors and logged them with `logWarning()`, but never retried.

**Fix**:
- `FLUSH_RETRY_MAX = 3` retries after a failed flush
- `FLUSH_RETRY_BASE_MS = 1000` with exponential backoff (1s, 2s, 4s)
- `_flushFailures` counter tracks consecutive failures
- Once retries are exhausted, emits the `sf_metrics_flush_failed_total` counter
- `stopMetricsCentral()` attempts a final synchronous flush

**Behavior**:
```
Flush fail #1 → retry in 1s
Flush fail #2 → retry in 2s
Flush fail #3 → retry in 4s
Flush fail #4 → emit sf_metrics_flush_failed_total, give up
```
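
The backoff schedule above can be sketched as a small retry loop. The two constants come from this record; `retryDelayMs`, `flushWithRetry`, and the injectable `sleep` and `onGiveUp` options are hypothetical, and the real module may schedule retries differently (for example with timers rather than a loop).

```javascript
const FLUSH_RETRY_MAX = 3;
const FLUSH_RETRY_BASE_MS = 1000;

// Delay before the retry that follows the Nth consecutive failure: 1s, 2s, 4s.
function retryDelayMs(failureCount) {
  return FLUSH_RETRY_BASE_MS * 2 ** (failureCount - 1);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: retries `flushOnce` up to FLUSH_RETRY_MAX times,
// then calls `onGiveUp` (where a failure counter could be incremented).
async function flushWithRetry(flushOnce, { sleep = defaultSleep, onGiveUp = () => {} } = {}) {
  let failures = 0;
  for (;;) {
    try {
      await flushOnce();
      return true;
    } catch (err) {
      failures += 1;
      if (failures > FLUSH_RETRY_MAX) {
        onGiveUp(err);
        return false;
      }
      await sleep(retryDelayMs(failures));
    }
  }
}
```

Passing `sleep` in as an option keeps the loop testable without waiting for real timers.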

---

## Bonus Fixes (Not in Original 4)

### 5. ✅ Label Value Escaping

**Problem**: `=` or `,` in label values broke key parsing.

**Fix**:
- `_escapeLabel(v)` escapes `\` → `\\`, `=` → `\=`, `,` → `\,`
- `_parseLabelKey(key)` uses state machine parser instead of `split(',')`
- Labels sorted alphabetically for stable output

**Test**: `label_escaping_handles_special_chars` — `{ key: "a=b,c" }` round-trips correctly
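
A minimal sketch of the escaping and state-machine parsing described above. The function names here drop the leading underscore to signal that this is an illustration, not the module's `_escapeLabel` / `_parseLabelKey` implementation; `buildLabelKey` is a hypothetical helper.

```javascript
// Escape backslash, '=' and ',' so they cannot be mistaken for key syntax.
function escapeLabel(v) {
  return String(v).replace(/[\\=,]/g, (ch) => `\\${ch}`);
}

// Hypothetical helper: serialize labels as sorted, escaped name=value pairs.
function buildLabelKey(labels) {
  return Object.keys(labels)
    .sort()
    .map((k) => `${k}=${escapeLabel(labels[k])}`)
    .join(",");
}

// Character-by-character parser: tracks whether we are reading a name or a
// value, and whether the previous character was an escaping backslash.
function parseLabelKey(key) {
  const labels = {};
  let name = "";
  let value = "";
  let inValue = false;
  let escaped = false;
  for (const ch of key) {
    if (escaped) {
      if (inValue) value += ch; else name += ch;
      escaped = false;
    } else if (ch === "\\") {
      escaped = true;
    } else if (ch === "=" && !inValue) {
      inValue = true;
    } else if (ch === "," && inValue) {
      labels[name] = value;
      name = "";
      value = "";
      inValue = false;
    } else if (inValue) {
      value += ch;
    } else {
      name += ch;
    }
  }
  if (inValue) labels[name] = value;
  return labels;
}

const key = buildLabelKey({ key: "a=b,c" });
console.log(key, parseLabelKey(key));
```

A plain `split(',')` would cut the value `a=b,c` in two; the escape flag is what lets the parser treat `\,` and `\=` as literal characters.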

### 6. ✅ Metric Name Validation

**Problem**: Invalid Prometheus metric names (containing spaces or starting with a digit) passed through unchecked.

**Fix**:
- `validateMetricName(name)` enforces `^[a-zA-Z_:][a-zA-Z0-9_:]*$`
- Throws `TypeError` for non-strings, `Error` for invalid patterns

**Test**: `invalid_metric_name_rejected` — spaces and leading numbers rejected
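
The validation rule above could look roughly like this. The regular expression and the two error types come from this record; the error messages and the choice to return the validated name are assumptions.

```javascript
const METRIC_NAME_RE = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

// Sketch of the described checks; messages and return value are illustrative.
function validateMetricName(name) {
  if (typeof name !== "string") {
    throw new TypeError(`metric name must be a string, got ${typeof name}`);
  }
  if (!METRIC_NAME_RE.test(name)) {
    throw new Error(`invalid metric name: "${name}"`);
  }
  return name;
}

for (const candidate of ["sf_cost_total", "sf cost", "9lives", 42]) {
  try {
    validateMetricName(candidate);
    console.log("ok:", candidate);
  } catch (err) {
    console.log(`${err.name}:`, err.message);
  }
}
```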

---

## Test Results

```
Test Files  1 passed (1)
Tests       10 passed (10)
```

Full suite: 1029 passed, 5 pre-existing failures (unrelated worktree/staging tests), 1 skipped.

---

## Remaining Gaps vs RA.Aid

| Gap | Status | Notes |
|-----|--------|-------|
| Per-trajectory granularity | ❌ Still a gap | Metrics are aggregated; individual events go to audit/trajectory |
| Cost CLI commands | ❌ Still a gap | No `sf cost --session` or `sf cost --all` commands yet |
| Repository pattern | ❌ Still a gap | Data access is functional, not class-based |
| Pydantic models | ❌ Still a gap | No typed model layer |
| Expert model consultation | ❌ Still a gap | No reasoning_assist equivalent |
| Token limiter | ❌ Still a gap | No context window management |
| Model fallback on 429 | ✅ Already present | SF already switches models on rate-limit |

---

## API Summary

```javascript
// Initialize
initMetricsCentral("/project", {
  sessionId: "sess-123",
  dbAdapter: db,
  flushIntervalMs: 60_000
});

// Record metrics
recordCounter("sf_gate_runs_total", { gate_id: "verify", outcome: "pass" });
recordGauge("sf_cost_guard_hourly_spend", 1.23);
recordHistogram("sf_gate_latency_ms", 150);
recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");

// Query
const rows = queryMetrics(db, "sess-123", "sf_cost_total", 100);

// Shutdown
stopMetricsCentral(); // final flush + cleanup
```