# Metrics-Central.js Fixes Applied

**Date**: 2026-05-07

**Scope**: Addresses the four gaps identified in the RA.Aid comparison review

---

## Fixes Applied

### 1. ✅ Session Scoping

**Problem**: Metrics were global to the process, with no way to filter them by session.

**Fix**:

- Added a module-level `_sessionId` variable
- `initMetricsCentral(basePath, { sessionId, dbAdapter })` accepts a session ID
- `recordCounter()` and `recordGauge()` auto-inject a `session_id` label if one is not already present
- `queryMetrics(db, sessionId, name, limit)` queries the DB filtered by session

**Test**: `session_id_auto_injected` — verifies that `session_id` appears in the Prometheus output
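Here is a minimal sketch of the "inject only if not present" rule, assuming labels are plain objects. `setSessionId` and `withSessionLabel` are hypothetical helpers. They stand in for the module-level `_sessionId` and for the injection step inside `recordCounter()` and `recordGauge()`. A caller-supplied `session_id` is left untouched.

```javascript
// Hypothetical stand-in for the module-level session ID that
// initMetricsCentral(basePath, { sessionId }) would set.
let _sessionId = null;

function setSessionId(id) {
  _sessionId = id;
}

// Returns a copy of labels with session_id added when a session is active
// and the caller did not supply one. The input object is not mutated.
function withSessionLabel(labels = {}) {
  const hasExplicit = Object.prototype.hasOwnProperty.call(labels, "session_id");
  if (_sessionId == null || hasExplicit) {
    return { ...labels };
  }
  return { ...labels, session_id: _sessionId };
}

setSessionId("sess-123");
console.log(withSessionLabel({ gate_id: "verify" })); // { gate_id: 'verify', session_id: 'sess-123' }
console.log(withSessionLabel({ session_id: "other" })); // { session_id: 'other' }
```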

---

### 2. ✅ Cost/Token Metrics

**Problem**: metrics-central had no cost or token tracking, whereas RA.Aid tracks both per trajectory.

**Fix**:

- Added a `recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode)` function
- New metrics in `METRIC_META`:
  - `sf_cost_total` — cumulative cost per unit/model/mode
  - `sf_tokens_input_total` — input tokens per model
  - `sf_tokens_output_total` — output tokens per model
  - `sf_cost_last` — gauge for the last recorded cost

**Test**: `cost_metrics_tracked` — verifies that all four cost metrics are emitted
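A single `recordCost` call fans out to the four metrics. The sketch below shows that fan-out over a hypothetical in-memory registry, with `counters` and `gauges` maps keyed by sorted labels. Only the label split comes from the list above: cost is tracked per unit/model/mode and tokens per model. The label names `unit_id`, `model_id`, and `work_mode` are assumptions, and so is leaving the `sf_cost_last` gauge unlabeled.

```javascript
// Hypothetical in-memory registry: metric name -> Map(label key -> value).
const counters = new Map();
const gauges = new Map();

// Illustrative label key: name=value pairs sorted by label name.
function labelKey(labels) {
  return Object.keys(labels)
    .sort()
    .map((k) => `${k}=${labels[k]}`)
    .join(",");
}

function incCounter(name, labels, delta) {
  if (!counters.has(name)) counters.set(name, new Map());
  const series = counters.get(name);
  const key = labelKey(labels);
  series.set(key, (series.get(key) ?? 0) + delta);
}

function setGauge(name, labels, value) {
  if (!gauges.has(name)) gauges.set(name, new Map());
  gauges.get(name).set(labelKey(labels), value);
}

// Sketch of recordCost: one call updates all four cost/token metrics.
// Label names (unit_id, model_id, work_mode) are assumptions.
function recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode) {
  incCounter("sf_cost_total", { unit_id: unitId, model_id: modelId, work_mode: workMode }, cost);
  incCounter("sf_tokens_input_total", { model_id: modelId }, inputTokens);
  incCounter("sf_tokens_output_total", { model_id: modelId }, outputTokens);
  setGauge("sf_cost_last", {}, cost); // last recorded cost, unlabeled in this sketch
}

recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");
recordCost("unit-43", "claude-sonnet-4", 500, 200, 0.015, "build");
console.log(counters.get("sf_tokens_input_total").get("model_id=claude-sonnet-4")); // 2000
```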

---

### 3. ✅ DB Persistence

**Problem**: `isDbAvailable` was imported but never used, and there was no SQLite persistence.

**Fix**:

- `initMetricsCentral(basePath, { dbAdapter })` accepts a DB adapter
- `ensureMetricsTable(db)` creates the `metrics` table and its indexes
- `persistMetricsToDb(registry, sessionId, db)` flushes counters, gauges, and histograms to the DB
- `flushMetrics()` now writes to both the Prometheus file and SQLite
- `queryMetrics(db, sessionId, name, limit)` supports programmatic queries

**Schema**:

```sql
CREATE TABLE metrics (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  type TEXT NOT NULL CHECK(type IN ('counter', 'gauge', 'histogram')),
  labels TEXT, -- JSON object
  value REAL NOT NULL,
  timestamp TEXT NOT NULL DEFAULT (datetime('now')),
  session_id TEXT
);

CREATE INDEX idx_metrics_name ON metrics(name);
CREATE INDEX idx_metrics_session ON metrics(session_id);
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
```
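The sketch below shows one way `persistMetricsToDb` could map a registry onto the `metrics` table. Several parts are hypothetical:

- the registry shape (a `Map` from name to `{ labels, value }` samples for each type),
- the adapter's `run(sql, params)` method,
- the helper names.

Histograms are treated as plain samples here, not as buckets. The `timestamp` column is left out of the `INSERT` so that its default, `datetime('now')`, fills it.

```javascript
// Maps hypothetical registry sections onto the CHECKed `type` column values.
const TYPE_BY_SECTION = { counters: "counter", gauges: "gauge", histograms: "histogram" };

// Flattens { counters, gauges, histograms } (each a Map of name -> [{ labels, value }])
// into rows shaped like the metrics table.
function registryToRows(registry, sessionId) {
  const rows = [];
  for (const [section, type] of Object.entries(TYPE_BY_SECTION)) {
    const byName = registry[section] ?? new Map();
    for (const [name, samples] of byName) {
      for (const { labels, value } of samples) {
        rows.push({
          name,
          type,
          labels: JSON.stringify(labels ?? {}), // labels column holds a JSON object
          value,
          session_id: sessionId ?? null,
        });
      }
    }
  }
  return rows;
}

// Writes rows through a hypothetical adapter exposing run(sql, params).
// timestamp is omitted so the column default datetime('now') applies.
function persistRows(db, rows) {
  if (!db) return 0; // no adapter configured: skip persistence
  const sql =
    "INSERT INTO metrics (name, type, labels, value, session_id) VALUES (?, ?, ?, ?, ?)";
  for (const r of rows) db.run(sql, [r.name, r.type, r.labels, r.value, r.session_id]);
  return rows.length;
}

const registry = {
  counters: new Map([["sf_gate_runs_total", [{ labels: { outcome: "pass" }, value: 3 }]]]),
  gauges: new Map([["sf_cost_guard_hourly_spend", [{ labels: {}, value: 1.23 }]]]),
};
const calls = [];
const fakeDb = { run: (sql, params) => calls.push(params) };
console.log(persistRows(fakeDb, registryToRows(registry, "sess-123"))); // 2
```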

**Test**: `queryMetrics_returns_empty_without_db` — graceful fallback (empty result) when no DB is configured
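The sketch below is a session-scoped `queryMetrics(db, sessionId, name, limit)` against the schema above. It includes the empty-result fallback that the test checks. Two details are assumptions, not confirmed module behavior: the adapter method `all(sql, params)` and the newest-first ordering. Values are bound as `?` parameters instead of being interpolated into the SQL. The JSON `labels` column is parsed back into an object.

```javascript
// Sketch of a session-scoped query over the metrics table.
// Assumes a hypothetical adapter with all(sql, params) -> array of rows.
function queryMetrics(db, sessionId, name, limit = 100) {
  if (!db) return []; // graceful fallback: no DB configured
  const clauses = [];
  const params = [];
  if (sessionId != null) {
    clauses.push("session_id = ?");
    params.push(sessionId);
  }
  if (name != null) {
    clauses.push("name = ?");
    params.push(name);
  }
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  // Newest-first ordering is an assumption of this sketch.
  const sql =
    `SELECT name, type, labels, value, timestamp, session_id FROM metrics${where} ` +
    "ORDER BY timestamp DESC LIMIT ?";
  params.push(limit);
  // labels is stored as JSON text; parse it back into an object.
  return db.all(sql, params).map((row) => ({ ...row, labels: JSON.parse(row.labels ?? "{}") }));
}

let lastSql = "";
const fakeDb = {
  all(sql, params) {
    lastSql = sql;
    return [{ name: "sf_cost_total", type: "counter", labels: '{"model_id":"m"}',
              value: 0.045, timestamp: "2026-05-07 12:00:00", session_id: params[0] }];
  },
};
console.log(queryMetrics(null, "sess-123", "sf_cost_total", 100)); // []
```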

---

### 4. ✅ Retry on Flush Failure

**Problem**: `flushMetrics()` caught errors and only logged them with `logWarning()`; it never retried.

**Fix**:

- `FLUSH_RETRY_MAX = 3` retries
- `FLUSH_RETRY_BASE_MS = 1000`, with exponential backoff (1s, 2s, 4s)
- `_flushFailures` counter tracks consecutive failures
- After the maximum number of retries, emits the `sf_metrics_flush_failed_total` counter
- `stopMetricsCentral()` attempts a final synchronous flush

**Behavior**:

```
Flush fail #1 → retry in 1s
Flush fail #2 → retry in 2s
Flush fail #3 → retry in 4s
Flush fail #4 → emit sf_metrics_flush_failed_total, give up
```
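The schedule above is exponential backoff: the delay before retry `n` is `FLUSH_RETRY_BASE_MS * 2 ** (n - 1)`. The sketch below shows the loop and relies on three hypothetical pieces:

- `flushOnce`, an async function that throws on failure,
- `onGiveUp`, which stands in for incrementing `sf_metrics_flush_failed_total`,
- an injectable `sleep`, so the schedule can be checked without real waiting.

The module itself may schedule its retries with timers instead of an `await` loop.

```javascript
const FLUSH_RETRY_MAX = 3;
const FLUSH_RETRY_BASE_MS = 1000;

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry n (1-based): 1000, 2000, 4000 ms.
function backoffDelay(n) {
  return FLUSH_RETRY_BASE_MS * 2 ** (n - 1);
}

// Tries flushOnce, then up to FLUSH_RETRY_MAX retries with exponential backoff.
// Resolves true on success; calls onGiveUp and resolves false after the last failure.
async function flushWithRetry(flushOnce, { onGiveUp = () => {}, sleep = defaultSleep } = {}) {
  let failures = 0; // consecutive failures, in the spirit of _flushFailures
  for (;;) {
    try {
      await flushOnce();
      return true;
    } catch (err) {
      failures += 1;
      if (failures > FLUSH_RETRY_MAX) {
        onGiveUp(err, failures);
        return false;
      }
      await sleep(backoffDelay(failures));
    }
  }
}

// A flush that always fails, with a recording sleep instead of real delays.
const waits = [];
flushWithRetry(async () => { throw new Error("disk full"); }, {
  sleep: async (ms) => { waits.push(ms); },
  onGiveUp: (_err, n) => console.log(`gave up after ${n} failures, waits=${waits.join(",")}`),
});
// gave up after 4 failures, waits=1000,2000,4000
```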

---

## Bonus Fixes (Beyond the Original Four)

### 5. ✅ Label Value Escaping

**Problem**: An `=` or `,` inside a label value broke label-key parsing.

**Fix**:

- `_escapeLabel(v)` escapes `\` → `\\`, `=` → `\=`, and `,` → `\,`
- `_parseLabelKey(key)` uses a state-machine parser instead of `split(',')`
- Labels are sorted alphabetically for stable output

**Test**: `label_escaping_handles_special_chars` — `{ key: "a=b,c" }` round-trips correctly
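The sketch below covers the escaping rules and a character-level parser. It assumes keys are `name=value` pairs joined by `,` and sorted by label name. `_buildLabelKey` is a hypothetical helper. `_escapeLabel` and `_parseLabelKey` follow the descriptions above but are not the module's actual code. The parser reads `\` together with the following character as literal data, so an escaped `\=` or `\,` never acts as a separator.

```javascript
// Escape backslashes first so the escapes added for = and , are not doubled.
function _escapeLabel(v) {
  return String(v).replace(/\\/g, "\\\\").replace(/=/g, "\\=").replace(/,/g, "\\,");
}

// Hypothetical key builder: labels sorted by name, values escaped.
function _buildLabelKey(labels) {
  return Object.keys(labels)
    .sort()
    .map((k) => `${k}=${_escapeLabel(labels[k])}`)
    .join(",");
}

// State machine: collect key chars until an unescaped '=', then value chars
// until an unescaped ',' or end of input.
function _parseLabelKey(key) {
  const labels = {};
  let state = "key";
  let k = "";
  let v = "";
  for (let i = 0; i < key.length; i++) {
    const ch = key[i];
    if (ch === "\\" && i + 1 < key.length) {
      i += 1; // escaped character: take the next char literally
      if (state === "key") k += key[i];
      else v += key[i];
    } else if (ch === "=" && state === "key") {
      state = "value";
    } else if (ch === "," && state === "value") {
      labels[k] = v;
      k = "";
      v = "";
      state = "key";
    } else if (state === "key") {
      k += ch;
    } else {
      v += ch;
    }
  }
  if (state === "value") labels[k] = v;
  return labels;
}

const key = _buildLabelKey({ key: "a=b,c", path: "C:\\tmp" });
console.log(key); // key=a\=b\,c,path=C:\\tmp
console.log(_parseLabelKey(key).key); // a=b,c
```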

### 6. ✅ Metric Name Validation

**Problem**: Invalid Prometheus names (containing spaces or starting with a digit) were passed through unchecked.

**Fix**:

- `validateMetricName(name)` enforces `^[a-zA-Z_:][a-zA-Z0-9_:]*$`
- Throws `TypeError` for non-strings and `Error` for names that do not match the pattern

**Test**: `invalid_metric_name_rejected` — names with spaces or leading digits are rejected
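Here is a sketch of the validator as described, using the pattern above, which is the Prometheus metric-name rule. The error messages are illustrative, not the module's actual text.

```javascript
// Prometheus metric-name pattern: a letter, underscore, or colon first,
// then letters, digits, underscores, or colons.
const METRIC_NAME_RE = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function validateMetricName(name) {
  if (typeof name !== "string") {
    throw new TypeError(`metric name must be a string, got ${typeof name}`);
  }
  if (!METRIC_NAME_RE.test(name)) {
    throw new Error(`invalid metric name: ${JSON.stringify(name)}`);
  }
}

for (const candidate of ["sf_gate_runs_total", "gate runs", "2xx_total", 42]) {
  try {
    validateMetricName(candidate);
    console.log(`ok: ${candidate}`);
  } catch (err) {
    console.log(`${err.name}: ${err.message}`);
  }
}
// ok: sf_gate_runs_total
// Error: invalid metric name: "gate runs"
// Error: invalid metric name: "2xx_total"
// TypeError: metric name must be a string, got number
```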

---

## Test Results

```
Test Files 1 passed (1)
Tests 10 passed (10)
```

Full suite: 1029 passed, 5 pre-existing failures (unrelated worktree/staging tests), 1 skipped.

---

## Remaining Gaps vs RA.Aid

| Gap | Status | Notes |
|-----|--------|-------|
| Per-trajectory granularity | ❌ Still gap | Metrics are aggregated; individual events go to audit/trajectory |
| Cost CLI commands | ❌ Still gap | No `sf cost --session` or `sf cost --all` commands yet |
| Repository pattern | ❌ Still gap | Data access is functional, not class-based |
| Pydantic models | ❌ Still gap | No typed model layer |
| Expert model consultation | ❌ Still gap | No reasoning_assist equivalent |
| Token limiter | ❌ Still gap | No context window management |
| Model fallback on 429 | ✅ Already supported | SF already switches models on rate limits |

---
## API Summary

```javascript
// Module path is an assumption; point it at wherever metrics-central.js lives.
import {
  initMetricsCentral,
  recordCounter,
  recordGauge,
  recordHistogram,
  recordCost,
  queryMetrics,
  stopMetricsCentral,
} from "./metrics-central.js";

// Initialize (`db` is the SQLite adapter used for persistence)
initMetricsCentral("/project", {
  sessionId: "sess-123",
  dbAdapter: db,
  flushIntervalMs: 60_000,
});

// Record metrics
recordCounter("sf_gate_runs_total", { gate_id: "verify", outcome: "pass" });
recordGauge("sf_cost_guard_hourly_spend", 1.23);
recordHistogram("sf_gate_latency_ms", 150);
recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");

// Query
const rows = queryMetrics(db, "sess-123", "sf_cost_total", 100);

// Shutdown
stopMetricsCentral(); // final flush + cleanup
```