# Metrics Central vs RA.Aid Architecture Review

**Date**: 2026-05-07
**Reviewer**: Claude Code (SF)
**Scope**: `metrics-central.js` and its wiring, compared against RA.Aid patterns

---

## RA.Aid Architecture Summary

RA.Aid is a Python-based autonomous coding agent with these key architectural decisions:

| Layer | Pattern |
|-------|---------|
| **State** | Peewee ORM over SQLite (`.ra-aid/pk.db`), WAL mode, contextvars for connection scoping |
| **Agents** | LangGraph agents (research → planning → implementation) with explicit stage boundaries |
| **Memory** | Key facts, key snippets, research notes, trajectories — all DB-backed with repositories |
| **Trajectory** | Every tool call recorded: tool_name, parameters, result, cost, tokens, is_error, error_message |
| **Config** | JSON config file + runtime config repository with defaults |
| **Shell** | Interactive approval with cowboy_mode bypass, trajectory logging, timeout handling |
| **Reasoning** | Optional expert model consultation before each stage (reasoning_assist) |
| **Recovery** | Fallback handlers, retry with backoff, agent thread manager |

### RA.Aid's Observability Model

RA.Aid doesn't have a separate metrics system. Instead, observability is **embedded in the trajectory**:

- Every tool execution → `Trajectory` record with cost, tokens, timing
- Every stage transition → `Trajectory` record with `record_type="stage_transition"`
- Every human input → `HumanInput` record linked to trajectories
- Every error → `Trajectory` with `is_error=true`, `error_type`, `error_details`

This is **event-sourced observability**: the DB is the single source of truth for both state AND metrics.
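The event-sourced model can be sketched in a few lines of JavaScript: every event becomes a record, and counter-style metrics fall out of query-time aggregation. The store and function names here are illustrative, not RA.Aid's actual API; only the field names follow the Trajectory columns listed above.

```javascript
// Sketch of event-sourced observability: every event is a row, and
// "metrics" are aggregations computed at query time. The in-memory
// array stands in for the SQLite trajectory table.
const trajectory = [];

function recordToolCall({ toolName, params, cost, tokens, isError, errorMessage }) {
  trajectory.push({
    record_type: "tool_execution",
    tool_name: toolName,
    parameters: params,
    cost,
    tokens,
    is_error: Boolean(isError),
    error_message: errorMessage ?? null,
    timestamp: Date.now(),
  });
}

// Aggregation at query time: total cost per tool, like a counter metric,
// but with every underlying event still available for debugging.
function costByTool() {
  return trajectory
    .filter((r) => r.record_type === "tool_execution")
    .reduce((acc, r) => {
      acc[r.tool_name] = (acc[r.tool_name] ?? 0) + r.cost;
      return acc;
    }, {});
}

recordToolCall({ toolName: "run_shell", params: { cmd: "ls" }, cost: 0.5, tokens: 120 });
recordToolCall({ toolName: "run_shell", params: { cmd: "pwd" }, cost: 0.25, tokens: 80 });
// costByTool() → { run_shell: 0.75 }
```

The same query could compute error rates or token totals without any write-time aggregation, which is the property the review returns to below.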
---

## Our Metrics-Central.js Design

### What We Built

A Prometheus-compatible metrics collector with:

- Counter, Gauge, Histogram types
- In-memory aggregation with 60s flush to `.sf/runtime/sf-metrics.prom`
- Pre-defined metric metadata registry
- Wiring into subagent inheritance and mode transitions

### Design Decisions and Their Trade-offs

| Decision | Rationale | RA.Aid Comparison |
|----------|-----------|-------------------|
| **Prometheus text format** | Compatible with existing exposition, scrapeable by Grafana | RA.Aid uses DB queries; we support both |
| **In-memory aggregation** | Zero dependencies, fast | RA.Aid queries DB directly; we add a layer |
| **60s flush interval** | Batch writes, reduce I/O | RA.Aid writes per event; we batch |
| **Separate from trajectory/audit** | Metrics are aggregated views, not individual events | RA.Aid conflates events and metrics |
| **Metric metadata registry** | Pre-defined help text and labels | RA.Aid uses Peewee model definitions |

---

## The Review: 5 Lenses

### Lens 1: Data Model Consistency

**RA.Aid Pattern**: Single SQLite DB with typed models. Trajectory is the universal event log.

**Our Pattern**: Triple persistence:

- SQLite for operational state (UOK, sessions, tasks)
- Prometheus text file for metrics exposition
- JSONL for event durability

**Verdict**: ⚠️ **NEEDS WORK**

We have THREE observability sinks (SQLite, Prometheus file, JSONL) where RA.Aid has one. This creates:

- Risk of inconsistency between `sf-metrics.prom` and `sf.db`
- No unified query surface for "show me all subagent blocks in the last hour"
- Metrics file is write-only; no read path for programmatic consumption

**Recommendation**: Add a `metrics` table to `sf.db` that mirrors the Prometheus data model. The text file becomes a **projection**, not a source of truth.
```sql
CREATE TABLE metrics (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  type TEXT NOT NULL CHECK(type IN ('counter', 'gauge', 'histogram')),
  labels TEXT, -- JSON object
  value REAL NOT NULL,
  timestamp TEXT NOT NULL DEFAULT (datetime('now')),
  session_id TEXT
);
```

### Lens 2: Event-Sourced vs Aggregated

**RA.Aid Pattern**: Every event is a row. Aggregation happens at query time.

**Our Pattern**: Aggregation happens at write time. Individual events are lost.

**Verdict**: ✅ **ACCEPTABLE for metrics, but incomplete for observability**

For counters and gauges, aggregation is correct. But for debugging "why was this subagent blocked?", we need the individual event, not just `sf_subagent_dispatch_blocked{reason="provider"} 5`.

**Recommendation**: Keep metrics-central for aggregated Prometheus output, but ALSO emit individual events to the audit/trajectory system. The metric is the summary; the trajectory is the detail.

### Lens 3: Context and Session Scoping

**RA.Aid Pattern**: Every record has a `session_id` foreign key. Contextvars scope the DB connection.

**Our Pattern**: Metrics are global to the process. No session scoping.

**Verdict**: ❌ **GAP**

Our metrics can't answer: "How many subagent dispatches were blocked in session X?" This is critical for:

- Per-session cost attribution
- Debugging why a specific run failed
- Multi-tenant scenarios (if SF ever serves multiple users)

**Recommendation**: Add a `session_id` label to all metrics. Use `ctx.sessionId` or `getAutoSession().currentTraceId`.

### Lens 4: Cost and Token Tracking

**RA.Aid Pattern**: Every trajectory record has `current_cost`, `input_tokens`, `output_tokens`.

**Our Pattern**: No cost/token metrics in metrics-central yet.

**Verdict**: ❌ **MISSING**

RA.Aid tracks cost per tool call. We track cost in `metrics.js` (SQLite + JSONL) but not in metrics-central.
This means:

- No Prometheus-compatible cost metrics
- No cost alerts from Grafana
- No cost attribution by work mode or permission profile

**Recommendation**: Add cost/token metrics:

```javascript
"sf_cost_total": { help: "Total cost in USD", labels: ["work_mode", "model_id"] },
"sf_tokens_input_total": { help: "Total input tokens", labels: ["model_id"] },
"sf_tokens_output_total": { help: "Total output tokens", labels: ["model_id"] },
```

### Lens 5: Error Handling and Resilience

**RA.Aid Pattern**: Every error is caught, logged, and stored in the trajectory with full context.

**Our Pattern**: `flushMetrics()` catches and logs with `logWarning()`. No retry.

**Verdict**: ⚠️ **ACCEPTABLE but could be stronger**

Our flush failure is best-effort, which matches RA.Aid's philosophy. But RA.Aid also:

- Reopens closed DB connections automatically
- Has fallback handlers for agent failures
- Records error details in the trajectory

**Recommendation**:

1. Add retry with exponential backoff for flush failures
2. If flush fails 3 times, emit a `metrics_flush_failed` counter
3. On process exit, attempt a final synchronous flush

---

## Specific Code Review Findings

### Finding 1: Unused Import

```javascript
import { isDbAvailable } from "./sf-db.js";
```

This is imported but never used. The JSDoc mentions "Optional SQLite persistence" but it's not implemented.

**Fix**: Either implement DB persistence or remove the import.

### Finding 2: Histogram Bucket Sorting

```javascript
this.buckets = [...buckets].sort((a, b) => a - b);
```

This sorts a copy of the input array, so the caller's array is never mutated, and the explicit comparator guarantees the ascending numeric order Prometheus expects (the default `sort` would compare as strings).

**Verdict**: ✅ Correct.

### Finding 3: Label Key Serialization

```javascript
_key(labels) {
  return this.labelNames.map((k) => `${k}=${labels[k] ?? ""}`).join(",");
}
```

If a label value contains `=` or `,`, the key parsing will break.

**Fix**: Add escaping or use a structured key format (e.g., JSON).
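One low-risk way to implement that fix is to serialize the ordered label values with `JSON.stringify`, which escapes `=`, `,`, and quotes for free. A minimal sketch, assuming the `labelNames` field from the snippet above; the class wrapper is illustrative, not the actual `metrics-central.js` code:

```javascript
// Sketch of a collision-safe label key: JSON-encode the ordered label
// values so delimiters inside a value cannot collide with the separator.
class LabeledMetric {
  constructor(labelNames) {
    // Sorting the names gives a stable key regardless of declaration order.
    this.labelNames = [...labelNames].sort();
  }

  _key(labels) {
    // JSON.stringify escapes '=', ',', and quotes inside values.
    return JSON.stringify(this.labelNames.map((k) => labels[k] ?? ""));
  }
}

const m = new LabeledMetric(["reason", "mode"]);
// Values containing '=' or ',' no longer collide with the separator:
m._key({ reason: "a=b", mode: "x" }); // '["x","a=b"]'
m._key({ reason: "a", mode: "b,x" }); // '["b,x","a"]'
```

The key is also trivially parseable back with `JSON.parse`, which the `=`/`,` format was not once values contained those characters.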
### Finding 4: No Validation on Metric Names

```javascript
export function recordCounter(name, labels = {}, amount = 1) {
  const meta = getMetricMeta(name);
  getRegistry().counter(name, meta.help, Object.keys(labels)).inc(labels, amount);
}
```

If `name` contains spaces or invalid Prometheus characters, the output will be malformed.

**Fix**: Add `validateMetricName(name)` that rejects invalid characters.

### Finding 5: Timer Unref

```javascript
if (_flushTimer.unref) _flushTimer.unref();
```

This is correct for Node.js but may not work in all environments (e.g., Bun).

**Verdict**: ✅ Acceptable with fallback.

---

## Overall Assessment

| Dimension | Grade | Notes |
|-----------|-------|-------|
| **Correctness** | B+ | Prometheus output is valid, but label escaping needs work |
| **Completeness** | B | Missing cost/token metrics, session scoping, DB persistence |
| **Consistency with SF** | A | Fits the extension model, uses existing patterns |
| **Consistency with RA.Aid** | C | RA.Aid would prefer event-sourced over aggregated |
| **Production Readiness** | B | Needs retry, validation, and DB projection before GA |

### Priority Fixes

1. **P0**: Add `session_id` label to all metrics
2. **P0**: Remove unused `isDbAvailable` import or implement DB persistence
3. **P1**: Add cost/token metrics
4. **P1**: Fix label value escaping
5. **P1**: Add metric name validation
6. **P2**: Add retry with backoff for flush failures
7. **P2**: Add final flush on process exit
8. **P2**: Consider a `metrics` table in `sf.db` as source of truth

### RA.Aid Patterns Worth Adopting

1. **Trajectory-style event logging**: Every metric should have a corresponding event in the audit/trajectory system
2. **Session-scoped connections**: All observability should be filterable by session
3. **Per-tool cost tracking**: Every tool call should record cost and tokens
4. **Error detail preservation**: When metrics indicate failure, the detail should be queryable

---

## Conclusion

`metrics-central.js` is a solid Prometheus-compatible metrics layer that fills a real gap in SF's observability. However, it prioritizes **exposition format** over **observability depth**. RA.Aid's trajectory model is superior for debugging and audit because it preserves every event.

The right path forward:

1. Keep metrics-central for Prometheus output (Grafana compatibility)
2. Add a `metrics` table to `sf.db` for queryable aggregation
3. Ensure every metric has a corresponding audit/trajectory event
4. Add session scoping and cost tracking

This gives us the best of both worlds: Prometheus for dashboards, SQLite for queries, and trajectory for debugging.
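The "metric plus trajectory event" pairing could be a thin wrapper around the existing recording API. A sketch under stated assumptions: the in-memory `counters` map stands in for the metrics-central registry, and `auditLog` is a hypothetical stand-in for the audit/trajectory sink, neither of which is the actual SF code:

```javascript
// Sketch: every metric increment also emits a queryable detail event,
// so the aggregated counter answers "how many?" and the event log
// answers "why, exactly?". Both stores here are illustrative stand-ins.
const counters = new Map();
const auditLog = [];

function recordCounter(name, labels, amount = 1) {
  const key = `${name}:${JSON.stringify(labels)}`;
  counters.set(key, (counters.get(key) ?? 0) + amount);
}

function recordCounterWithEvent(name, labels, detail) {
  recordCounter(name, labels, 1); // aggregated summary for Prometheus
  auditLog.push({
    metric: name,
    labels, // includes session_id per the review's recommendation
    detail, // the individual-event context the counter discards
    timestamp: new Date().toISOString(),
  });
}

recordCounterWithEvent(
  "sf_subagent_dispatch_blocked",
  { reason: "provider", session_id: "s-123" },
  { subagent: "researcher", provider: "anthropic", message: "quota exceeded" }
);
```

With this shape, the Prometheus projection and the per-session detail query both derive from a single recording call, which is the closest analogue to RA.Aid's single-source-of-truth trajectory.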