SQLite Migration Guide for Model Learning
Status: Planned for Node 24.15.0 upgrade
Current: JSON-based storage (model-learner.js, self-report-fixer.js)
Target: Native node:sqlite integration
Why SQLite?
- Zero dependencies: Node 24+ has built-in node:sqlite (no package install)
- Queryable: SQL joins with UOK's llm_task_outcomes table for unified learning database
- Transactional: Atomic outcome recording prevents partial state corruption
- Performant: Indexes on (task_type, model_id) for per-task-type ranking queries
- Durable: WAL mode ensures data survives crashes
Current State (Node 20)
JSON-Based Storage
- model-learner.js: .sf/model-performance.json (nested object hierarchy)
    {
      "execute-task": {
        "gpt-4o": { "successes": 42, "failures": 3, "successRate": 0.93 }
      }
    }
- self-report-fixer.js: Stateless (no persistent storage)
- triage-self-feedback.js: Reads/writes REQUIREMENTS.md, ARCHITECTURE.md
Pain Points
- Entire file read/write on every outcome (O(n) latency)
- No queryable schema (must load all data, filter in-memory)
- No transactions (partial failures possible)
- No natural joins with UOK database
SQLite Schema (Target)
Table 1: model_outcomes
Raw event log for every model outcome.
CREATE TABLE model_outcomes (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_type TEXT NOT NULL,              -- "execute-task", "plan-slice", etc.
  model_id TEXT NOT NULL,               -- "gpt-4o", "claude-opus", etc.
  success INTEGER NOT NULL,             -- 1 = success, 0 = failure
  timeout INTEGER NOT NULL DEFAULT 0,   -- 1 = timed out, 0 = normal
  tokens_used INTEGER NOT NULL DEFAULT 0,
  cost_usd REAL NOT NULL DEFAULT 0.0,
  timestamp TEXT NOT NULL,              -- ISO 8601
  FOREIGN KEY (task_type, model_id) REFERENCES model_stats(task_type, model_id)
);
CREATE INDEX idx_outcomes_task_model ON model_outcomes(task_type, model_id);
CREATE INDEX idx_outcomes_timestamp ON model_outcomes(timestamp DESC);
Table 2: model_stats
Aggregated per-task-per-model statistics (updated atomically with each outcome).
CREATE TABLE model_stats (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_type TEXT NOT NULL,
  model_id TEXT NOT NULL,
  successes INTEGER NOT NULL DEFAULT 0,
  failures INTEGER NOT NULL DEFAULT 0,
  timeouts INTEGER NOT NULL DEFAULT 0,
  total_tokens INTEGER NOT NULL DEFAULT 0,
  total_cost REAL NOT NULL DEFAULT 0.0,
  last_used TEXT,                       -- ISO 8601 timestamp of last outcome
  UNIQUE(task_type, model_id)
);
-- Optional: UNIQUE(task_type, model_id) already creates an equivalent index.
CREATE INDEX idx_stats_task_model ON model_stats(task_type, model_id);
Migration Steps
Phase 1: Refactor ModelPerformanceTracker (model-learner.js)
Before (JSON):
recordOutcome(taskType, modelId, outcome) {
  if (!this.data[taskType]) this.data[taskType] = {};
  if (!this.data[taskType][modelId]) {
    this.data[taskType][modelId] = { successes: 0, failures: 0, ... };
  }
  const stats = this.data[taskType][modelId];
  if (outcome.success) stats.successes += 1;
  else stats.failures += 1;
  this._save(); // Entire file rewrite
}
After (SQLite):
recordOutcome(taskType, modelId, outcome) {
  this.db.exec("BEGIN");
  try {
    // Upsert stats first: model_outcomes has a foreign key into model_stats,
    // and node:sqlite enforces foreign keys by default
    const updateStmt = this.db.prepare(`
      INSERT INTO model_stats (task_type, model_id, successes, ...)
      VALUES (?, ?, ?, ...)
      ON CONFLICT(task_type, model_id) DO UPDATE SET
        successes = successes + excluded.successes,
        failures = failures + excluded.failures,
        ...
    `);
    updateStmt.run(...);

    // Insert event
    const insertStmt = this.db.prepare(`
      INSERT INTO model_outcomes (task_type, model_id, success, timeout, ...)
      VALUES (?, ?, ?, ?, ...)
    `);
    insertStmt.run(taskType, modelId, outcome.success ? 1 : 0, ...);

    this.db.exec("COMMIT");
  } catch (err) {
    this.db.exec("ROLLBACK"); // Undo both writes on any failure
    throw err;
  }
}
Benefits:
- Cost per outcome no longer grows with file size (one INSERT plus one indexed upsert)
- Atomic transaction (both tables updated together)
- No full-file rewrite
Phase 2: Update Query Methods
getRankedModels → SQL SELECT with ORDER BY
getRankedModels(taskType, minSamples = 3) {
  const query = this.db.prepare(`
    SELECT model_id, successes, failures, total_tokens, total_cost, last_used
    FROM model_stats
    WHERE task_type = ? AND (successes + failures) >= ?
    ORDER BY (CAST(successes AS FLOAT) / (successes + failures)) DESC
  `);
  return query.all(taskType, minSamples).map(row => ({
    modelId: row.model_id,
    successRate: row.successes / (row.successes + row.failures),
    ...
  }));
}
Phase 3: Integrate with UOK Database (Optional)
If UOK stores outcomes in its database, consider a federated schema:
- Keep model_learner SQLite database separate (.sf/model-performance.db)
- OR: Create a view in the UOK database that aggregates UOK's llm_task_outcomes
-- In UOK database:
CREATE VIEW model_performance AS
SELECT
  outcome.task_type,
  outcome.model_id,
  COUNT(CASE WHEN outcome.success = 1 THEN 1 END) as successes,
  COUNT(CASE WHEN outcome.success = 0 THEN 1 END) as failures,
  SUM(outcome.tokens_used) as total_tokens,
  SUM(outcome.cost_usd) as total_cost
FROM llm_task_outcomes outcome
GROUP BY outcome.task_type, outcome.model_id;
Phase 4: Data Migration (JSON → SQLite)
Create migration function in constructor:
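// At the top of model-learner.js (or the equivalent ESM imports):
const { DatabaseSync } = require("node:sqlite");
const { existsSync, readFileSync } = require("node:fs");

_initDb() {
  const db = new DatabaseSync(this.dbPath);
  // ... create tables ...

  // Migrate existing JSON data
  if (existsSync(this.oldJsonPath)) {
    const jsonData = JSON.parse(readFileSync(this.oldJsonPath, 'utf-8'));
    this._migrateFromJson(db, jsonData);
    // After migration: delete old JSON or archive
  }
  return db;
}

_migrateFromJson(db, jsonData) {
  // OR IGNORE keeps re-runs harmless while the old JSON file is still on disk
  const insertStmt = db.prepare(`
    INSERT OR IGNORE INTO model_stats
      (task_type, model_id, successes, failures, timeouts, total_tokens, total_cost, last_used)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
  `);
  db.exec("BEGIN");
  try {
    for (const [taskType, models] of Object.entries(jsonData)) {
      for (const [modelId, stats] of Object.entries(models)) {
        insertStmt.run(
          taskType, modelId,
          stats.successes ?? 0, stats.failures ?? 0, stats.timeouts ?? 0,
          // Older JSON entries may lack these fields
          stats.totalTokens ?? 0, stats.totalCost ?? 0, stats.lastUsed ?? null
        );
      }
    }
    db.exec("COMMIT");
  } catch (err) {
    db.exec("ROLLBACK"); // Leave the database untouched if any row fails
    throw err;
  }
}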
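One detail worth isolating is how the nested JSON maps onto rows. The sample file shown under Current State carries only `successes`, `failures`, and `successRate`, so a migration probably has to default the other columns instead of binding `undefined`. The sketch below does that in plain JavaScript; `toStatsRows` is a hypothetical helper name.

```javascript
// Hypothetical helper: flattens the nested JSON into one parameter array per
// (task_type, model_id) pair, in model_stats column order, defaulting any
// field that older files may lack.
function toStatsRows(jsonData) {
  const rows = [];
  for (const [taskType, models] of Object.entries(jsonData)) {
    for (const [modelId, stats] of Object.entries(models)) {
      rows.push([
        taskType,
        modelId,
        stats.successes ?? 0,
        stats.failures ?? 0,
        stats.timeouts ?? 0,
        stats.totalTokens ?? 0,
        stats.totalCost ?? 0,
        stats.lastUsed ?? null, // last_used is nullable in the schema
      ]);
    }
  }
  return rows;
}

const migrationRows = toStatsRows({
  "execute-task": {
    "gpt-4o": { successes: 42, failures: 3, successRate: 0.93 },
  },
});
```

Each array can be spread into a prepared eight-parameter INSERT, for example `insertStmt.run(...migrationRows[0])`. The derived `successRate` is dropped on purpose, since the SQL queries recompute it from the counts.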
Testing Strategy
Unit Tests (No Changes Needed)
Existing tests in model-learner.test.ts should pass unchanged:
- recordOutcome() API remains the same
- getRankedModels() returns same shape
- shouldDemote(), getABTestCandidates() unchanged
Integration Tests (Add SQLite-Specific)
test("persists to SQLite database", () => {
  const learner = new ModelLearner(basePath);
  learner.recordOutcome("execute-task", "gpt-4o", { success: true, tokensUsed: 100 });

  // Verify record in model_outcomes table
  const query = learner.tracker.db.prepare(`
    SELECT COUNT(*) as count FROM model_outcomes
    WHERE task_type = ? AND model_id = ?
  `);
  const result = query.get("execute-task", "gpt-4o");
  expect(result.count).toBe(1);
});

test("transactions are atomic", () => {
  // Simulate failure during upsert
  // Verify both INSERT and UPDATE succeed or both rollback
});
Timeline
- When Node 24.15.0 becomes standard (6-8 weeks)
  - Update .nvmrc, package.json engines
  - Enable snap to run Node 24
- Migration PR (2 days of work)
  - Refactor ModelPerformanceTracker class
  - Add migration function
  - Test with existing unit tests
- Rollout (1 day)
  - Deploy with backward-compatible JSON→SQLite auto-migration
  - Monitor for edge cases
  - Archive old JSON files after 1 week
Backward Compatibility
- Auto-migrate: On first run with Node 24, detect .sf/model-performance.json and import to SQLite
- Keep JSON: Don't delete old JSON file immediately (keep for 1 week as backup)
- Graceful fallback: If SQLite init fails, log error and fall back to JSON (degraded mode)
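The graceful fallback could be structured along these lines. This is a sketch only: `openStore` and the `mode` field are hypothetical names, and the JSON branch is left as a marker instead of a full implementation.

```javascript
// Hypothetical factory: prefer SQLite, degrade to JSON storage if the module
// is missing (older Node) or the database cannot be opened.
function openStore(dbPath) {
  try {
    // Loaded lazily so a Node build without node:sqlite lands in the catch block.
    const { DatabaseSync } = require("node:sqlite");
    return { mode: "sqlite", db: new DatabaseSync(dbPath) };
  } catch (err) {
    console.error(`SQLite init failed, falling back to JSON: ${err.message}`);
    return { mode: "json", db: null };
  }
}

const store = openStore(":memory:");
// Callers branch on store.mode to pick the SQLite or JSON code path.
```

Requiring the module inside the `try` block matters here: a top-level `require("node:sqlite")` would throw at load time on runtimes that lack the module, before any fallback logic runs.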
Future Opportunities
Once SQLite is in place:
- Dashboard: Query performance metrics
    SELECT model_id,
           ROUND(100.0 * successes / (successes + failures), 1) as success_rate,
           total_tokens, total_cost
    FROM model_stats
    WHERE task_type = ?
    ORDER BY success_rate DESC;
- Trend analysis: Model performance over time
    SELECT DATE(timestamp) as day, model_id,
           COUNT(*) as attempts, SUM(success) as wins,
           ROUND(100.0 * SUM(success) / COUNT(*), 1) as daily_success_rate
    FROM model_outcomes
    WHERE task_type = ? AND timestamp > date('now', '-30 days')
    GROUP BY day, model_id
    ORDER BY day DESC;
- A/B testing: Compare challenger vs incumbent in detail
    SELECT model_id, COUNT(*) as trials, SUM(success) as wins,
           ROUND(AVG(tokens_used), 0) as avg_tokens,
           ROUND(AVG(cost_usd), 4) as avg_cost
    FROM model_outcomes
    WHERE task_type = ? AND timestamp > ?
    GROUP BY model_id;
- Federated learning: Export performance data for cross-project analysis
    SELECT *,
           ROUND(100.0 * successes / (successes + failures), 1) as success_rate
    FROM model_stats
    WHERE successes + failures >= 10 -- High-confidence entries only
    ORDER BY success_rate DESC;
References
- Node.js node:sqlite docs: https://nodejs.org/api/sqlite.html
- UOK llm_task_outcomes schema: See docs/dev/UOK-SELF-EVOLUTION.md
- SQLite WAL mode: https://www.sqlite.org/wal.html