singularity-forge/docs/dev/SQLITE-MIGRATION.md
Mikael Hugo f2db20b4d6 docs: add SQLite migration guide for Node 24 upgrade
2026-05-06 23:03:50 +02:00

SQLite Migration Guide for Model Learning

Status: Planned for Node 24.15.0 upgrade
Current: JSON-based storage (model-learner.js, self-report-fixer.js)
Target: Native node:sqlite integration

Why SQLite?

  1. Zero dependencies: Node 24+ has built-in node:sqlite (no package install)
  2. Queryable: SQL joins with UOK's llm_task_outcomes table for unified learning database
  3. Transactional: Atomic outcome recording prevents partial state corruption
  4. Performant: Indexes on (task_type, model_id) for per-task-type ranking queries
  5. Durable: WAL (write-ahead logging) mode keeps writes crash-safe while allowing concurrent readers

Current State (Node 20)

JSON-Based Storage

  • model-learner.js: .sf/model-performance.json (nested object hierarchy)
    {
      "execute-task": {
        "gpt-4o": {
          "successes": 42,
          "failures": 3,
          "successRate": 0.93
        }
      }
    }
    
  • self-report-fixer.js: Stateless (no persistent storage)
  • triage-self-feedback.js: Reads/writes REQUIREMENTS.md, ARCHITECTURE.md

Pain Points

  • Entire file read/write on every outcome (O(n) latency)
  • No queryable schema (must load all data, filter in-memory)
  • No transactions (partial failures possible)
  • No natural joins with UOK database

SQLite Schema (Target)

Table 1: model_outcomes

Raw event log for every model outcome.

CREATE TABLE model_outcomes (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_type TEXT NOT NULL,           -- "execute-task", "plan-slice", etc.
  model_id TEXT NOT NULL,             -- "gpt-4o", "claude-opus", etc.
  success INTEGER NOT NULL,           -- 1 = success, 0 = failure
  timeout INTEGER NOT NULL DEFAULT 0, -- 1 = timed out, 0 = normal
  tokens_used INTEGER NOT NULL DEFAULT 0,
  cost_usd REAL NOT NULL DEFAULT 0.0,
  timestamp TEXT NOT NULL,            -- ISO 8601
  FOREIGN KEY (task_type, model_id) REFERENCES model_stats(task_type, model_id)
  -- NOTE: enforced only with PRAGMA foreign_keys = ON, and it requires the
  -- model_stats row to be upserted before the outcome row is inserted
);

CREATE INDEX idx_outcomes_task_model ON model_outcomes(task_type, model_id);
CREATE INDEX idx_outcomes_timestamp ON model_outcomes(timestamp DESC);

Table 2: model_stats

Aggregated per-task-per-model statistics (updated atomically with each outcome).

CREATE TABLE model_stats (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_type TEXT NOT NULL,
  model_id TEXT NOT NULL,
  successes INTEGER NOT NULL DEFAULT 0,
  failures INTEGER NOT NULL DEFAULT 0,
  timeouts INTEGER NOT NULL DEFAULT 0,
  total_tokens INTEGER NOT NULL DEFAULT 0,
  total_cost REAL NOT NULL DEFAULT 0.0,
  last_used TEXT,                    -- ISO 8601 timestamp of last outcome
  UNIQUE(task_type, model_id)
);

-- No separate index needed: the UNIQUE(task_type, model_id) constraint
-- already creates one.

Migration Steps

Phase 1: Refactor ModelPerformanceTracker (model-learner.js)

Before (JSON):

recordOutcome(taskType, modelId, outcome) {
  if (!this.data[taskType]) this.data[taskType] = {};
  if (!this.data[taskType][modelId]) {
    this.data[taskType][modelId] = { successes: 0, failures: 0, ... };
  }
  const stats = this.data[taskType][modelId];
  if (outcome.success) stats.successes += 1;
  else stats.failures += 1;
  this._save(); // Entire file rewrite
}

After (SQLite):

recordOutcome(taskType, modelId, outcome) {
  // Field names follow the existing outcome object (success, tokensUsed);
  // costUsd is assumed here.
  this.db.exec("BEGIN");
  try {
    const ts = new Date().toISOString();

    // Upsert stats first so the model_outcomes foreign key is satisfied
    const updateStmt = this.db.prepare(`
      INSERT INTO model_stats (task_type, model_id, successes, failures, timeouts,
                               total_tokens, total_cost, last_used)
      VALUES (?, ?, ?, ?, ?, ?, ?, ?)
      ON CONFLICT(task_type, model_id) DO UPDATE SET
        successes    = successes + excluded.successes,
        failures     = failures + excluded.failures,
        timeouts     = timeouts + excluded.timeouts,
        total_tokens = total_tokens + excluded.total_tokens,
        total_cost   = total_cost + excluded.total_cost,
        last_used    = excluded.last_used
    `);
    updateStmt.run(
      taskType, modelId,
      outcome.success ? 1 : 0, outcome.success ? 0 : 1, outcome.timeout ? 1 : 0,
      outcome.tokensUsed ?? 0, outcome.costUsd ?? 0, ts
    );

    // Insert the raw event
    const insertStmt = this.db.prepare(`
      INSERT INTO model_outcomes (task_type, model_id, success, timeout,
                                  tokens_used, cost_usd, timestamp)
      VALUES (?, ?, ?, ?, ?, ?, ?)
    `);
    insertStmt.run(
      taskType, modelId,
      outcome.success ? 1 : 0, outcome.timeout ? 1 : 0,
      outcome.tokensUsed ?? 0, outcome.costUsd ?? 0, ts
    );

    this.db.exec("COMMIT");
  } catch (err) {
    this.db.exec("ROLLBACK");
    throw err;
  }
}

Benefits:

  • O(1) outcome recording (single INSERT)
  • Atomic transaction (both tables updated together)
  • No full-file rewrite

Phase 2: Update Query Methods

getRankedModels → SQL SELECT with ORDER BY

getRankedModels(taskType, minSamples = 3) {
  const query = this.db.prepare(`
    SELECT model_id, successes, failures, total_tokens, total_cost, last_used
    FROM model_stats
    WHERE task_type = ? AND (successes + failures) >= ?
    ORDER BY (CAST(successes AS REAL) / (successes + failures)) DESC
  `);
  return query.all(taskType, minSamples).map(row => ({
    modelId: row.model_id,
    successRate: row.successes / (row.successes + row.failures),
    successes: row.successes,
    failures: row.failures,
    totalTokens: row.total_tokens,
    totalCost: row.total_cost,
    lastUsed: row.last_used
  }));
}

Phase 3: Integrate with UOK Database (Optional)

If UOK stores outcomes in its database, consider a federated schema:

  • Keep model_learner SQLite database separate (.sf/model-performance.db)
  • OR: Create view in UOK database that joins with UOK's llm_task_outcomes
-- In UOK database:
CREATE VIEW model_performance AS
SELECT 
  outcome.task_type,
  outcome.model_id,
  COUNT(CASE WHEN outcome.success = 1 THEN 1 END) as successes,
  COUNT(CASE WHEN outcome.success = 0 THEN 1 END) as failures,
  SUM(outcome.tokens_used) as total_tokens,
  SUM(outcome.cost_usd) as total_cost
FROM llm_task_outcomes outcome
GROUP BY outcome.task_type, outcome.model_id;

Phase 4: Data Migration (JSON → SQLite)

Create migration function in constructor:

// const { DatabaseSync } = require('node:sqlite'); // built into Node 24+

_initDb() {
  const db = new DatabaseSync(this.dbPath);
  db.exec("PRAGMA journal_mode = WAL;");
  db.exec("PRAGMA foreign_keys = ON;");
  // ... create tables ...
  
  // Migrate existing JSON data
  if (existsSync(this.oldJsonPath)) {
    const jsonData = JSON.parse(readFileSync(this.oldJsonPath, 'utf-8'));
    this._migrateFromJson(db, jsonData);
    // After migration: delete old JSON or archive
  }
  
  return db;
}

_migrateFromJson(db, jsonData) {
  // Prepare once, outside the loop; reuse for every row
  const insertStmt = db.prepare(`
    INSERT INTO model_stats 
    (task_type, model_id, successes, failures, timeouts, total_tokens, total_cost, last_used)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
  `);

  db.exec("BEGIN");
  for (const [taskType, models] of Object.entries(jsonData)) {
    for (const [modelId, stats] of Object.entries(models)) {
      // Guard missing fields: older JSON files only carry successes/failures
      insertStmt.run(
        taskType, modelId,
        stats.successes || 0, stats.failures || 0, stats.timeouts || 0,
        stats.totalTokens || 0, stats.totalCost || 0, stats.lastUsed ?? null
      );
    }
  }
  db.exec("COMMIT");
}

Testing Strategy

Unit Tests (No Changes Needed)

Existing tests in model-learner.test.ts should pass unchanged:

  • recordOutcome() API remains the same
  • getRankedModels() returns same shape
  • shouldDemote(), getABTestCandidates() unchanged

Integration Tests (Add SQLite-Specific)

test("persists to SQLite database", () => {
  const learner = new ModelLearner(basePath);
  learner.recordOutcome("execute-task", "gpt-4o", { success: true, tokensUsed: 100 });
  
  // Verify record in model_outcomes table
  const query = learner.tracker.db.prepare(`
    SELECT COUNT(*) as count FROM model_outcomes 
    WHERE task_type = ? AND model_id = ?
  `);
  const result = query.get("execute-task", "gpt-4o");
  expect(result.count).toBe(1);
});

test("transactions are atomic", () => {
  // Simulate failure during upsert
  // Verify both INSERT and UPDATE succeed or both rollback
});

Timeline

  1. When Node 24.15.0 becomes standard (6-8 weeks)

    • Update .nvmrc, package.json engines
    • Enable snap to run Node 24
  2. Migration PR (2 days of work)

    • Refactor ModelPerformanceTracker class
    • Add migration function
    • Test with existing unit tests
  3. Rollout (1 day)

    • Deploy with backward-compatible JSON→SQLite auto-migration
    • Monitor for edge cases
    • Archive old JSON files after 1 week

Backward Compatibility

  • Auto-migrate: On first run with Node 24, detect .sf/model-performance.json and import to SQLite
  • Keep JSON: Don't delete old JSON file immediately (keep for 1 week as backup)
  • Graceful fallback: If SQLite init fails, log error and fall back to JSON (degraded mode)
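The graceful-fallback point can be sketched as follows. The `createStorage` helper and the returned shape are illustrative, not the real API; the key detail is that `require("node:sqlite")` throws on runtimes without the built-in module, which gives a natural fallback trigger:

```javascript
// Sketch: try SQLite first, fall back to the legacy JSON tracker on failure.
function createStorage(dbPath) {
  try {
    const { DatabaseSync } = require("node:sqlite"); // throws on older Node
    return { mode: "sqlite", db: new DatabaseSync(dbPath) };
  } catch (err) {
    console.error(`SQLite init failed (${err.message}); falling back to JSON`);
    return { mode: "json" }; // degraded mode: caller uses the JSON code path
  }
}

const storage = createStorage(":memory:");
console.log(storage.mode); // "sqlite" on Node 24, "json" otherwise
```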

Future Opportunities

Once SQLite is in place:

  1. Dashboard: Query performance metrics

    SELECT model_id, 
      ROUND(100.0 * successes / (successes + failures), 1) as success_rate,
      total_tokens, total_cost
    FROM model_stats
    WHERE task_type = ?
    ORDER BY success_rate DESC;
    
  2. Trend analysis: Model performance over time

    SELECT DATE(timestamp) as day, model_id, COUNT(*) as attempts,
      SUM(success) as wins, 
      ROUND(100.0 * SUM(success) / COUNT(*), 1) as daily_success_rate
    FROM model_outcomes
    WHERE task_type = ? AND timestamp > date('now', '-30 days')
    GROUP BY day, model_id
    ORDER BY day DESC;
    
  3. A/B testing: Compare challenger vs incumbent in detail

    SELECT 
      model_id,
      COUNT(*) as trials,
      SUM(success) as wins,
      ROUND(AVG(tokens_used), 0) as avg_tokens,
      ROUND(AVG(cost_usd), 4) as avg_cost
    FROM model_outcomes
    WHERE task_type = ? AND timestamp > ?
    GROUP BY model_id;
    
  4. Federated learning: Export performance data for cross-project analysis

    SELECT *,
      CAST(successes AS REAL) / (successes + failures) AS success_rate
    FROM model_stats
    WHERE successes + failures >= 10  -- High-confidence entries only
    ORDER BY success_rate DESC;
    

References