singularity-forge/docs/records/2026-05-07-sf-vs-ra-aid-full-comparison.md

# SF vs RA.Aid — Full Feature Comparison

**Date**: 2026-05-07
**Scope**: Complete feature-by-feature comparison across all subsystems

---

## Executive Summary

| Dimension | SF | RA.Aid | Verdict |
|-----------|-----|--------|---------|
| **Architecture** | TypeScript monorepo, extension-based, DB-first | Python, LangGraph agents, ORM-based | Both valid; SF more modular |
| **State Model** | SQLite + JSONL dual persistence | SQLite (Peewee ORM) single source | RA.Aid simpler; SF more durable |
| **Agent Stages** | UOK gates (implicit) | Explicit research → plan → implement | RA.Aid clearer stage boundaries |
| **Memory** | Key facts, snippets, notes, trajectory | Key facts, snippets, notes, trajectory | **Parity** |
| **Cost Tracking** | Per-unit SQLite + JSONL ledger | Per-trajectory DB records + CLI commands | RA.Aid more queryable |
| **Shell Safety** | Execution policy profiles + inheritance | cowboy_mode + interactive approval | SF more granular |
| **Subagents** | Full subagent system with inheritance | No subagent delegation | **SF wins** |
| **Mode System** | 5 work modes × 3 run controls × 4 permission profiles × 3 model modes | --research-only, --research-and-plan-only, --hil, --chat | **SF far ahead** |
| **Web UI** | Next.js TUI + headless + RPC | FastAPI server (optional) | SF more complete |
| **Testing** | Vitest, 144+ tests | pytest | SF more tested |
| **Observability** | Prometheus metrics + journal + audit | Trajectory DB + cost CLI | Different philosophies |
| **Skills System** | `.agents/skills/` with YAML frontmatter | No skill system | **SF wins** |
| **Recovery** | Crash recovery, verification retry, rethink | Fallback handler, retry with backoff | **Parity** |
| **MCP** | MCP client only | No MCP | **SF wins** |

---

## 1. Architecture & State Model

### SF
```
singularity-forge/
├── src/resources/extensions/sf/     # Core extension
│   ├── uok/                         # UOK kernel (safety)
│   ├── auto/                        # Autonomous mode state
│   ├── commands/                    # CLI command handlers
│   ├── skills/                      # Skill system
│   └── metrics-central.js           # Prometheus metrics
├── packages/                        # npm workspaces
│   ├── pi-tui/                      # Terminal UI
│   ├── pi-ai/                       # AI provider abstraction
│   └── ...
├── web/                             # Next.js web UI
└── .sf/                             # Project-local state
    ├── sf.db                        # SQLite (schema v43)
    ├── runtime/                     # Working files
    └── sessions/                    # Per-session state
```

**State Philosophy**: DB-first with JSONL durability. SQLite is the queryable source of truth; JSONL is the append-only audit log.

### RA.Aid
```
ra_aid/
├── agents/                          # LangGraph agents
│   ├── research_agent.py
│   ├── planning_agent.py
│   └── implementation_agent.py
├── database/                        # Peewee ORM
│   ├── models.py                    # Trajectory, Session, KeyFact, ...
│   ├── connection.py                # SQLite with WAL
│   └── repositories/                # Repository pattern
├── tools/                           # Tool implementations
├── prompts/                         # Prompt templates
└── .ra-aid/                         # Project-local state
    └── pk.db                        # SQLite database
```

**State Philosophy**: Single SQLite database with Peewee ORM. Everything is a model: sessions, human inputs, trajectories, key facts, snippets, research notes.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **ORM** | Raw SQLite (better-sqlite3) | Peewee (higher-level) |
| **Schema Evolution** | Manual versioned migrations | Peewee migrate |
| **Query Surface** | Direct SQL + tool wrappers | Repository pattern + Pydantic models |
| **Session Isolation** | Per-session files in `~/.sf/sessions/` | Single DB with session_id FK |
| **Cross-Process** | SQLite WAL + file-based locks | Peewee connection pooling |
| **Backup/Export** | JSONL ledger + DB file | DB file only |

**Verdict**: SF's dual persistence (DB + JSONL) is more durable for audit trails. RA.Aid's ORM is more ergonomic for queries.

---

## 2. Agent Stage Boundaries

### SF: UOK Gate System

SF doesn't have explicit "research agent" / "planning agent" / "implementation agent". Instead, it has:

- **UOK Kernel**: Unified Orchestration Kernel that manages unit execution
- **Gates**: Pass/fail checkpoints between phases
- **Work Modes**: `chat` → `plan` → `build` → `review` → `repair` → `research`
- **Run Control**: `manual` → `assisted` → `autonomous`

The stage boundary is implicit in the work mode + unit type combination.

### RA.Aid: Explicit Agent Pipeline

```python
# Main flow in __main__.py
if is_informational_query() or args.research_only:
    run_research_agent(...)        # Stage 1
else:
    run_research_agent(...)        # Stage 1
    if not args.research_and_plan_only:
        run_planning_agent(...)    # Stage 2
        run_task_implementation_agent(...)  # Stage 3
```

Each agent is a separate LangGraph agent with its own:
- Prompt template
- Tool set
- Memory/checkpointer
- Optional expert reasoning assistance

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Stage Definition** | Work mode + unit type | Explicit agent function |
| **Prompt Separation** | Single prompt with mode injection | Separate prompt per agent |
| **Tool Separation** | All tools available, gated by policy | Different tools per agent |
| **Memory Separation** | Shared session state | Separate MemorySaver per agent |
| **Expert Consultation** | Model mode routing | Explicit reasoning_assist prompt |
| **Stage Skipping** | `/mode` command | `--research-only`, `--research-and-plan-only` |

**Verdict**: RA.Aid's explicit pipeline is clearer for users. SF's implicit gates are more flexible but harder to reason about.

---

## 3. Memory System

### SF

| Memory Type | Storage | Access |
|-------------|---------|--------|
| Key Facts | SQLite (`key_facts` table) | `get_key_facts()` / `add_key_fact()` |
| Code Snippets | SQLite (`code_snippets` table) | `get_code_snippets()` |
| Research Notes | SQLite (`research_notes` table) | `get_research_notes()` |
| Trajectory | JSONL (`uok-audit.jsonl`) + SQLite | `uok/audit.js` |
| Prompt History | JSONL (`~/.sf/agent/prompt-history.jsonl`) | `prompt-history.js` |
| Work Log | SQLite (`work_log` table) | `get_work_log()` |

### RA.Aid

| Memory Type | Storage | Access |
|-------------|---------|--------|
| Key Facts | SQLite (`key_fact` table) | `KeyFactRepository` |
| Key Snippets | SQLite (`key_snippet` table) | `KeySnippetRepository` |
| Research Notes | SQLite (`research_note` table) | `ResearchNoteRepository` |
| Trajectory | SQLite (`trajectory` table) | `TrajectoryRepository` |
| Human Input | SQLite (`human_input` table) | `HumanInputRepository` |
| Work Log | SQLite (`work_log` table) | `WorkLogRepository` |
| Related Files | SQLite (`related_files` table) | `RelatedFilesRepository` |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Storage** | Mixed (SQLite + JSONL) | Unified (SQLite only) |
| **Queryability** | SQL + JSONL grep | SQL only |
| **Repository Pattern** | Ad hoc functions | Formal repository classes |
| **Pydantic Models** | No | Yes (`TrajectoryModel`, etc.) |
| **Garbage Collection** | Manual | Automatic (`garbage_collect()`) |
| **Session Scoping** | Per-session files | `session_id` foreign key |

**Verdict**: RA.Aid's unified repository pattern is cleaner. SF's dual persistence is more audit-friendly.

---

## 4. Cost Tracking

### SF

```javascript
// metrics.js — per-unit cost tracking
export function recordTokenUsage(unitId, modelId, inputTokens, outputTokens, cost) {
  // Writes to SQLite + JSONL
}

// Usage:
recordTokenUsage("unit-123", "claude-sonnet-4", 1500, 800, 0.045);
```

- Per-unit cost in SQLite
- JSONL ledger for durability
- Dashboard integration via `sf cost` command
- No session-level aggregation

### RA.Aid

```python
# Trajectory record with cost
trajectory_repo.create(
    tool_name="llm_call",
    current_cost=0.045,
    input_tokens=1500,
    output_tokens=800,
    record_type="model_usage"
)

# Session-level aggregation
session_totals = trajectory_repo.get_session_usage_totals(session_id)
# Returns: {"total_cost": 1.23, "total_tokens": 45000, ...}

# CLI commands:
#   ra-aid last-cost    # Latest session
#   ra-aid all-costs    # All sessions
```

- Per-trajectory cost in DB
- SQL aggregation for session totals
- Built-in CLI commands for cost queries

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Granularity** | Per-unit | Per-trajectory (finer) |
| **Aggregation** | Manual | SQL SUM |
| **CLI Query** | `sf cost` (basic) | `ra-aid last-cost`, `ra-aid all-costs` |
| **Budget Limits** | Cost guard gate | `--max-cost`, `--max-tokens` |
| **Show Cost** | TUI overlay | `--show-cost` flag |

**Verdict**: RA.Aid's cost tracking is more mature with built-in aggregation and CLI queries.

---

## 5. Shell Safety & Execution Policy

### SF

```javascript
// execution-policy.js
const PROFILES = {
  restricted: {  // No destructive tools
    allowDestructive: false,
    allowBash: false,
    allowWrite: false,
  },
  normal: {      // Read-only + planning writes
    allowDestructive: false,
    allowBash: true,  // But classified commands blocked
    allowWrite: true, // But source mutations gated
  },
  trusted: {     // Most tools allowed
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
  unrestricted: { // Everything
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
};

// Subagent inheritance enforces parent policy
validateSubagentDispatch(envelope, proposal);
```

- 4 permission profiles
- Subagent inheritance (parent → child)
- Execution policy tool_call hook
- Destructive command classifier

### RA.Aid

```python
# tools/shell.py
cowboy_mode = get_config_repository().get("cowboy_mode", False)

if not cowboy_mode:
    response = Prompt.ask(
        "Execute this command? (y=yes, n=no, c=enable cowboy mode)",
        choices=["y", "n", "c"],
        default="y",
    )
    if response == "n":
        return {"success": False, "output": "Cancelled"}
    elif response == "c":
        get_config_repository().set("cowboy_mode", True)
```

- Binary: cowboy_mode on/off
- Interactive approval per command
- No subagent delegation (no inheritance needed)

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Policy Granularity** | 4 profiles + model mode + work mode | Binary (cowboy_mode) |
| **Approval UX** | Policy-driven automatic | Interactive per-command |
| **Subagent Inheritance** | Full envelope propagation | N/A (no subagents) |
| **Destructive Classification** | Static list + dynamic analysis | None |
| **Audit Trail** | Journal + metrics | Trajectory |

**Verdict**: SF's execution policy is far more sophisticated. RA.Aid's cowboy_mode is simpler but less safe.

---

## 6. Subagent System

### SF

Full subagent system with:
- **Modes**: single, chain, parallel, debate, background
- **Inheritance**: Parent mode state propagates to children via env vars
- **Validation**: Subagent dispatch blocked if it violates parent policy
- **Coordination**: Parallel intent registry prevents conflicting work

```javascript
// subagent-inheritance.js
export function validateSubagentDispatch(envelope, proposal) {
  // Block if provider not allowed
  // Block if heavy model in fast mode
  // Block if destructive tools in restricted mode
}
```

### RA.Aid

**No subagent system.** RA.Aid is a single-agent system. It does not dispatch child agents.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Subagent Modes** | 5 modes | None |
| **Inheritance** | Full mode envelope | N/A |
| **Parallel Work** | Parallel intent registry | N/A |
| **Debate Mode** | Advocate + challenger | N/A |

**Verdict**: SF has a significant advantage for complex multi-agent workflows.

---

## 7. Mode System

### SF

Orthogonal axes:
- **Work Mode**: `chat` | `plan` | `build` | `review` | `repair` | `research`
- **Run Control**: `manual` | `assisted` | `autonomous`
- **Permission Profile**: `restricted` | `normal` | `trusted` | `unrestricted`
- **Model Mode**: `fast` | `smart` | `deep`
- **Surface**: `tui` | `web` | `headless` | `rpc`

```javascript
// Direct commands
/mode build
/control autonomous
/trust trusted
/model-mode deep

// TUI shortcuts
Ctrl+Shift+M  // Cycle work mode
Ctrl+Shift+A  // Autonomous
Ctrl+Shift+P  // Cycle permission
```

### RA.Aid

Flags:
- `--research-only`: Research only, no implementation
- `--research-and-plan-only`: Research + plan, then exit
- `--hil`: Human-in-the-loop
- `--chat`: Chat mode (implies --hil)
- `--cowboy-mode`: Skip shell approval

```bash
ra-aid -m "task" --research-only
ra-aid -m "task" --research-and-plan-only
ra-aid -m "task" --hil --chat
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Work Mode** | 6 modes with transitions | 2 flags (research-only, research-and-plan-only) |
| **Run Control** | 3 levels | Implicit (hil/chat vs default) |
| **Permission** | 4 profiles | 1 flag (cowboy-mode) |
| **Model Routing** | 3 modes (fast/smart/deep) | Per-task provider/model flags |
| **Surface** | 4 surfaces | 2 (CLI, server) |
| **Keyboard Shortcuts** | 8 shortcuts | None |
| **Mode Persistence** | SQLite + terminal title | In-memory only |

**Verdict**: SF's mode system is far more sophisticated and user-friendly.

---

## 8. Web UI

### SF

- **TUI**: Terminal UI with color bands, emojis, mode badges, cost overlay
- **Web**: Next.js app with real-time updates
- **Headless**: JSON/JSONL output for automation
- **RPC**: gRPC/JSON-RPC for external control

```bash
sf tui          # Terminal UI
sf web          # Start web server
sf headless     # JSON output
sf rpc          # RPC server
```

### RA.Aid

- **CLI**: Rich console output with panels
- **Server**: FastAPI server (optional)

```bash
ra-aid -m "task"           # CLI
ra-aid --server            # FastAPI on :1818
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Terminal UI** | Full TUI with mode badges | Rich panels |
| **Web Interface** | Next.js | FastAPI |
| **Headless/Machine** | JSON/JSONL event stream | None |
| **Real-time Updates** | WebSocket | HTTP polling |
| **Multi-session** | Session manager | Single session |

**Verdict**: SF has a more complete multi-surface architecture.

---

## 9. Testing

### SF

- **Runner**: Vitest
- **Count**: 144+ tests across 12 suites
- **Coverage**: V8 provider, 40/40/20/20 thresholds
- **Types**: Unit + integration + smoke + live

```bash
npm test              # All tests
npm run test:unit     # Unit only
npm run test:integration  # Integration
npm run test:smoke    # Smoke tests
npm run test:live     # Live tests (need env)
```

### RA.Aid

- **Runner**: pytest
- **Count**: Unknown (not inspected)
- **Coverage**: Unknown
- **Types**: Unit tests

```bash
pytest tests/
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Test Runner** | Vitest | pytest |
| **Test Count** | 144+ | Unknown |
| **Coverage** | Enforced in CI | Unknown |
| **Integration Tests** | Yes | Unknown |
| **Smoke Tests** | Yes | Unknown |
| **Live Tests** | Yes | Unknown |

**Verdict**: SF appears to have more comprehensive testing infrastructure.

---

## 10. Observability

### SF

| System | Purpose | Format |
|--------|---------|--------|
| **metrics-central.js** | Aggregated metrics | Prometheus text |
| **uok/audit.js** | Per-unit audit trail | JSONL |
| **journal.js** | Mode transitions, decisions | SQLite |
| **self-feedback.js** | Inline self-correction | SQLite |
| **TUI footer** | Real-time cost/context | ANSI text |

### RA.Aid

| System | Purpose | Format |
|--------|---------|--------|
| **Trajectory** | Universal event log | SQLite (Peewee) |
| **Cost CLI** | Session cost queries | JSON |
| **Work Log** | Human-readable activity | SQLite |
| **Console panels** | Real-time status | Rich text |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Metrics Format** | Prometheus | None (DB queries) |
| **Event Granularity** | Per-unit + per-metric | Per-trajectory |
| **Queryability** | SQL + Prometheus | SQL only |
| **Dashboard Ready** | Yes (Grafana) | No |
| **Real-time Display** | TUI footer | Console panels |

**Verdict**: SF is better for external observability (Prometheus). RA.Aid is better for internal debugging (unified trajectory).

---

## 11. Skills System

### SF

```yaml
# .agents/skills/my-skill/SKILL.md
---
name: my-skill
user-invocable: true
model-invocable: true
side-effects: none
permission-profile: normal
---
# Skill documentation...
```

- YAML frontmatter
- Hierarchical discovery
- Permission filtering
- Work-mode relevance
- Eval harness

### RA.Aid

**No skill system.** RA.Aid has custom tools (`--custom-tools`) but no structured skill framework.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Skill Definition** | YAML frontmatter | Python module |
| **Discovery** | Hierarchical `.agents/skills/` | `--custom-tools` flag |
| **Permissions** | Per-skill profile | None |
| **Eval** | Built-in harness | None |
| **Auto-creation** | Pattern detection | None |

**Verdict**: SF has a significant advantage for structured skill management.

---

## 12. Recovery & Resilience

### SF

| Mechanism | Purpose |
|-----------|---------|
| **Crash recovery** | Resume from checkpoint after failure |
| **Verification retry** | Re-run failed verification gates |
| **Rethink** | Inject rethink prompt on stuck detection |
| **Circuit breaker** | Exponential backoff on gate failures |
| **Cost guard** | Block expensive operations |
| **Writer tokens** | Prevent concurrent writes |
| **Parity system** | Detect and recover from drift |

### RA.Aid

| Mechanism | Purpose |
|-----------|---------|
| **Fallback handler** | Switch to alternative models on failure |
| **Retry with backoff** | Re-run failed agent invocations |
| **Token limiter** | Remove old messages to prevent overflow |
| **Recursion limit** | Prevent infinite loops |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Checkpoint/Resume** | Yes | No |
| **Model Fallback** | Yes (on 429/rate-limit) | Yes |
| **Token Management** | No | Yes (limiter) |
| **Circuit Breaker** | Yes | No |
| **Cost Guard** | Yes | No (budget only) |
| **Concurrent Write Prevention** | Yes (writer tokens) | No |

**Verdict**: Different strengths. SF better for operational resilience; RA.Aid better for model resilience.

---

## 13. MCP Integration

### SF

- **MCP Client**: Full MCP client with tool discovery, resource listing, OAuth
- **MCP Server Guard**: Explicitly forbidden (test enforces this)

```javascript
// No SF MCP server — client only
pi.registerMcpClient("filesystem", { ... });
```

### RA.Aid

**No MCP integration.** RA.Aid uses LangChain tools directly.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **MCP Client** | Yes | No |
| **MCP Server** | Explicitly forbidden | N/A |
| **Tool Discovery** | Dynamic from MCP servers | Static tool definitions |

**Verdict**: SF is ahead for MCP ecosystem integration.

---

## 14. Provider Abstraction

### SF

```javascript
// pi-ai package
const provider = await resolveProvider("anthropic", "claude-sonnet-4");
const response = await provider.complete(prompt, { thinking: true });
```

- Abstract provider interface
- Model mode routing (fast/smart/deep)
- Temperature/thinking level management
- Provider allowlists/blocklists

### RA.Aid

```python
# llm.py
model = initialize_llm(provider, model, temperature=temperature)
response = model.invoke(prompt)
```

- LiteLLM for provider abstraction
- Per-task provider/model override
- Temperature support
- Expert model consultation

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Abstraction Layer** | Custom (pi-ai) | LiteLLM |
| **Model Routing** | Mode-based (fast/smart/deep) | Explicit flags |
| **Expert Model** | No | Yes (reasoning_assist) |
| **Temperature** | Yes | Yes |
| **Thinking Level** | Yes | No |

**Verdict**: RA.Aid's expert model consultation is a unique feature. SF's mode-based routing is more automatic.

---

## 15. Documentation & Prompt Engineering

### SF

- **AGENTS.md**: Project-specific instructions
- **CLAUDE.md**: Claude-specific guidance
- **PDD**: Purpose-Driven Development fields
- **Skills**: `.agents/skills/` with structured prompts
- **Prompt History**: Per-project JSONL

### RA.Aid

- **Prompt Templates**: Separate files per agent
- **Expert Prompts**: Optional expert consultation
- **Human Prompts**: HIL sections
- **Custom Tools**: Dynamic tool injection

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Prompt Organization** | Skills + PDD | Agent-specific files |
| **Expert Consultation** | Model mode routing | Explicit reasoning_assist |
| **Human-in-the-loop** | Permission profiles | --hil flag |
| **Custom Tools** | Skill system | --custom-tools flag |
| **Prompt Versioning** | Git-tracked skills | Package-bundled |

**Verdict**: SF's skill system is more structured. RA.Aid's expert consultation is more dynamic.

---

## Overall Assessment

### SF Strengths
1. **Mode system**: 5 axes of control vs RA.Aid's binary flags
2. **Subagent system**: Full delegation with inheritance
3. **Skills system**: Structured, evaluable, discoverable
4. **MCP integration**: Client-only, ecosystem-ready
5. **Execution policy**: Granular permission profiles
6. **Observability**: Prometheus-compatible metrics
7. **Multi-surface**: TUI + web + headless + RPC

### RA.Aid Strengths
1. **Explicit pipeline**: Clear research → plan → implement flow
2. **Expert consultation**: Dynamic reasoning assistance
3. **Cost tracking**: Built-in aggregation and CLI queries
4. **Repository pattern**: Clean data access
5. ~~Fallback handling~~: SF already has model switching on 429/rate-limit
6. **Token limiting**: Prevent context overflow
7. **Simplicity**: Easier to understand and modify

### Where SF Should Borrow from RA.Aid

1. **Explicit stage boundaries**: Add `/research`, `/plan`, `/implement` commands that mirror RA.Aid's agent pipeline
2. **Expert consultation**: Add optional "expert model" for reasoning assistance before complex operations
3. **Cost CLI**: Add `sf cost --session`, `sf cost --all` commands
4. **Repository pattern**: Formalize data access with repository classes
5. **Token limiting**: Add context window management
6. ~~Fallback handler~~: SF already has model fallback on 429/rate-limit errors

### Where RA.Aid Should Borrow from SF

1. **Mode system**: Add work modes, permission profiles, model modes
2. **Subagent system**: Add delegation for parallel work
3. **Execution policy**: Replace cowboy_mode with granular profiles
4. **Skills system**: Add structured skill framework
5. **MCP integration**: Add MCP client support
6. **UOK gates**: Add safety checkpoints between stages
7. **Observability**: Add Prometheus metrics

---

## Conclusion

SF and RA.Aid are complementary rather than competitive:

- **SF** is a **platform**: modular, multi-surface, safety-first, designed for complex multi-agent workflows
- **RA.Aid** is a **tool**: focused, simple, explicit, designed for single-agent coding tasks

The ideal system would combine:
- SF's mode system + subagent system + skills system
- RA.Aid's explicit pipeline + expert consultation + cost tracking
- Both projects' DB-first state philosophy