# SF vs RA.Aid — Full Feature Comparison
**Date**: 2026-05-07
**Scope**: Complete feature-by-feature comparison across all subsystems
---
## Executive Summary
| Dimension | SF | RA.Aid | Verdict |
|-----------|-----|--------|---------|
| **Architecture** | TypeScript monorepo, extension-based, DB-first | Python, LangGraph agents, ORM-based | Both valid; SF more modular |
| **State Model** | SQLite + JSONL dual persistence | SQLite (Peewee ORM) single source | RA.Aid simpler; SF more durable |
| **Agent Stages** | UOK gates (implicit) | Explicit research → plan → implement | RA.Aid clearer stage boundaries |
| **Memory** | Key facts, snippets, notes, trajectory | Key facts, snippets, notes, trajectory | **Parity** |
| **Cost Tracking** | Per-unit SQLite + JSONL ledger | Per-trajectory DB records + CLI commands | RA.Aid more queryable |
| **Shell Safety** | Execution policy profiles + inheritance | cowboy_mode + interactive approval | SF more granular |
| **Subagents** | Full subagent system with inheritance | No subagent delegation | **SF wins** |
| **Mode System** | 6 work modes × 3 run controls × 4 permission profiles × 3 model modes | `--research-only`, `--research-and-plan-only`, `--hil`, `--chat` | **SF far ahead** |
| **Web UI** | Next.js web app + TUI + headless + RPC | FastAPI server (optional) | SF more complete |
| **Testing** | Vitest, 144+ tests | pytest | SF more tested |
| **Observability** | Prometheus metrics + journal + audit | Trajectory DB + cost CLI | Different philosophies |
| **Skills System** | `.agents/skills/` with YAML frontmatter | No skill system | **SF wins** |
| **Recovery** | Crash recovery, verification retry, rethink | Fallback handler, retry with backoff | Different strengths |
| **MCP** | MCP client only | No MCP | **SF wins** |
---
## 1. Architecture & State Model
### SF
```
singularity-forge/
├── src/resources/extensions/sf/ # Core extension
│ ├── uok/ # UOK kernel (safety)
│ ├── auto/ # Autonomous mode state
│ ├── commands/ # CLI command handlers
│ ├── skills/ # Skill system
│ └── metrics-central.js # Prometheus metrics
├── packages/ # npm workspaces
│ ├── pi-tui/ # Terminal UI
│ ├── pi-ai/ # AI provider abstraction
│ └── ...
├── web/ # Next.js web UI
└── .sf/ # Project-local state
    ├── sf.db # SQLite (schema v43)
    ├── runtime/ # Working files
    └── sessions/ # Per-session state
```
**State Philosophy**: DB-first with JSONL durability. SQLite is the queryable source of truth; JSONL is the append-only audit log.
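
The dual-write idea can be sketched as follows. This is a minimal illustration, not SF's implementation: the `events` table, its columns, and the function names are hypothetical, and it is written in Python with only the standard library for brevity even though SF itself is TypeScript.

```python
import json
import sqlite3
from pathlib import Path


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open SQLite and create the hypothetical `events` table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, unit_id TEXT, kind TEXT, payload TEXT)"
    )
    return db


def record_event(db: sqlite3.Connection, ledger: Path, event: dict) -> None:
    """Append the event to the JSONL ledger, then store it in SQLite."""
    line = json.dumps(event, sort_keys=True)
    # Append-only audit log: one JSON object per line.
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    # Queryable copy of the same event.
    db.execute(
        "INSERT INTO events (unit_id, kind, payload) VALUES (?, ?, ?)",
        (event["unit_id"], event["kind"], line),
    )
    db.commit()
```

Writing the ledger line first means the audit log can be replayed into a fresh database if the SQLite file is lost; whether SF orders its writes this way is an assumption.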
### RA.Aid
```
ra_aid/
├── agents/ # LangGraph agents
│ ├── research_agent.py
│ ├── planning_agent.py
│ └── implementation_agent.py
├── database/ # Peewee ORM
│ ├── models.py # Trajectory, Session, KeyFact, ...
│ ├── connection.py # SQLite with WAL
│ └── repositories/ # Repository pattern
├── tools/ # Tool implementations
├── prompts/ # Prompt templates
└── .ra-aid/ # Project-local state
    └── pk.db # SQLite database
```
**State Philosophy**: Single SQLite database with Peewee ORM. Everything is a model: sessions, human inputs, trajectories, key facts, snippets, research notes.
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **ORM** | Raw SQLite (better-sqlite3) | Peewee (higher-level) |
| **Schema Evolution** | Manual versioned migrations | Peewee migrate |
| **Query Surface** | Direct SQL + tool wrappers | Repository pattern + Pydantic models |
| **Session Isolation** | Per-session files in `.sf/sessions/` | Single DB with session_id FK |
| **Cross-Process** | SQLite WAL + file-based locks | Peewee connection pooling |
| **Backup/Export** | JSONL ledger + DB file | DB file only |
**Verdict**: SF's dual persistence (DB + JSONL) is more durable for audit trails. RA.Aid's ORM is more ergonomic for queries.
---
## 2. Agent Stage Boundaries
### SF: UOK Gate System
SF has no explicit research, planning, or implementation agents. Instead, it has:
- **UOK Kernel**: Unified Orchestration Kernel that manages unit execution
- **Gates**: Pass/fail checkpoints between phases
- **Work Modes**: `chat` | `plan` | `build` | `review` | `repair` | `research`
- **Run Control**: `manual` | `assisted` | `autonomous`
The stage boundary is implicit in the work mode + unit type combination.
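
To make the gate idea concrete, here is a minimal sketch of pass/fail checkpoints between phases. The `Phase` type and `run_phases` function are hypothetical names chosen for illustration and do not reflect the UOK kernel's actual API; Python is used only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Phase:
    name: str
    run: Callable[[dict], None]   # performs the phase, mutating shared state
    gate: Callable[[dict], bool]  # True means the checkpoint passes


def run_phases(phases: list[Phase], state: dict) -> list[str]:
    """Run phases in order and stop at the first gate that fails."""
    completed = []
    for phase in phases:
        phase.run(state)
        if not phase.gate(state):
            break
        completed.append(phase.name)
    return completed
```

In this sketch a failed gate simply halts the sequence; a fuller design would presumably route to retry or repair instead.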
### RA.Aid: Explicit Agent Pipeline
```python
# Main flow in __main__.py (simplified)
if is_informational_query() or args.research_only:
    run_research_agent(...)  # Stage 1 only
else:
    run_research_agent(...)  # Stage 1
    run_planning_agent(...)  # Stage 2
    if not args.research_and_plan_only:
        run_task_implementation_agent(...)  # Stage 3
```
Each agent is a separate LangGraph agent with its own:
- Prompt template
- Tool set
- Memory/checkpointer
- Optional expert reasoning assistance
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Stage Definition** | Work mode + unit type | Explicit agent function |
| **Prompt Separation** | Single prompt with mode injection | Separate prompt per agent |
| **Tool Separation** | All tools available, gated by policy | Different tools per agent |
| **Memory Separation** | Shared session state | Separate MemorySaver per agent |
| **Expert Consultation** | Model mode routing | Explicit reasoning_assist prompt |
| **Stage Skipping** | `/mode` command | `--research-only`, `--research-and-plan-only` |
**Verdict**: RA.Aid's explicit pipeline is clearer for users. SF's implicit gates are more flexible but harder to reason about.
---
## 3. Memory System
### SF
| Memory Type | Storage | Access |
|-------------|---------|--------|
| Key Facts | SQLite (`key_facts` table) | `get_key_facts()` / `add_key_fact()` |
| Code Snippets | SQLite (`code_snippets` table) | `get_code_snippets()` |
| Research Notes | SQLite (`research_notes` table) | `get_research_notes()` |
| Trajectory | JSONL (`uok-audit.jsonl`) + SQLite | `uok/audit.js` |
| Prompt History | JSONL (`~/.sf/agent/prompt-history.jsonl`) | `prompt-history.js` |
| Work Log | SQLite (`work_log` table) | `get_work_log()` |
### RA.Aid
| Memory Type | Storage | Access |
|-------------|---------|--------|
| Key Facts | SQLite (`key_fact` table) | `KeyFactRepository` |
| Key Snippets | SQLite (`key_snippet` table) | `KeySnippetRepository` |
| Research Notes | SQLite (`research_note` table) | `ResearchNoteRepository` |
| Trajectory | SQLite (`trajectory` table) | `TrajectoryRepository` |
| Human Input | SQLite (`human_input` table) | `HumanInputRepository` |
| Work Log | SQLite (`work_log` table) | `WorkLogRepository` |
| Related Files | SQLite (`related_files` table) | `RelatedFilesRepository` |
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Storage** | Mixed (SQLite + JSONL) | Unified (SQLite only) |
| **Queryability** | SQL + JSONL grep | SQL only |
| **Repository Pattern** | Ad hoc functions | Formal repository classes |
| **Pydantic Models** | No | Yes (`TrajectoryModel`, etc.) |
| **Garbage Collection** | Manual | Automatic (`garbage_collect()`) |
| **Session Scoping** | Per-session files | `session_id` foreign key |
**Verdict**: RA.Aid's unified repository pattern is cleaner. SF's dual persistence is more audit-friendly.
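
A minimal sketch of the repository pattern with `session_id` scoping, using only the standard library's `sqlite3`. The class below is illustrative and is not RA.Aid's `KeyFactRepository`; the column names are assumptions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyFact:
    id: int
    session_id: int
    content: str


class SketchKeyFactRepository:
    """Illustrative repository that hides SQL behind typed methods."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS key_fact ("
            "id INTEGER PRIMARY KEY, "
            "session_id INTEGER NOT NULL, "
            "content TEXT NOT NULL)"
        )

    def create(self, session_id: int, content: str) -> KeyFact:
        cur = self.db.execute(
            "INSERT INTO key_fact (session_id, content) VALUES (?, ?)",
            (session_id, content),
        )
        self.db.commit()
        return KeyFact(cur.lastrowid, session_id, content)

    def for_session(self, session_id: int) -> list[KeyFact]:
        rows = self.db.execute(
            "SELECT id, session_id, content FROM key_fact "
            "WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
        return [KeyFact(*row) for row in rows]
```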
---
## 4. Cost Tracking
### SF
```javascript
// metrics.js: per-unit cost tracking
export function recordTokenUsage(unitId, modelId, inputTokens, outputTokens, cost) {
  // Writes to SQLite + JSONL
}
// Usage:
recordTokenUsage("unit-123", "claude-sonnet-4", 1500, 800, 0.045);
```
- Per-unit cost in SQLite
- JSONL ledger for durability
- Dashboard integration via `sf cost` command
- No session-level aggregation
### RA.Aid
```python
# Trajectory record with cost
trajectory_repo.create(
    tool_name="llm_call",
    current_cost=0.045,
    input_tokens=1500,
    output_tokens=800,
    record_type="model_usage",
)
# Session-level aggregation
session_totals = trajectory_repo.get_session_usage_totals(session_id)
# Returns: {"total_cost": 1.23, "total_tokens": 45000, ...}
# CLI commands:
# ra-aid last-cost # Latest session
# ra-aid all-costs # All sessions
```
- Per-trajectory cost in DB
- SQL aggregation for session totals
- Built-in CLI commands for cost queries
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Granularity** | Per-unit | Per-trajectory (finer) |
| **Aggregation** | Manual | SQL SUM |
| **CLI Query** | `sf cost` (basic) | `ra-aid last-cost`, `ra-aid all-costs` |
| **Budget Limits** | Cost guard gate | `--max-cost`, `--max-tokens` |
| **Show Cost** | TUI overlay | `--show-cost` flag |
**Verdict**: RA.Aid's cost tracking is more mature with built-in aggregation and CLI queries.
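
Session-level aggregation with SQL `SUM` can be sketched as below. The `model_usage` table and its columns are hypothetical and only illustrate the technique; they are not taken from either project's schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE model_usage ("
    "session_id INTEGER, cost REAL, input_tokens INTEGER, output_tokens INTEGER)"
)
db.executemany(
    "INSERT INTO model_usage VALUES (?, ?, ?, ?)",
    [(1, 0.045, 1500, 800), (1, 0.020, 600, 300), (2, 0.010, 200, 100)],
)


def session_totals(db: sqlite3.Connection, session_id: int) -> dict:
    """Sum cost and tokens for one session; unknown sessions yield zeros."""
    row = db.execute(
        "SELECT COALESCE(SUM(cost), 0.0), "
        "COALESCE(SUM(input_tokens + output_tokens), 0) "
        "FROM model_usage WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return {"total_cost": row[0], "total_tokens": row[1]}
```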
---
## 5. Shell Safety & Execution Policy
### SF
```javascript
// execution-policy.js
const PROFILES = {
  restricted: { // No destructive tools
    allowDestructive: false,
    allowBash: false,
    allowWrite: false,
  },
  normal: { // Read-only + planning writes
    allowDestructive: false,
    allowBash: true, // But classified commands blocked
    allowWrite: true, // But source mutations gated
  },
  trusted: { // Most tools allowed
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
  unrestricted: { // Everything
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
};
// Subagent inheritance enforces parent policy
validateSubagentDispatch(envelope, proposal);
```
- 4 permission profiles
- Subagent inheritance (parent → child)
- Execution policy tool_call hook
- Destructive command classifier
### RA.Aid
```python
# tools/shell.py
cowboy_mode = get_config_repository().get("cowboy_mode", False)
if not cowboy_mode:
    response = Prompt.ask(
        "Execute this command? (y=yes, n=no, c=enable cowboy mode)",
        choices=["y", "n", "c"],
        default="y",
    )
    if response == "n":
        return {"success": False, "output": "Cancelled"}
    elif response == "c":
        get_config_repository().set("cowboy_mode", True)
```
- Binary: cowboy_mode on/off
- Interactive approval per command
- No subagent delegation (no inheritance needed)
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Policy Granularity** | 4 profiles + model mode + work mode | Binary (cowboy_mode) |
| **Approval UX** | Policy-driven automatic | Interactive per-command |
| **Subagent Inheritance** | Full envelope propagation | N/A (no subagents) |
| **Destructive Classification** | Static list + dynamic analysis | None |
| **Audit Trail** | Journal + metrics | Trajectory |
**Verdict**: SF's execution policy is far more sophisticated. RA.Aid's cowboy_mode is simpler but less safe.
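
How a profile-driven decision might combine with a destructive-command classifier is sketched below. The regex patterns, the reduced profile table, and the `decide` function are hypothetical simplifications for illustration; a real classifier would need a much longer pattern list and proper shell parsing.

```python
import re

# Hypothetical patterns, deliberately incomplete.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-\w*[rf]"),
    re.compile(r"\bgit\s+push\s+.*--force\b"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
]

# Reduced view of the permission profiles (assumed semantics).
PROFILES = {
    "restricted": {"allow_bash": False, "allow_destructive": False},
    "normal": {"allow_bash": True, "allow_destructive": False},
    "trusted": {"allow_bash": True, "allow_destructive": True},
}


def is_destructive(command: str) -> bool:
    """True if any known destructive pattern appears in the command."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


def decide(profile: str, command: str) -> str:
    """Return "allow" or "deny" without prompting the user."""
    policy = PROFILES[profile]
    if not policy["allow_bash"]:
        return "deny"
    if is_destructive(command) and not policy["allow_destructive"]:
        return "deny"
    return "allow"
```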
---
## 6. Subagent System
### SF
Full subagent system with:
- **Modes**: single, chain, parallel, debate, background
- **Inheritance**: Parent mode state propagates to children via env vars
- **Validation**: Subagent dispatch blocked if it violates parent policy
- **Coordination**: Parallel intent registry prevents conflicting work
```javascript
// subagent-inheritance.js
export function validateSubagentDispatch(envelope, proposal) {
  // Block if provider not allowed
  // Block if heavy model in fast mode
  // Block if destructive tools in restricted mode
}
```
### RA.Aid
**No subagent system.** RA.Aid is a single-agent system. It does not dispatch child agents.
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Subagent Modes** | 5 modes | None |
| **Inheritance** | Full mode envelope | N/A |
| **Parallel Work** | Parallel intent registry | N/A |
| **Debate Mode** | Advocate + challenger | N/A |
**Verdict**: SF has a significant advantage for complex multi-agent workflows.
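
The inheritance check can be pictured as "a child may never exceed its parent". The sketch below assumes that rule, an ordering of profiles and model modes from least to most permissive, and hypothetical field names on the envelope and proposal; it is not the body of SF's `validateSubagentDispatch`.

```python
# Assumed ordering, least to most permissive.
PROFILE_RANK = {"restricted": 0, "normal": 1, "trusted": 2, "unrestricted": 3}
MODEL_RANK = {"fast": 0, "smart": 1, "deep": 2}


def validate_subagent_dispatch(envelope: dict, proposal: dict) -> list[str]:
    """Return the violations found; an empty list means dispatch may proceed."""
    violations = []
    if proposal["provider"] not in envelope["allowed_providers"]:
        violations.append("provider not allowed")
    if MODEL_RANK[proposal["model_mode"]] > MODEL_RANK[envelope["model_mode"]]:
        violations.append("model mode exceeds parent")
    child = PROFILE_RANK[proposal["permission_profile"]]
    parent = PROFILE_RANK[envelope["permission_profile"]]
    if child > parent:
        violations.append("permission profile exceeds parent")
    return violations
```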
---
## 7. Mode System
### SF
Orthogonal axes:
- **Work Mode**: `chat` | `plan` | `build` | `review` | `repair` | `research`
- **Run Control**: `manual` | `assisted` | `autonomous`
- **Permission Profile**: `restricted` | `normal` | `trusted` | `unrestricted`
- **Model Mode**: `fast` | `smart` | `deep`
- **Surface**: `tui` | `web` | `headless` | `rpc`
```
// Direct commands
/mode build
/control autonomous
/trust trusted
/model-mode deep
// TUI shortcuts
Ctrl+Shift+M // Cycle work mode
Ctrl+Shift+A // Autonomous
Ctrl+Shift+P // Cycle permission
```
### RA.Aid
Flags:
- `--research-only`: Research only, no implementation
- `--research-and-plan-only`: Research + plan, then exit
- `--hil`: Human-in-the-loop
- `--chat`: Chat mode (implies --hil)
- `--cowboy-mode`: Skip shell approval
```bash
ra-aid -m "task" --research-only
ra-aid -m "task" --research-and-plan-only
ra-aid -m "task" --hil --chat
```
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Work Mode** | 6 modes with transitions | 2 flags (research-only, research-and-plan-only) |
| **Run Control** | 3 levels | Implicit (hil/chat vs default) |
| **Permission** | 4 profiles | 1 flag (cowboy-mode) |
| **Model Routing** | 3 modes (fast/smart/deep) | Per-task provider/model flags |
| **Surface** | 4 surfaces | 2 (CLI, server) |
| **Keyboard Shortcuts** | 8 shortcuts | None |
| **Mode Persistence** | SQLite + terminal title | In-memory only |
**Verdict**: SF's mode system is far more sophisticated and user-friendly.
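
Because the axes are orthogonal, the configuration space is their Cartesian product. The short calculation below uses the values listed in this section (excluding the surface axis) and assumes every combination is valid, which SF may not guarantee.

```python
from itertools import product

WORK_MODES = ["chat", "plan", "build", "review", "repair", "research"]
RUN_CONTROLS = ["manual", "assisted", "autonomous"]
PERMISSION_PROFILES = ["restricted", "normal", "trusted", "unrestricted"]
MODEL_MODES = ["fast", "smart", "deep"]

# Every (work mode, run control, permission profile, model mode) tuple.
combinations = list(
    product(WORK_MODES, RUN_CONTROLS, PERMISSION_PROFILES, MODEL_MODES)
)
print(len(combinations))  # 6 * 3 * 4 * 3 = 216
```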
---
## 8. Web UI
### SF
- **TUI**: Terminal UI with color bands, emojis, mode badges, cost overlay
- **Web**: Next.js app with real-time updates
- **Headless**: JSON/JSONL output for automation
- **RPC**: gRPC/JSON-RPC for external control
```bash
sf tui # Terminal UI
sf web # Start web server
sf headless # JSON output
sf rpc # RPC server
```
### RA.Aid
- **CLI**: Rich console output with panels
- **Server**: FastAPI server (optional)
```bash
ra-aid -m "task" # CLI
ra-aid --server # FastAPI on :1818
```
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Terminal UI** | Full TUI with mode badges | Rich panels |
| **Web Interface** | Next.js | FastAPI |
| **Headless/Machine** | JSON/JSONL event stream | None |
| **Real-time Updates** | WebSocket | HTTP polling |
| **Multi-session** | Session manager | Single session |
**Verdict**: SF has a more complete multi-surface architecture.
---
## 9. Testing
### SF
- **Runner**: Vitest
- **Count**: 144+ tests across 12 suites
- **Coverage**: V8 provider, 40/40/20/20 thresholds
- **Types**: Unit + integration + smoke + live
```bash
npm test # All tests
npm run test:unit # Unit only
npm run test:integration # Integration
npm run test:smoke # Smoke tests
npm run test:live # Live tests (need env)
```
### RA.Aid
- **Runner**: pytest
- **Count**: Unknown (not inspected)
- **Coverage**: Unknown
- **Types**: Unit tests
```bash
pytest tests/
```
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Test Runner** | Vitest | pytest |
| **Test Count** | 144+ | Unknown |
| **Coverage** | Enforced in CI | Unknown |
| **Integration Tests** | Yes | Unknown |
| **Smoke Tests** | Yes | Unknown |
| **Live Tests** | Yes | Unknown |
**Verdict**: SF appears to have more comprehensive testing infrastructure.
---
## 10. Observability
### SF
| System | Purpose | Format |
|--------|---------|--------|
| **metrics-central.js** | Aggregated metrics | Prometheus text |
| **uok/audit.js** | Per-unit audit trail | JSONL |
| **journal.js** | Mode transitions, decisions | SQLite |
| **self-feedback.js** | Inline self-correction | SQLite |
| **TUI footer** | Real-time cost/context | ANSI text |
### RA.Aid
| System | Purpose | Format |
|--------|---------|--------|
| **Trajectory** | Universal event log | SQLite (Peewee) |
| **Cost CLI** | Session cost queries | JSON |
| **Work Log** | Human-readable activity | SQLite |
| **Console panels** | Real-time status | Rich text |
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Metrics Format** | Prometheus | None (DB queries) |
| **Event Granularity** | Per-unit + per-metric | Per-trajectory |
| **Queryability** | SQL + Prometheus | SQL only |
| **Dashboard Ready** | Yes (Grafana) | No |
| **Real-time Display** | TUI footer | Console panels |
**Verdict**: SF is better for external observability (Prometheus). RA.Aid is better for internal debugging (unified trajectory).
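
For readers unfamiliar with the Prometheus text exposition format, the sketch below renders one counter family. The metric name and labels in the usage are hypothetical, and label-value escaping is omitted for brevity.

```python
def render_counter(name: str, help_text: str, samples: list) -> str:
    """Render a counter family; `samples` is a list of (labels, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is deterministic.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "sf_tokens_total",  # hypothetical metric name
    "Tokens consumed.",
    [({"model": "m1", "kind": "input"}, 1500)],
)
```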
---
## 11. Skills System
### SF
```yaml
# .agents/skills/my-skill/SKILL.md
---
name: my-skill
user-invocable: true
model-invocable: true
side-effects: none
permission-profile: normal
---
# Skill documentation...
```
- YAML frontmatter
- Hierarchical discovery
- Permission filtering
- Work-mode relevance
- Eval harness
### RA.Aid
**No skill system.** RA.Aid has custom tools (`--custom-tools`) but no structured skill framework.
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Skill Definition** | YAML frontmatter | Python module |
| **Discovery** | Hierarchical `.agents/skills/` | `--custom-tools` flag |
| **Permissions** | Per-skill profile | None |
| **Eval** | Built-in harness | None |
| **Auto-creation** | Pattern detection | None |
**Verdict**: SF has a significant advantage for structured skill management.
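
Frontmatter-based permission filtering might look like the sketch below. It handles only flat `key: value` frontmatter, and it assumes `permission-profile` names the minimum profile a session needs to use the skill; both the parser and that interpretation are illustrative rather than SF's actual loader.

```python
# Assumed ordering, least to most permissive.
PROFILE_RANK = {"restricted": 0, "normal": 1, "trusted": 2, "unrestricted": 3}


def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter delimited by `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # No closing delimiter: treat as having no frontmatter.


def skill_allowed(meta: dict, active_profile: str) -> bool:
    """True if the active profile meets the skill's required profile."""
    required = meta.get("permission-profile", "unrestricted")
    return PROFILE_RANK[active_profile] >= PROFILE_RANK[required]
```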
---
## 12. Recovery & Resilience
### SF
| Mechanism | Purpose |
|-----------|---------|
| **Crash recovery** | Resume from checkpoint after failure |
| **Verification retry** | Re-run failed verification gates |
| **Rethink** | Inject rethink prompt on stuck detection |
| **Circuit breaker** | Exponential backoff on gate failures |
| **Cost guard** | Block expensive operations |
| **Writer tokens** | Prevent concurrent writes |
| **Parity system** | Detect and recover from drift |
### RA.Aid
| Mechanism | Purpose |
|-----------|---------|
| **Fallback handler** | Switch to alternative models on failure |
| **Retry with backoff** | Re-run failed agent invocations |
| **Token limiter** | Remove old messages to prevent overflow |
| **Recursion limit** | Prevent infinite loops |
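
A token limiter of the kind listed above can be sketched as follows. Whitespace splitting stands in for a real tokenizer, and keeping system messages while dropping the oldest other messages is an assumption, not a description of RA.Aid's limiter.

```python
def trim_messages(messages, max_tokens, count_tokens=None):
    """Drop the oldest non-system messages until the total fits the budget."""
    if count_tokens is None:
        # Crude stand-in for a tokenizer: count whitespace-separated words.
        count_tokens = lambda m: len(m["content"].split())
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(count_tokens(m) for m in system + rest)
    while rest and total > max_tokens:
        total -= count_tokens(rest.pop(0))
    return system + rest
```

Note that this sketch moves system messages to the front of the result, which is harmless when they already lead the conversation.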
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Checkpoint/Resume** | Yes | No |
| **Model Fallback** | Yes (on 429/rate-limit) | Yes |
| **Token Management** | No | Yes (limiter) |
| **Circuit Breaker** | Yes | No |
| **Cost Guard** | Yes | No (budget only) |
| **Concurrent Write Prevention** | Yes (writer tokens) | No |
**Verdict**: Different strengths. SF better for operational resilience; RA.Aid better for model resilience.
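
The circuit-breaker-with-backoff mechanism can be sketched in a few lines. The class name, thresholds, and doubling schedule below are hypothetical defaults chosen for illustration.

```python
class CircuitBreaker:
    """Opens after repeated failures; the wait doubles with each failure."""

    def __init__(self, base_delay=1.0, max_delay=60.0, max_failures=5):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        """An open breaker means callers should stop retrying."""
        return self.failures >= self.max_failures

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        """Register a failure and return the delay before the next attempt."""
        self.failures += 1
        return min(self.max_delay, self.base_delay * 2 ** (self.failures - 1))
```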
---
## 13. MCP Integration
### SF
- **MCP Client**: Full MCP client with tool discovery, resource listing, OAuth
- **MCP Server Guard**: Exposing SF as an MCP server is explicitly forbidden (a test enforces this)
```javascript
// No SF MCP server — client only
pi.registerMcpClient("filesystem", { ... });
```
### RA.Aid
**No MCP integration.** RA.Aid uses LangChain tools directly.
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **MCP Client** | Yes | No |
| **MCP Server** | Explicitly forbidden | N/A |
| **Tool Discovery** | Dynamic from MCP servers | Static tool definitions |
**Verdict**: SF is ahead for MCP ecosystem integration.
---
## 14. Provider Abstraction
### SF
```javascript
// pi-ai package
const provider = await resolveProvider("anthropic", "claude-sonnet-4");
const response = await provider.complete(prompt, { thinking: true });
```
- Abstract provider interface
- Model mode routing (fast/smart/deep)
- Temperature/thinking level management
- Provider allowlists/blocklists
### RA.Aid
```python
# llm.py
model = initialize_llm(provider, model, temperature=temperature)
response = model.invoke(prompt)
```
- LiteLLM for provider abstraction
- Per-task provider/model override
- Temperature support
- Expert model consultation
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Abstraction Layer** | Custom (pi-ai) | LiteLLM |
| **Model Routing** | Mode-based (fast/smart/deep) | Explicit flags |
| **Expert Model** | No | Yes (reasoning_assist) |
| **Temperature** | Yes | Yes |
| **Thinking Level** | Yes | No |
**Verdict**: RA.Aid's expert model consultation is a unique feature. SF's mode-based routing is more automatic.
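
Mode-based routing with fallback on rate limits might be sketched as below. The model names are placeholders, and `RateLimited`, `ROUTES`, and `complete` are hypothetical; neither project's provider layer is being quoted here.

```python
class RateLimited(Exception):
    """Raised by the caller-supplied `call` when a provider returns 429."""


# Placeholder model names, ordered by preference within each mode.
ROUTES = {
    "fast": ["small-model"],
    "smart": ["mid-model", "small-model"],
    "deep": ["large-model", "mid-model"],
}


def complete(mode, prompt, call):
    """Try each model for the mode in order, falling through on rate limits."""
    last_error = None
    for model in ROUTES[mode]:
        try:
            return model, call(model, prompt)
        except RateLimited as exc:
            last_error = exc
    raise RuntimeError(f"all models for mode {mode!r} were rate limited") from last_error
```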
---
## 15. Documentation & Prompt Engineering
### SF
- **AGENTS.md**: Project-specific instructions
- **CLAUDE.md**: Claude-specific guidance
- **PDD**: Purpose-Driven Development fields
- **Skills**: `.agents/skills/` with structured prompts
- **Prompt History**: Per-project JSONL
### RA.Aid
- **Prompt Templates**: Separate files per agent
- **Expert Prompts**: Optional expert consultation
- **Human Prompts**: HIL sections
- **Custom Tools**: Dynamic tool injection
### Comparison
| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Prompt Organization** | Skills + PDD | Agent-specific files |
| **Expert Consultation** | Model mode routing | Explicit reasoning_assist |
| **Human-in-the-loop** | Permission profiles | --hil flag |
| **Custom Tools** | Skill system | --custom-tools flag |
| **Prompt Versioning** | Git-tracked skills | Package-bundled |
**Verdict**: SF's skill system is more structured. RA.Aid's expert consultation is more dynamic.
---
## Overall Assessment
### SF Strengths
1. **Mode system**: 5 axes of control vs RA.Aid's binary flags
2. **Subagent system**: Full delegation with inheritance
3. **Skills system**: Structured, evaluable, discoverable
4. **MCP integration**: Client-only, ecosystem-ready
5. **Execution policy**: Granular permission profiles
6. **Observability**: Prometheus-compatible metrics
7. **Multi-surface**: TUI + web + headless + RPC
### RA.Aid Strengths
1. **Explicit pipeline**: Clear research → plan → implement flow
2. **Expert consultation**: Dynamic reasoning assistance
3. **Cost tracking**: Built-in aggregation and CLI queries
4. **Repository pattern**: Clean data access
5. ~~Fallback handling~~: SF already has model switching on 429/rate-limit
6. **Token limiting**: Prevent context overflow
7. **Simplicity**: Easier to understand and modify
### Where SF Should Borrow from RA.Aid
1. **Explicit stage boundaries**: Add `/research`, `/plan`, `/implement` commands that mirror RA.Aid's agent pipeline
2. **Expert consultation**: Add optional "expert model" for reasoning assistance before complex operations
3. **Cost CLI**: Add `sf cost --session`, `sf cost --all` commands
4. **Repository pattern**: Formalize data access with repository classes
5. **Token limiting**: Add context window management
6. ~~Fallback handler~~: SF already has model fallback on 429/rate-limit errors
### Where RA.Aid Should Borrow from SF
1. **Mode system**: Add work modes, permission profiles, model modes
2. **Subagent system**: Add delegation for parallel work
3. **Execution policy**: Replace cowboy_mode with granular profiles
4. **Skills system**: Add structured skill framework
5. **MCP integration**: Add MCP client support
6. **UOK gates**: Add safety checkpoints between stages
7. **Observability**: Add Prometheus metrics
---
## Conclusion
SF and RA.Aid are complementary rather than competitive:
- **SF** is a **platform**: modular, multi-surface, safety-first, designed for complex multi-agent workflows
- **RA.Aid** is a **tool**: focused, simple, explicit, designed for single-agent coding tasks
The ideal system would combine:
- SF's mode system + subagent system + skills system
- RA.Aid's explicit pipeline + expert consultation + cost tracking
- Both projects' DB-first state philosophy