# SF vs RA.Aid: Full Feature Comparison

**Date**: 2026-05-07

**Scope**: Complete feature-by-feature comparison across all subsystems

---

## Executive Summary

| Dimension | SF | RA.Aid | Verdict |
|-----------|-----|--------|---------|
| **Architecture** | TypeScript monorepo, extension-based, DB-first | Python, LangGraph agents, ORM-based | Both valid; SF more modular |
| **State Model** | SQLite + JSONL dual persistence | SQLite (Peewee ORM) single source | RA.Aid simpler; SF more durable |
| **Agent Stages** | UOK gates (implicit) | Explicit research → plan → implement | RA.Aid clearer stage boundaries |
| **Memory** | Key facts, snippets, notes, trajectory | Key facts, snippets, notes, trajectory | **Parity** |
| **Cost Tracking** | Per-unit SQLite + JSONL ledger | Per-trajectory DB records + CLI commands | RA.Aid more queryable |
| **Shell Safety** | Execution policy profiles + inheritance | cowboy_mode + interactive approval | SF more granular |
| **Subagents** | Full subagent system with inheritance | No subagent delegation | **SF wins** |
| **Mode System** | 6 work modes × 3 run controls × 4 permission profiles × 3 model modes | --research-only, --research-and-plan-only, --hil, --chat | **SF far ahead** |
| **Web UI** | Next.js web UI + TUI + headless + RPC | FastAPI server (optional) | SF more complete |
| **Testing** | Vitest, 144+ tests | pytest | SF appears more tested (RA.Aid suite not inspected) |
| **Observability** | Prometheus metrics + journal + audit | Trajectory DB + cost CLI | Different philosophies |
| **Skills System** | `.agents/skills/` with YAML frontmatter | No skill system | **SF wins** |
| **Recovery** | Crash recovery, verification retry, rethink | Fallback handler, retry with backoff | **Parity** |
| **MCP** | MCP client only | No MCP | **SF wins** |

---

## 1. Architecture & State Model

### SF

```
singularity-forge/
├── src/resources/extensions/sf/   # Core extension
│   ├── uok/                       # UOK kernel (safety)
│   ├── auto/                      # Autonomous mode state
│   ├── commands/                  # CLI command handlers
│   ├── skills/                    # Skill system
│   └── metrics-central.js         # Prometheus metrics
├── packages/                      # npm workspaces
│   ├── pi-tui/                    # Terminal UI
│   ├── pi-ai/                     # AI provider abstraction
│   └── ...
├── web/                           # Next.js web UI
└── .sf/                           # Project-local state
    ├── sf.db                      # SQLite (schema v43)
    ├── runtime/                   # Working files
    └── sessions/                  # Per-session state
```

**State Philosophy**: DB-first with JSONL durability. SQLite is the queryable source of truth; JSONL is the append-only audit log.

### RA.Aid

```
ra_aid/
├── agents/                        # LangGraph agents
│   ├── research_agent.py
│   ├── planning_agent.py
│   └── implementation_agent.py
├── database/                      # Peewee ORM
│   ├── models.py                  # Trajectory, Session, KeyFact, ...
│   ├── connection.py              # SQLite with WAL
│   └── repositories/              # Repository pattern
├── tools/                         # Tool implementations
├── prompts/                       # Prompt templates
└── .ra-aid/                       # Project-local state
    └── pk.db                      # SQLite database
```

**State Philosophy**: Single SQLite database with Peewee ORM. Everything is a model: sessions, human inputs, trajectories, key facts, snippets, research notes.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **ORM** | Raw SQLite (better-sqlite3) | Peewee (higher-level) |
| **Schema Evolution** | Manual versioned migrations | Peewee migrate |
| **Query Surface** | Direct SQL + tool wrappers | Repository pattern + Pydantic models |
| **Session Isolation** | Per-session files in `~/.sf/sessions/` | Single DB with session_id FK |
| **Cross-Process** | SQLite WAL + file-based locks | Peewee connection pooling |
| **Backup/Export** | JSONL ledger + DB file | DB file only |

**Verdict**: SF's dual persistence (DB + JSONL) is more durable for audit trails. RA.Aid's ORM is more ergonomic for queries.

---

## 2. Agent Stage Boundaries

### SF: UOK Gate System

SF has no explicit "research agent", "planning agent", or "implementation agent". Instead, it has:

- **UOK Kernel**: Unified Orchestration Kernel that manages unit execution
- **Gates**: Pass/fail checkpoints between phases
- **Work Modes**: `chat` → `plan` → `build` → `review` → `repair` → `research`
- **Run Control**: `manual` → `assisted` → `autonomous`

The stage boundary is implicit in the combination of work mode and unit type.

### RA.Aid: Explicit Agent Pipeline

```python
# Main flow in __main__.py
if is_informational_query() or args.research_only:
    run_research_agent(...)                    # Stage 1
else:
    run_research_agent(...)                    # Stage 1
    if not args.research_and_plan_only:
        run_planning_agent(...)                # Stage 2
        run_task_implementation_agent(...)     # Stage 3
```

Each stage is a separate LangGraph agent with its own:

- Prompt template
- Tool set
- Memory/checkpointer
- Optional expert reasoning assistance

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Stage Definition** | Work mode + unit type | Explicit agent function |
| **Prompt Separation** | Single prompt with mode injection | Separate prompt per agent |
| **Tool Separation** | All tools available, gated by policy | Different tools per agent |
| **Memory Separation** | Shared session state | Separate MemorySaver per agent |
| **Expert Consultation** | Model mode routing | Explicit reasoning_assist prompt |
| **Stage Skipping** | `/mode` command | `--research-only`, `--research-and-plan-only` |

**Verdict**: RA.Aid's explicit pipeline is clearer for users. SF's implicit gates are more flexible but harder to reason about.

---

## 3. Memory System

### SF

| Memory Type | Storage | Access |
|-------------|---------|--------|
| Key Facts | SQLite (`key_facts` table) | `get_key_facts()` / `add_key_fact()` |
| Code Snippets | SQLite (`code_snippets` table) | `get_code_snippets()` |
| Research Notes | SQLite (`research_notes` table) | `get_research_notes()` |
| Trajectory | JSONL (`uok-audit.jsonl`) + SQLite | `uok/audit.js` |
| Prompt History | JSONL (`~/.sf/agent/prompt-history.jsonl`) | `prompt-history.js` |
| Work Log | SQLite (`work_log` table) | `get_work_log()` |

### RA.Aid

| Memory Type | Storage | Access |
|-------------|---------|--------|
| Key Facts | SQLite (`key_fact` table) | `KeyFactRepository` |
| Key Snippets | SQLite (`key_snippet` table) | `KeySnippetRepository` |
| Research Notes | SQLite (`research_note` table) | `ResearchNoteRepository` |
| Trajectory | SQLite (`trajectory` table) | `TrajectoryRepository` |
| Human Input | SQLite (`human_input` table) | `HumanInputRepository` |
| Work Log | SQLite (`work_log` table) | `WorkLogRepository` |
| Related Files | SQLite (`related_files` table) | `RelatedFilesRepository` |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Storage** | Mixed (SQLite + JSONL) | Unified (SQLite only) |
| **Queryability** | SQL + JSONL grep | SQL only |
| **Repository Pattern** | Ad hoc functions | Formal repository classes |
| **Pydantic Models** | No | Yes (`TrajectoryModel`, etc.) |
| **Garbage Collection** | Manual | Automatic (`garbage_collect()`) |
| **Session Scoping** | Per-session files | `session_id` foreign key |

**Verdict**: RA.Aid's unified repository pattern is cleaner. SF's dual persistence is more audit-friendly.

---

## 4. Cost Tracking

### SF

```javascript
// metrics.js: per-unit cost tracking
export function recordTokenUsage(unitId, modelId, inputTokens, outputTokens, cost) {
  // Writes to SQLite + JSONL
}

// Usage:
recordTokenUsage("unit-123", "claude-sonnet-4", 1500, 800, 0.045);
```

- Per-unit cost in SQLite
- JSONL ledger for durability
- Dashboard integration via `sf cost` command
- No session-level aggregation

### RA.Aid

```python
# Trajectory record with cost
trajectory_repo.create(
    tool_name="llm_call",
    current_cost=0.045,
    input_tokens=1500,
    output_tokens=800,
    record_type="model_usage",
)

# Session-level aggregation
session_totals = trajectory_repo.get_session_usage_totals(session_id)
# Returns: {"total_cost": 1.23, "total_tokens": 45000, ...}

# CLI commands:
#   ra-aid last-cost   # Latest session
#   ra-aid all-costs   # All sessions
```

- Per-trajectory cost in DB
- SQL aggregation for session totals
- Built-in CLI commands for cost queries

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Granularity** | Per-unit | Per-trajectory (finer) |
| **Aggregation** | Manual | SQL SUM |
| **CLI Query** | `sf cost` (basic) | `ra-aid last-cost`, `ra-aid all-costs` |
| **Budget Limits** | Cost guard gate | `--max-cost`, `--max-tokens` |
| **Show Cost** | TUI overlay | `--show-cost` flag |

**Verdict**: RA.Aid's cost tracking is more mature with built-in aggregation and CLI queries.

---

## 5. Shell Safety & Execution Policy

### SF

```javascript
// execution-policy.js
const PROFILES = {
  // No destructive tools
  restricted: {
    allowDestructive: false,
    allowBash: false,
    allowWrite: false,
  },
  // Read-only + planning writes
  normal: {
    allowDestructive: false,
    allowBash: true,   // But classified commands blocked
    allowWrite: true,  // But source mutations gated
  },
  // Most tools allowed
  trusted: {
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
  // Everything
  unrestricted: {
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
};

// Subagent inheritance enforces parent policy
validateSubagentDispatch(envelope, proposal);
```

- 4 permission profiles
- Subagent inheritance (parent → child)
- Execution policy tool_call hook
- Destructive command classifier

### RA.Aid

```python
# tools/shell.py (excerpt from inside the shell tool function)
cowboy_mode = get_config_repository().get("cowboy_mode", False)
if not cowboy_mode:
    response = Prompt.ask(
        "Execute this command? (y=yes, n=no, c=enable cowboy mode)",
        choices=["y", "n", "c"],
        default="y",
    )
    if response == "n":
        return {"success": False, "output": "Cancelled"}
    elif response == "c":
        get_config_repository().set("cowboy_mode", True)
```

- Binary: cowboy_mode on/off
- Interactive approval per command
- No subagent delegation (no inheritance needed)

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Policy Granularity** | 4 profiles + model mode + work mode | Binary (cowboy_mode) |
| **Approval UX** | Policy-driven automatic | Interactive per-command |
| **Subagent Inheritance** | Full envelope propagation | N/A (no subagents) |
| **Destructive Classification** | Static list + dynamic analysis | None |
| **Audit Trail** | Journal + metrics | Trajectory |

**Verdict**: SF's execution policy is far more sophisticated. RA.Aid's cowboy_mode is simpler but less safe.

---

## 6. Subagent System

### SF

Full subagent system with:

- **Modes**: single, chain, parallel, debate, background
- **Inheritance**: Parent mode state propagates to children via env vars
- **Validation**: Subagent dispatch blocked if it violates parent policy
- **Coordination**: Parallel intent registry prevents conflicting work

```javascript
// subagent-inheritance.js
export function validateSubagentDispatch(envelope, proposal) {
  // Block if provider not allowed
  // Block if heavy model in fast mode
  // Block if destructive tools in restricted mode
}
```

### RA.Aid

**No subagent system.** RA.Aid is a single-agent system. It does not dispatch child agents.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Subagent Modes** | 5 modes | None |
| **Inheritance** | Full mode envelope | N/A |
| **Parallel Work** | Parallel intent registry | N/A |
| **Debate Mode** | Advocate + challenger | N/A |

**Verdict**: SF has a significant advantage for complex multi-agent workflows.

---

## 7. Mode System

### SF

Orthogonal axes:

- **Work Mode**: `chat` | `plan` | `build` | `review` | `repair` | `research`
- **Run Control**: `manual` | `assisted` | `autonomous`
- **Permission Profile**: `restricted` | `normal` | `trusted` | `unrestricted`
- **Model Mode**: `fast` | `smart` | `deep`
- **Surface**: `tui` | `web` | `headless` | `rpc`

```text
// Direct commands
/mode build
/control autonomous
/trust trusted
/model-mode deep

// TUI shortcuts
Ctrl+Shift+M   // Cycle work mode
Ctrl+Shift+A   // Autonomous
Ctrl+Shift+P   // Cycle permission
```

### RA.Aid

Flags:

- `--research-only`: Research only, no implementation
- `--research-and-plan-only`: Research + plan, then exit
- `--hil`: Human-in-the-loop
- `--chat`: Chat mode (implies --hil)
- `--cowboy-mode`: Skip shell approval

```bash
ra-aid -m "task" --research-only
ra-aid -m "task" --research-and-plan-only
ra-aid -m "task" --hil --chat
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Work Mode** | 6 modes with transitions | 2 flags (research-only, research-and-plan-only) |
| **Run Control** | 3 levels | Implicit (hil/chat vs default) |
| **Permission** | 4 profiles | 1 flag (cowboy-mode) |
| **Model Routing** | 3 modes (fast/smart/deep) | Per-task provider/model flags |
| **Surface** | 4 surfaces | 2 (CLI, server) |
| **Keyboard Shortcuts** | 8 shortcuts | None |
| **Mode Persistence** | SQLite + terminal title | In-memory only |

**Verdict**: SF's mode system is far more sophisticated and user-friendly.

---

## 8. Web UI

### SF

- **TUI**: Terminal UI with color bands, emojis, mode badges, cost overlay
- **Web**: Next.js app with real-time updates
- **Headless**: JSON/JSONL output for automation
- **RPC**: gRPC/JSON-RPC for external control

```bash
sf tui        # Terminal UI
sf web        # Start web server
sf headless   # JSON output
sf rpc        # RPC server
```

### RA.Aid

- **CLI**: Rich console output with panels
- **Server**: FastAPI server (optional)

```bash
ra-aid -m "task"   # CLI
ra-aid --server    # FastAPI on :1818
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Terminal UI** | Full TUI with mode badges | Rich panels |
| **Web Interface** | Next.js | FastAPI |
| **Headless/Machine** | JSON/JSONL event stream | None |
| **Real-time Updates** | WebSocket | HTTP polling |
| **Multi-session** | Session manager | Single session |

**Verdict**: SF has a more complete multi-surface architecture.

---

## 9. Testing

### SF

- **Runner**: Vitest
- **Count**: 144+ tests across 12 suites
- **Coverage**: V8 provider, 40/40/20/20 thresholds
- **Types**: Unit + integration + smoke + live

```bash
npm test                   # All tests
npm run test:unit          # Unit only
npm run test:integration   # Integration
npm run test:smoke         # Smoke tests
npm run test:live          # Live tests (need env)
```

### RA.Aid

- **Runner**: pytest
- **Count**: Unknown (not inspected)
- **Coverage**: Unknown
- **Types**: Unit tests

```bash
pytest tests/
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Test Runner** | Vitest | pytest |
| **Test Count** | 144+ | Unknown |
| **Coverage** | Enforced in CI | Unknown |
| **Integration Tests** | Yes | Unknown |
| **Smoke Tests** | Yes | Unknown |
| **Live Tests** | Yes | Unknown |

**Verdict**: SF appears to have more comprehensive testing infrastructure, although RA.Aid's suite was not inspected in detail.

---

## 10. Observability

### SF

| System | Purpose | Format |
|--------|---------|--------|
| **metrics-central.js** | Aggregated metrics | Prometheus text |
| **uok/audit.js** | Per-unit audit trail | JSONL |
| **journal.js** | Mode transitions, decisions | SQLite |
| **self-feedback.js** | Inline self-correction | SQLite |
| **TUI footer** | Real-time cost/context | ANSI text |

### RA.Aid

| System | Purpose | Format |
|--------|---------|--------|
| **Trajectory** | Universal event log | SQLite (Peewee) |
| **Cost CLI** | Session cost queries | JSON |
| **Work Log** | Human-readable activity | SQLite |
| **Console panels** | Real-time status | Rich text |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Metrics Format** | Prometheus | None (DB queries) |
| **Event Granularity** | Per-unit + per-metric | Per-trajectory |
| **Queryability** | SQL + Prometheus | SQL only |
| **Dashboard Ready** | Yes (Grafana) | No |
| **Real-time Display** | TUI footer | Console panels |

**Verdict**: SF is better for external observability (Prometheus). RA.Aid is better for internal debugging (unified trajectory).

---

## 11. Skills System

### SF

```yaml
# .agents/skills/my-skill/SKILL.md
---
name: my-skill
user-invocable: true
model-invocable: true
side-effects: none
permission-profile: normal
---
# Skill documentation...
```

- YAML frontmatter
- Hierarchical discovery
- Permission filtering
- Work-mode relevance
- Eval harness

### RA.Aid

**No skill system.** RA.Aid has custom tools (`--custom-tools`) but no structured skill framework.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Skill Definition** | YAML frontmatter | Python module |
| **Discovery** | Hierarchical `.agents/skills/` | `--custom-tools` flag |
| **Permissions** | Per-skill profile | None |
| **Eval** | Built-in harness | None |
| **Auto-creation** | Pattern detection | None |

**Verdict**: SF has a significant advantage for structured skill management.

---

## 12. Recovery & Resilience

### SF

| Mechanism | Purpose |
|-----------|---------|
| **Crash recovery** | Resume from checkpoint after failure |
| **Verification retry** | Re-run failed verification gates |
| **Rethink** | Inject rethink prompt on stuck detection |
| **Circuit breaker** | Exponential backoff on gate failures |
| **Cost guard** | Block expensive operations |
| **Writer tokens** | Prevent concurrent writes |
| **Parity system** | Detect and recover from drift |

### RA.Aid

| Mechanism | Purpose |
|-----------|---------|
| **Fallback handler** | Switch to alternative models on failure |
| **Retry with backoff** | Re-run failed agent invocations |
| **Token limiter** | Remove old messages to prevent overflow |
| **Recursion limit** | Prevent infinite loops |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Checkpoint/Resume** | Yes | No |
| **Model Fallback** | Yes (on 429/rate-limit) | Yes |
| **Token Management** | No | Yes (limiter) |
| **Circuit Breaker** | Yes | No |
| **Cost Guard** | Yes | No (budget only) |
| **Concurrent Write Prevention** | Yes (writer tokens) | No |

**Verdict**: Different strengths. SF is better for operational resilience; RA.Aid is better for model resilience.

---

## 13. MCP Integration

### SF

- **MCP Client**: Full MCP client with tool discovery, resource listing, OAuth
- **MCP Server Guard**: Exposing an SF MCP server is explicitly forbidden (a test enforces this)

```javascript
// No SF MCP server; client only
pi.registerMcpClient("filesystem", { ... });
```

### RA.Aid

**No MCP integration.** RA.Aid uses LangChain tools directly.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **MCP Client** | Yes | No |
| **MCP Server** | Explicitly forbidden | N/A |
| **Tool Discovery** | Dynamic from MCP servers | Static tool definitions |

**Verdict**: SF is ahead for MCP ecosystem integration.

---

## 14. Provider Abstraction

### SF

```javascript
// pi-ai package
const provider = await resolveProvider("anthropic", "claude-sonnet-4");
const response = await provider.complete(prompt, { thinking: true });
```

- Abstract provider interface
- Model mode routing (fast/smart/deep)
- Temperature/thinking level management
- Provider allowlists/blocklists

### RA.Aid

```python
# llm.py
model = initialize_llm(provider, model, temperature=temperature)
response = model.invoke(prompt)
```

- LiteLLM for provider abstraction
- Per-task provider/model override
- Temperature support
- Expert model consultation

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Abstraction Layer** | Custom (pi-ai) | LiteLLM |
| **Model Routing** | Mode-based (fast/smart/deep) | Explicit flags |
| **Expert Model** | No | Yes (reasoning_assist) |
| **Temperature** | Yes | Yes |
| **Thinking Level** | Yes | No |

**Verdict**: RA.Aid's expert model consultation is a unique feature. SF's mode-based routing is more automatic.

---

## 15. Documentation & Prompt Engineering

### SF

- **AGENTS.md**: Project-specific instructions
- **CLAUDE.md**: Claude-specific guidance
- **PDD**: Purpose-Driven Development fields
- **Skills**: `.agents/skills/` with structured prompts
- **Prompt History**: Per-project JSONL

### RA.Aid

- **Prompt Templates**: Separate files per agent
- **Expert Prompts**: Optional expert consultation
- **Human Prompts**: HIL sections
- **Custom Tools**: Dynamic tool injection

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Prompt Organization** | Skills + PDD | Agent-specific files |
| **Expert Consultation** | Model mode routing | Explicit reasoning_assist |
| **Human-in-the-loop** | Permission profiles | --hil flag |
| **Custom Tools** | Skill system | --custom-tools flag |
| **Prompt Versioning** | Git-tracked skills | Package-bundled |

**Verdict**: SF's skill system is more structured. RA.Aid's expert consultation is more dynamic.
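
To make the "structured" claim concrete, here is a minimal Python sketch of how a skill's frontmatter (the fields shown in section 11) might be parsed and filtered by permission profile. It is not SF's implementation, which is TypeScript/JavaScript. The `Skill` class, the `parse_frontmatter`, `load_skill`, and `allowed` helpers, the default values, and the ordering of profiles from least to most permissive are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed ordering of the four permission profiles, least to most permissive.
PROFILE_RANK = {"restricted": 0, "normal": 1, "trusted": 2, "unrestricted": 3}


@dataclass
class Skill:
    """Hypothetical in-memory view of a SKILL.md frontmatter block."""
    name: str
    permission_profile: str
    user_invocable: bool
    model_invocable: bool


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` fences."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing fence reached
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing fence: treat as having no frontmatter


def load_skill(text: str) -> Skill:
    """Build a Skill from frontmatter; the defaults here are assumptions."""
    fm = parse_frontmatter(text)
    return Skill(
        name=fm["name"],
        permission_profile=fm.get("permission-profile", "normal"),
        user_invocable=fm.get("user-invocable", "false") == "true",
        model_invocable=fm.get("model-invocable", "false") == "true",
    )


def allowed(skill: Skill, session_profile: str) -> bool:
    """True when the session is at least as permissive as the skill requires."""
    return PROFILE_RANK[session_profile] >= PROFILE_RANK[skill.permission_profile]


SKILL_MD = """---
name: my-skill
user-invocable: true
model-invocable: true
side-effects: none
permission-profile: normal
---
# Skill documentation...
"""

skill = load_skill(SKILL_MD)
print(skill.name, allowed(skill, "restricted"), allowed(skill, "trusted"))
```

The parser handles only flat `key: value` pairs, which is enough for the fields listed in this document; a real loader would more likely use a full YAML parser.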
---

## Overall Assessment

### SF Strengths

1. **Mode system**: 5 axes of control vs RA.Aid's binary flags
2. **Subagent system**: Full delegation with inheritance
3. **Skills system**: Structured, evaluable, discoverable
4. **MCP integration**: Client-only, ecosystem-ready
5. **Execution policy**: Granular permission profiles
6. **Observability**: Prometheus-compatible metrics
7. **Multi-surface**: TUI + web + headless + RPC

### RA.Aid Strengths

1. **Explicit pipeline**: Clear research → plan → implement flow
2. **Expert consultation**: Dynamic reasoning assistance
3. **Cost tracking**: Built-in aggregation and CLI queries
4. **Repository pattern**: Clean data access
5. ~~Fallback handling~~: SF already has model switching on 429/rate-limit
6. **Token limiting**: Prevent context overflow
7. **Simplicity**: Easier to understand and modify

### Where SF Should Borrow from RA.Aid

1. **Explicit stage boundaries**: Add `/research`, `/plan`, `/implement` commands that mirror RA.Aid's agent pipeline
2. **Expert consultation**: Add optional "expert model" for reasoning assistance before complex operations
3. **Cost CLI**: Add `sf cost --session`, `sf cost --all` commands
4. **Repository pattern**: Formalize data access with repository classes
5. **Token limiting**: Add context window management
6. ~~Fallback handler~~: SF already has model fallback on 429/rate-limit errors

### Where RA.Aid Should Borrow from SF

1. **Mode system**: Add work modes, permission profiles, model modes
2. **Subagent system**: Add delegation for parallel work
3. **Execution policy**: Replace cowboy_mode with granular profiles
4. **Skills system**: Add structured skill framework
5. **MCP integration**: Add MCP client support
6. **UOK gates**: Add safety checkpoints between stages
7. **Observability**: Add Prometheus metrics

---

## Conclusion

SF and RA.Aid are complementary rather than competitive:

- **SF** is a **platform**: modular, multi-surface, safety-first, designed for complex multi-agent workflows
- **RA.Aid** is a **tool**: focused, simple, explicit, designed for single-agent coding tasks

The ideal system would combine:

- SF's mode system + subagent system + skills system
- RA.Aid's explicit pipeline + expert consultation + cost tracking
- Both projects' DB-first state philosophy
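
As one illustration of the cost-tracking idea SF could borrow, session-level aggregation (the proposed `sf cost --session` and `sf cost --all` commands) could reduce to a single SQL `SUM` over per-unit records. The sketch below uses Python's built-in `sqlite3` module. The `unit_cost` table, its columns, the sample rows, and the `session_totals` helper are hypothetical and do not reflect SF's actual schema or RA.Aid's `get_session_usage_totals` implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE unit_cost (          -- hypothetical table, not SF's schema
        session_id    TEXT NOT NULL,
        unit_id       TEXT NOT NULL,
        model_id      TEXT NOT NULL,
        input_tokens  INTEGER NOT NULL,
        output_tokens INTEGER NOT NULL,
        cost          REAL NOT NULL
    )
    """
)

# Illustrative per-unit records: two units in session s1, one in s2.
rows = [
    ("s1", "unit-1", "claude-sonnet-4", 1500, 800, 0.045),
    ("s1", "unit-2", "claude-sonnet-4", 500, 200, 0.015),
    ("s2", "unit-3", "claude-sonnet-4", 1000, 1000, 0.050),
]
conn.executemany("INSERT INTO unit_cost VALUES (?, ?, ?, ?, ?, ?)", rows)


def session_totals(conn, session_id=None):
    """Return (session_id, total_cost, total_tokens) rows.

    With session_id=None every session is reported; otherwise only the
    requested session is included.
    """
    sql = """
        SELECT session_id,
               SUM(cost) AS total_cost,
               SUM(input_tokens + output_tokens) AS total_tokens
        FROM unit_cost
        WHERE (? IS NULL OR session_id = ?)
        GROUP BY session_id
        ORDER BY session_id
    """
    return conn.execute(sql, (session_id, session_id)).fetchall()


for sid, cost, tokens in session_totals(conn):
    print(f"{sid}: ${cost:.3f} over {tokens} tokens")
```

Keeping the aggregation in SQL means the per-unit rows stay the single source of truth, and both the "one session" and "all sessions" views come from the same query.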