# SF vs RA.Aid: Full Feature Comparison

**Date**: 2026-05-07
**Scope**: Complete feature-by-feature comparison across all subsystems

---

## Executive Summary

| Dimension | SF | RA.Aid | Verdict |
|-----------|-----|--------|---------|
| **Architecture** | TypeScript monorepo, extension-based, DB-first | Python, LangGraph agents, ORM-based | Both valid; SF more modular |
| **State Model** | SQLite + JSONL dual persistence | SQLite (Peewee ORM) single source | RA.Aid simpler; SF more durable |
| **Agent Stages** | UOK gates (implicit) | Explicit research → plan → implement | RA.Aid clearer stage boundaries |
| **Memory** | Key facts, snippets, notes, trajectory | Key facts, snippets, notes, trajectory | **Parity** |
| **Cost Tracking** | Per-unit SQLite + JSONL ledger | Per-trajectory DB records + CLI commands | RA.Aid more queryable |
| **Shell Safety** | Execution policy profiles + inheritance | cowboy_mode + interactive approval | SF more granular |
| **Subagents** | Full subagent system with inheritance | No subagent delegation | **SF wins** |
| **Mode System** | 6 work modes × 3 run controls × 4 permission profiles × 3 model modes | --research-only, --research-and-plan-only, --hil, --chat | **SF far ahead** |
| **Web UI** | Next.js web UI + TUI + headless + RPC | FastAPI server (optional) | SF more complete |
| **Testing** | Vitest, 144+ tests | pytest (suite not inspected) | SF appears more tested |
| **Observability** | Prometheus metrics + journal + audit | Trajectory DB + cost CLI | Different philosophies |
| **Skills System** | `.agents/skills/` with YAML frontmatter | No skill system | **SF wins** |
| **Recovery** | Crash recovery, verification retry, rethink | Fallback handler, retry with backoff | **Parity** |
| **MCP** | MCP client only | No MCP | **SF wins** |

---

## 1. Architecture & State Model

### SF

```
singularity-forge/
├── src/resources/extensions/sf/   # Core extension
│   ├── uok/                       # UOK kernel (safety)
│   ├── auto/                      # Autonomous mode state
│   ├── commands/                  # CLI command handlers
│   ├── skills/                    # Skill system
│   └── metrics-central.js         # Prometheus metrics
├── packages/                      # npm workspaces
│   ├── pi-tui/                    # Terminal UI
│   ├── pi-ai/                     # AI provider abstraction
│   └── ...
├── web/                           # Next.js web UI
└── .sf/                           # Project-local state
    ├── sf.db                      # SQLite (schema v43)
    ├── runtime/                   # Working files
    └── sessions/                  # Per-session state
```

**State Philosophy**: DB-first with JSONL durability. SQLite is the queryable source of truth; JSONL is the append-only audit log.

### RA.Aid

```
ra_aid/
├── agents/                        # LangGraph agents
│   ├── research_agent.py
│   ├── planning_agent.py
│   └── implementation_agent.py
├── database/                      # Peewee ORM
│   ├── models.py                  # Trajectory, Session, KeyFact, ...
│   ├── connection.py              # SQLite with WAL
│   └── repositories/              # Repository pattern
├── tools/                         # Tool implementations
├── prompts/                       # Prompt templates
└── .ra-aid/                       # Project-local state
    └── pk.db                      # SQLite database
```

**State Philosophy**: Single SQLite database with Peewee ORM. Everything is a model: sessions, human inputs, trajectories, key facts, snippets, research notes.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **ORM** | Raw SQLite (better-sqlite3) | Peewee (higher-level) |
| **Schema Evolution** | Manual versioned migrations | Peewee migrate |
| **Query Surface** | Direct SQL + tool wrappers | Repository pattern + Pydantic models |
| **Session Isolation** | Per-session files in `~/.sf/sessions/` | Single DB with session_id FK |
| **Cross-Process** | SQLite WAL + file-based locks | Peewee connection pooling |
| **Backup/Export** | JSONL ledger + DB file | DB file only |

**Verdict**: SF's dual persistence (DB + JSONL) is more durable for audit trails. RA.Aid's ORM is more ergonomic for queries.

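
To make the dual-persistence idea concrete, here is a minimal Python sketch. It assumes a single hypothetical `events` table and a ledger file; the names `open_store`, `record_event`, and the column layout are illustrative and are not taken from SF's code. Each event is appended to the JSONL ledger first and then inserted into SQLite, so a failure between the two writes leaves the ledger as the more complete record.

```python
import json
import sqlite3
from pathlib import Path


def open_store(db_path: str) -> sqlite3.Connection:
    """Open the database and create the (hypothetical) events table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, unit_id TEXT, kind TEXT, payload TEXT)"
    )
    return conn


def record_event(conn: sqlite3.Connection, ledger: Path, event: dict) -> None:
    """Append the event to the JSONL ledger, then insert it into SQLite."""
    line = json.dumps(event, sort_keys=True)
    # Append-only audit log: one JSON object per line.
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    # Queryable copy; the connection context manager commits on success.
    with conn:
        conn.execute(
            "INSERT INTO events (unit_id, kind, payload) VALUES (?, ?, ?)",
            (event["unit_id"], event["kind"], line),
        )
```
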
---

## 2. Agent Stage Boundaries

### SF: UOK Gate System

SF has no explicit research, planning, or implementation agents. Instead, it has:

- **UOK Kernel**: Unified Orchestration Kernel that manages unit execution
- **Gates**: Pass/fail checkpoints between phases
- **Work Modes**: `chat` → `plan` → `build` → `review` → `repair` → `research`
- **Run Control**: `manual` → `assisted` → `autonomous`

The stage boundary is implicit in the combination of work mode and unit type.

### RA.Aid: Explicit Agent Pipeline

```python
# Main flow in __main__.py
if is_informational_query() or args.research_only:
    run_research_agent(...)  # Stage 1
else:
    run_research_agent(...)  # Stage 1
    if not args.research_and_plan_only:
        run_planning_agent(...)  # Stage 2
        run_task_implementation_agent(...)  # Stage 3
```

Each agent is a separate LangGraph agent with its own:
- Prompt template
- Tool set
- Memory/checkpointer
- Optional expert reasoning assistance

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Stage Definition** | Work mode + unit type | Explicit agent function |
| **Prompt Separation** | Single prompt with mode injection | Separate prompt per agent |
| **Tool Separation** | All tools available, gated by policy | Different tools per agent |
| **Memory Separation** | Shared session state | Separate MemorySaver per agent |
| **Expert Consultation** | Model mode routing | Explicit reasoning_assist prompt |
| **Stage Skipping** | `/mode` command | `--research-only`, `--research-and-plan-only` |

**Verdict**: RA.Aid's explicit pipeline is clearer for users. SF's implicit gates are more flexible but harder to reason about.

---

## 3. Memory System

### SF

| Memory Type | Storage | Access |
|-------------|---------|--------|
| Key Facts | SQLite (`key_facts` table) | `get_key_facts()` / `add_key_fact()` |
| Code Snippets | SQLite (`code_snippets` table) | `get_code_snippets()` |
| Research Notes | SQLite (`research_notes` table) | `get_research_notes()` |
| Trajectory | JSONL (`uok-audit.jsonl`) + SQLite | `uok/audit.js` |
| Prompt History | JSONL (`~/.sf/agent/prompt-history.jsonl`) | `prompt-history.js` |
| Work Log | SQLite (`work_log` table) | `get_work_log()` |

### RA.Aid

| Memory Type | Storage | Access |
|-------------|---------|--------|
| Key Facts | SQLite (`key_fact` table) | `KeyFactRepository` |
| Key Snippets | SQLite (`key_snippet` table) | `KeySnippetRepository` |
| Research Notes | SQLite (`research_note` table) | `ResearchNoteRepository` |
| Trajectory | SQLite (`trajectory` table) | `TrajectoryRepository` |
| Human Input | SQLite (`human_input` table) | `HumanInputRepository` |
| Work Log | SQLite (`work_log` table) | `WorkLogRepository` |
| Related Files | SQLite (`related_files` table) | `RelatedFilesRepository` |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Storage** | Mixed (SQLite + JSONL) | Unified (SQLite only) |
| **Queryability** | SQL + JSONL grep | SQL only |
| **Repository Pattern** | Ad hoc functions | Formal repository classes |
| **Pydantic Models** | No | Yes (`TrajectoryModel`, etc.) |
| **Garbage Collection** | Manual | Automatic (`garbage_collect()`) |
| **Session Scoping** | Per-session files | `session_id` foreign key |

**Verdict**: RA.Aid's unified repository pattern is cleaner. SF's dual persistence is more audit-friendly.

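
As an illustration of what "formal repository classes" with `session_id` scoping can look like, here is a small sketch using only the standard-library `sqlite3` module rather than Peewee. The class body, method names, and column layout are assumptions made for the example; only the `KeyFactRepository` name and the `key_fact` table name come from the table in this section, and this is not RA.Aid's actual implementation.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyFact:
    id: int
    session_id: int
    content: str


class KeyFactRepository:
    """Hypothetical repository: all SQL for key facts lives in one class."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS key_fact ("
            "id INTEGER PRIMARY KEY, "
            "session_id INTEGER NOT NULL, "
            "content TEXT NOT NULL)"
        )

    def create(self, session_id: int, content: str) -> KeyFact:
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute(
                "INSERT INTO key_fact (session_id, content) VALUES (?, ?)",
                (session_id, content),
            )
        return KeyFact(cur.lastrowid, session_id, content)

    def list_for_session(self, session_id: int) -> list[KeyFact]:
        # Session scoping is a WHERE clause on the session_id column.
        rows = self.conn.execute(
            "SELECT id, session_id, content FROM key_fact "
            "WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
        return [KeyFact(*row) for row in rows]
```
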
---

## 4. Cost Tracking

### SF

```javascript
// metrics.js: per-unit cost tracking
export function recordTokenUsage(unitId, modelId, inputTokens, outputTokens, cost) {
  // Writes to SQLite + JSONL
}

// Usage:
recordTokenUsage("unit-123", "claude-sonnet-4", 1500, 800, 0.045);
```

- Per-unit cost in SQLite
- JSONL ledger for durability
- Dashboard integration via `sf cost` command
- No session-level aggregation

### RA.Aid

```python
# Trajectory record with cost
trajectory_repo.create(
    tool_name="llm_call",
    current_cost=0.045,
    input_tokens=1500,
    output_tokens=800,
    record_type="model_usage",
)

# Session-level aggregation
session_totals = trajectory_repo.get_session_usage_totals(session_id)
# Returns: {"total_cost": 1.23, "total_tokens": 45000, ...}

# CLI commands:
# ra-aid last-cost   # Latest session
# ra-aid all-costs   # All sessions
```

- Per-trajectory cost in DB
- SQL aggregation for session totals
- Built-in CLI commands for cost queries

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Granularity** | Per-unit | Per-trajectory (finer) |
| **Aggregation** | Manual | SQL SUM |
| **CLI Query** | `sf cost` (basic) | `ra-aid last-cost`, `ra-aid all-costs` |
| **Budget Limits** | Cost guard gate | `--max-cost`, `--max-tokens` |
| **Show Cost** | TUI overlay | `--show-cost` flag |

**Verdict**: RA.Aid's cost tracking is more mature with built-in aggregation and CLI queries.

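
The "SQL SUM" style of session aggregation can be sketched with the standard-library `sqlite3` module. The table layout here is an assumption modelled on the field names in the RA.Aid snippet in this section (`current_cost`, `input_tokens`, `output_tokens`, `record_type`), and each row's `current_cost` is assumed to be the cost of that single call; `session_usage_totals` is a hypothetical helper, not RA.Aid's `get_session_usage_totals`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trajectory ("
    "id INTEGER PRIMARY KEY, session_id INTEGER, record_type TEXT, "
    "current_cost REAL, input_tokens INTEGER, output_tokens INTEGER)"
)
# Illustrative rows: two calls in session 1, one call in session 2.
conn.executemany(
    "INSERT INTO trajectory "
    "(session_id, record_type, current_cost, input_tokens, output_tokens) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "model_usage", 0.045, 1500, 800),
        (1, "model_usage", 0.030, 1000, 500),
        (2, "model_usage", 0.010, 300, 100),
    ],
)


def session_usage_totals(conn: sqlite3.Connection, session_id: int) -> dict:
    """Aggregate cost and tokens for one session in a single query."""
    row = conn.execute(
        "SELECT COALESCE(SUM(current_cost), 0.0), "
        "COALESCE(SUM(input_tokens + output_tokens), 0) "
        "FROM trajectory "
        "WHERE session_id = ? AND record_type = 'model_usage'",
        (session_id,),
    ).fetchone()
    return {"total_cost": row[0], "total_tokens": row[1]}
```
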
---

## 5. Shell Safety & Execution Policy

### SF

```javascript
// execution-policy.js
const PROFILES = {
  restricted: {            // No destructive tools
    allowDestructive: false,
    allowBash: false,
    allowWrite: false,
  },
  normal: {                // Read-only + planning writes
    allowDestructive: false,
    allowBash: true,       // But classified commands blocked
    allowWrite: true,      // But source mutations gated
  },
  trusted: {               // Most tools allowed
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
  unrestricted: {          // Everything
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
};

// Subagent inheritance enforces parent policy
validateSubagentDispatch(envelope, proposal);
```

- 4 permission profiles
- Subagent inheritance (parent → child)
- Execution policy tool_call hook
- Destructive command classifier

### RA.Aid

```python
# tools/shell.py
cowboy_mode = get_config_repository().get("cowboy_mode", False)

if not cowboy_mode:
    response = Prompt.ask(
        "Execute this command? (y=yes, n=no, c=enable cowboy mode)",
        choices=["y", "n", "c"],
        default="y",
    )
    if response == "n":
        return {"success": False, "output": "Cancelled"}
    elif response == "c":
        get_config_repository().set("cowboy_mode", True)
```

- Binary: cowboy_mode on/off
- Interactive approval per command
- No subagent delegation (no inheritance needed)

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Policy Granularity** | 4 profiles + model mode + work mode | Binary (cowboy_mode) |
| **Approval UX** | Policy-driven automatic | Interactive per-command |
| **Subagent Inheritance** | Full envelope propagation | N/A (no subagents) |
| **Destructive Classification** | Static list + dynamic analysis | None |
| **Audit Trail** | Journal + metrics | Trajectory |

**Verdict**: SF's execution policy is far more sophisticated. RA.Aid's cowboy_mode is simpler but less safe.

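
To show how a profile table and a static destructive-command list might combine into one decision, here is a hedged Python sketch. The profile flags mirror the `PROFILES` object in this section; the regular expressions, `is_destructive`, and `may_run_shell` are hypothetical and far smaller than any real classifier would need to be, and the sketch does not attempt the "dynamic analysis" half of the comparison row.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    allow_destructive: bool
    allow_bash: bool
    allow_write: bool


PROFILES = {
    "restricted": Profile(False, False, False),
    "normal": Profile(False, True, True),
    "trusted": Profile(True, True, True),
    "unrestricted": Profile(True, True, True),
}

# Hypothetical static patterns; illustrative only.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rf]"),
    re.compile(r"\bgit\s+push\s+.*--force\b"),
    re.compile(r"\bgit\s+reset\s+--hard\b"),
]


def is_destructive(command: str) -> bool:
    """Return True when the command matches any static pattern."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


def may_run_shell(profile_name: str, command: str) -> bool:
    """Decide from the profile alone, with no interactive prompt."""
    profile = PROFILES[profile_name]
    if not profile.allow_bash:
        return False
    if is_destructive(command) and not profile.allow_destructive:
        return False
    return True
```
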
---

## 6. Subagent System

### SF

Full subagent system with:
- **Modes**: single, chain, parallel, debate, background
- **Inheritance**: Parent mode state propagates to children via env vars
- **Validation**: Subagent dispatch blocked if it violates parent policy
- **Coordination**: Parallel intent registry prevents conflicting work

```javascript
// subagent-inheritance.js
export function validateSubagentDispatch(envelope, proposal) {
  // Block if provider not allowed
  // Block if heavy model in fast mode
  // Block if destructive tools in restricted mode
}
```

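
The three blocking rules named in the comments above can be written out as a small Python sketch. The `Envelope` and `Proposal` shapes, the `model_tier` field, and the choice to return a list of violations are assumptions made for illustration; the source only names the function and the three rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Parent state a child inherits (hypothetical shape)."""
    permission_profile: str
    model_mode: str
    allowed_providers: frozenset


@dataclass(frozen=True)
class Proposal:
    """What the parent wants to dispatch (hypothetical shape)."""
    provider: str
    model_tier: str  # assumed values: "light" or "heavy"
    needs_destructive_tools: bool


def validate_subagent_dispatch(envelope: Envelope, proposal: Proposal) -> list:
    """Return the violations found; an empty list means dispatch may proceed."""
    violations = []
    if proposal.provider not in envelope.allowed_providers:
        violations.append("provider not allowed")
    if envelope.model_mode == "fast" and proposal.model_tier == "heavy":
        violations.append("heavy model in fast mode")
    if envelope.permission_profile == "restricted" and proposal.needs_destructive_tools:
        violations.append("destructive tools in restricted mode")
    return violations
```

Returning every violation rather than stopping at the first is a design choice of this sketch; it makes the rejection easier to log in an audit trail.
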
### RA.Aid

**No subagent system.** RA.Aid is a single-agent system. It does not dispatch child agents.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Subagent Modes** | 5 modes | None |
| **Inheritance** | Full mode envelope | N/A |
| **Parallel Work** | Parallel intent registry | N/A |
| **Debate Mode** | Advocate + challenger | N/A |

**Verdict**: SF has a significant advantage for complex multi-agent workflows.

---

## 7. Mode System

### SF

Orthogonal axes:
- **Work Mode**: `chat` | `plan` | `build` | `review` | `repair` | `research`
- **Run Control**: `manual` | `assisted` | `autonomous`
- **Permission Profile**: `restricted` | `normal` | `trusted` | `unrestricted`
- **Model Mode**: `fast` | `smart` | `deep`
- **Surface**: `tui` | `web` | `headless` | `rpc`

```text
// Direct commands
/mode build
/control autonomous
/trust trusted
/model-mode deep

// TUI shortcuts
Ctrl+Shift+M   // Cycle work mode
Ctrl+Shift+A   // Autonomous
Ctrl+Shift+P   // Cycle permission
```

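
Because the axes are described as orthogonal, the space of mode states is the Cartesian product of the axis values. The following Python sketch enumerates that space and validates a single state. It assumes every combination is permitted, which the source does not state; `AXES`, `all_states`, and `validate_state` are illustrative names.

```python
from itertools import product

# Axis values as listed in this section.
AXES = {
    "work_mode": ["chat", "plan", "build", "review", "repair", "research"],
    "run_control": ["manual", "assisted", "autonomous"],
    "permission_profile": ["restricted", "normal", "trusted", "unrestricted"],
    "model_mode": ["fast", "smart", "deep"],
    "surface": ["tui", "web", "headless", "rpc"],
}


def all_states(axes: dict) -> list:
    """Enumerate every combination of axis values as a dict per state."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


def validate_state(state: dict) -> dict:
    """Raise ValueError when an axis is unknown or holds a value outside its set."""
    for axis, value in state.items():
        if axis not in AXES or value not in AXES[axis]:
            raise ValueError(f"invalid {axis}: {value!r}")
    return state
```

With the values above, the first four axes give 6 × 3 × 4 × 3 = 216 combinations, and including the surface axis gives 864.
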
### RA.Aid

Flags:
- `--research-only`: Research only, no implementation
- `--research-and-plan-only`: Research + plan, then exit
- `--hil`: Human-in-the-loop
- `--chat`: Chat mode (implies --hil)
- `--cowboy-mode`: Skip shell approval

```bash
ra-aid -m "task" --research-only
ra-aid -m "task" --research-and-plan-only
ra-aid -m "task" --hil --chat
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Work Mode** | 6 modes with transitions | 2 flags (research-only, research-and-plan-only) |
| **Run Control** | 3 levels | Implicit (hil/chat vs default) |
| **Permission** | 4 profiles | 1 flag (cowboy-mode) |
| **Model Routing** | 3 modes (fast/smart/deep) | Per-task provider/model flags |
| **Surface** | 4 surfaces | 2 (CLI, server) |
| **Keyboard Shortcuts** | 8 shortcuts | None |
| **Mode Persistence** | SQLite + terminal title | In-memory only |

**Verdict**: SF's mode system is far more sophisticated and user-friendly.

---

## 8. Web UI

### SF

- **TUI**: Terminal UI with color bands, emojis, mode badges, cost overlay
- **Web**: Next.js app with real-time updates
- **Headless**: JSON/JSONL output for automation
- **RPC**: gRPC/JSON-RPC for external control

```bash
sf tui        # Terminal UI
sf web        # Start web server
sf headless   # JSON output
sf rpc        # RPC server
```

### RA.Aid

- **CLI**: Rich console output with panels
- **Server**: FastAPI server (optional)

```bash
ra-aid -m "task"   # CLI
ra-aid --server    # FastAPI on :1818
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Terminal UI** | Full TUI with mode badges | Rich panels |
| **Web Interface** | Next.js | FastAPI |
| **Headless/Machine** | JSON/JSONL event stream | None |
| **Real-time Updates** | WebSocket | HTTP polling |
| **Multi-session** | Session manager | Single session |

**Verdict**: SF has a more complete multi-surface architecture.

---

## 9. Testing

### SF

- **Runner**: Vitest
- **Count**: 144+ tests across 12 suites
- **Coverage**: V8 provider, 40/40/20/20 thresholds
- **Types**: Unit + integration + smoke + live

```bash
npm test                   # All tests
npm run test:unit          # Unit only
npm run test:integration   # Integration
npm run test:smoke         # Smoke tests
npm run test:live          # Live tests (need env)
```

### RA.Aid

- **Runner**: pytest
- **Count**: Unknown (not inspected)
- **Coverage**: Unknown
- **Types**: Unit tests

```bash
pytest tests/
```

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Test Runner** | Vitest | pytest |
| **Test Count** | 144+ | Unknown |
| **Coverage** | Enforced in CI | Unknown |
| **Integration Tests** | Yes | Unknown |
| **Smoke Tests** | Yes | Unknown |
| **Live Tests** | Yes | Unknown |

**Verdict**: SF appears to have more comprehensive testing infrastructure.

---

## 10. Observability

### SF

| System | Purpose | Format |
|--------|---------|--------|
| **metrics-central.js** | Aggregated metrics | Prometheus text |
| **uok/audit.js** | Per-unit audit trail | JSONL |
| **journal.js** | Mode transitions, decisions | SQLite |
| **self-feedback.js** | Inline self-correction | SQLite |
| **TUI footer** | Real-time cost/context | ANSI text |

### RA.Aid

| System | Purpose | Format |
|--------|---------|--------|
| **Trajectory** | Universal event log | SQLite (Peewee) |
| **Cost CLI** | Session cost queries | JSON |
| **Work Log** | Human-readable activity | SQLite |
| **Console panels** | Real-time status | Rich text |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Metrics Format** | Prometheus | None (DB queries) |
| **Event Granularity** | Per-unit + per-metric | Per-trajectory |
| **Queryability** | SQL + Prometheus | SQL only |
| **Dashboard Ready** | Yes (Grafana) | No |
| **Real-time Display** | TUI footer | Console panels |

**Verdict**: SF is better for external observability (Prometheus). RA.Aid is better for internal debugging (unified trajectory).

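
For readers unfamiliar with the Prometheus text format, the following Python sketch renders a counter with labels in that format (`# HELP` and `# TYPE` lines followed by one sample per label set). The metric name `sf_tokens_total`, its labels, and the helper names are hypothetical and are not taken from `metrics-central.js`.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: list) -> str:
    """Render (labels, value) pairs as Prometheus text exposition lines."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            label_text = ",".join(
                f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            # A sample without labels is just the metric name and value.
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```
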
---

## 11. Skills System

### SF

```yaml
# .agents/skills/my-skill/SKILL.md
---
name: my-skill
user-invocable: true
model-invocable: true
side-effects: none
permission-profile: normal
---
# Skill documentation...
```

- YAML frontmatter
- Hierarchical discovery
- Permission filtering
- Work-mode relevance
- Eval harness

### RA.Aid

**No skill system.** RA.Aid has custom tools (`--custom-tools`) but no structured skill framework.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Skill Definition** | YAML frontmatter | Python module |
| **Discovery** | Hierarchical `.agents/skills/` | `--custom-tools` flag |
| **Permissions** | Per-skill profile | None |
| **Eval** | Built-in harness | None |
| **Auto-creation** | Pattern detection | None |

**Verdict**: SF has a significant advantage for structured skill management.

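
As a rough illustration of frontmatter-driven permission filtering, the sketch below parses a flat `key: value` frontmatter block like the `SKILL.md` example in this section and checks it against the active permission profile. It is deliberately not a full YAML parser, and the rule "a skill is usable when the active profile ranks at least as high as the skill's `permission-profile`" is an assumption, as are the helper names and the `normal` default.

```python
PROFILE_RANK = {"restricted": 0, "normal": 1, "trusted": 2, "unrestricted": 3}


def parse_frontmatter(text: str) -> tuple:
    """Split a document into (metadata dict, body); flat key: value pairs only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # no closing delimiter: treat everything as body
    meta = {}
    for raw in lines[1:end]:
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        key, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {raw!r}")
        value = value.strip()
        # Map the two boolean literals; everything else stays a string.
        meta[key.strip()] = {"true": True, "false": False}.get(value, value)
    return meta, "\n".join(lines[end + 1:])


def skill_allowed(meta: dict, active_profile: str) -> bool:
    """Assumed rule: the active profile must rank at least as high as the skill's."""
    required = meta.get("permission-profile", "normal")
    return PROFILE_RANK[active_profile] >= PROFILE_RANK[required]
```
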
---

## 12. Recovery & Resilience

### SF

| Mechanism | Purpose |
|-----------|---------|
| **Crash recovery** | Resume from checkpoint after failure |
| **Verification retry** | Re-run failed verification gates |
| **Rethink** | Inject rethink prompt on stuck detection |
| **Circuit breaker** | Exponential backoff on gate failures |
| **Cost guard** | Block expensive operations |
| **Writer tokens** | Prevent concurrent writes |
| **Parity system** | Detect and recover from drift |

### RA.Aid

| Mechanism | Purpose |
|-----------|---------|
| **Fallback handler** | Switch to alternative models on failure |
| **Retry with backoff** | Re-run failed agent invocations |
| **Token limiter** | Remove old messages to prevent overflow |
| **Recursion limit** | Prevent infinite loops |

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Checkpoint/Resume** | Yes | No |
| **Model Fallback** | Yes (on 429/rate-limit) | Yes |
| **Token Management** | No | Yes (limiter) |
| **Circuit Breaker** | Yes | No |
| **Cost Guard** | Yes | No (budget only) |
| **Concurrent Write Prevention** | Yes (writer tokens) | No |

**Verdict**: Different strengths. SF better for operational resilience; RA.Aid better for model resilience.

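
The token-limiter idea ("remove old messages to prevent overflow") can be sketched in a few lines of Python. This is an assumption-laden illustration, not RA.Aid's limiter: the four-characters-per-token estimate is a crude stand-in for a real tokenizer, system messages are assumed to be kept and placed first, and `trim_messages` is a hypothetical name.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic: about four characters per token, at least one."""
    return max(1, len(text) // 4)


def total_tokens(messages: list) -> int:
    return sum(estimate_tokens(m["content"]) for m in messages)


def trim_messages(messages: list, max_tokens: int) -> list:
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and total_tokens(system) + total_tokens(rest) > max_tokens:
        rest.pop(0)  # the oldest message goes first
    return system + rest
```
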
---

## 13. MCP Integration

### SF

- **MCP Client**: Full MCP client with tool discovery, resource listing, OAuth
- **MCP Server Guard**: Running SF as an MCP server is explicitly forbidden (a test enforces this)

```javascript
// No SF MCP server: client only
pi.registerMcpClient("filesystem", { ... });
```

### RA.Aid

**No MCP integration.** RA.Aid uses LangChain tools directly.

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **MCP Client** | Yes | No |
| **MCP Server** | Explicitly forbidden | N/A |
| **Tool Discovery** | Dynamic from MCP servers | Static tool definitions |

**Verdict**: SF is ahead for MCP ecosystem integration.

---

## 14. Provider Abstraction

### SF

```javascript
// pi-ai package
const provider = await resolveProvider("anthropic", "claude-sonnet-4");
const response = await provider.complete(prompt, { thinking: true });
```

- Abstract provider interface
- Model mode routing (fast/smart/deep)
- Temperature/thinking level management
- Provider allowlists/blocklists

### RA.Aid

```python
# llm.py
model = initialize_llm(provider, model, temperature=temperature)
response = model.invoke(prompt)
```

- LiteLLM for provider abstraction
- Per-task provider/model override
- Temperature support
- Expert model consultation

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Abstraction Layer** | Custom (pi-ai) | LiteLLM |
| **Model Routing** | Mode-based (fast/smart/deep) | Explicit flags |
| **Expert Model** | No | Yes (reasoning_assist) |
| **Temperature** | Yes | Yes |
| **Thinking Level** | Yes | No |

**Verdict**: RA.Aid's expert model consultation is a unique feature. SF's mode-based routing is more automatic.

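
Mode-based routing combined with a provider allowlist can be sketched as a lookup over an ordered candidate list. Everything concrete in this Python example is a placeholder: the routing table, the provider and model names, and `resolve_route` are hypothetical and do not describe how `pi-ai` resolves providers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    model: str
    thinking: bool


# Hypothetical routing table; candidates are tried in order.
ROUTES = {
    "fast": [Route("provider-a", "small-model", False)],
    "smart": [
        Route("provider-a", "medium-model", False),
        Route("provider-b", "medium-model", False),
    ],
    "deep": [Route("provider-a", "large-model", True)],
}


def resolve_route(model_mode: str, allowed_providers: set) -> Route:
    """Return the first candidate for the mode whose provider is allowed."""
    for route in ROUTES[model_mode]:
        if route.provider in allowed_providers:
            return route
    raise LookupError(f"no allowed provider for mode {model_mode!r}")
```
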
---

## 15. Documentation & Prompt Engineering

### SF

- **AGENTS.md**: Project-specific instructions
- **CLAUDE.md**: Claude-specific guidance
- **PDD**: Purpose-Driven Development fields
- **Skills**: `.agents/skills/` with structured prompts
- **Prompt History**: Per-project JSONL

### RA.Aid

- **Prompt Templates**: Separate files per agent
- **Expert Prompts**: Optional expert consultation
- **Human Prompts**: HIL sections
- **Custom Tools**: Dynamic tool injection

### Comparison

| Aspect | SF | RA.Aid |
|--------|-----|--------|
| **Prompt Organization** | Skills + PDD | Agent-specific files |
| **Expert Consultation** | Model mode routing | Explicit reasoning_assist |
| **Human-in-the-loop** | Permission profiles | --hil flag |
| **Custom Tools** | Skill system | --custom-tools flag |
| **Prompt Versioning** | Git-tracked skills | Package-bundled |

**Verdict**: SF's skill system is more structured. RA.Aid's expert consultation is more dynamic.

---

## Overall Assessment

### SF Strengths
1. **Mode system**: 5 axes of control vs RA.Aid's binary flags
2. **Subagent system**: Full delegation with inheritance
3. **Skills system**: Structured, evaluable, discoverable
4. **MCP integration**: Client-only, ecosystem-ready
5. **Execution policy**: Granular permission profiles
6. **Observability**: Prometheus-compatible metrics
7. **Multi-surface**: TUI + web + headless + RPC

### RA.Aid Strengths
1. **Explicit pipeline**: Clear research → plan → implement flow
2. **Expert consultation**: Dynamic reasoning assistance
3. **Cost tracking**: Built-in aggregation and CLI queries
4. **Repository pattern**: Clean data access
5. ~~Fallback handling~~: SF already has model switching on 429/rate-limit
6. **Token limiting**: Prevent context overflow
7. **Simplicity**: Easier to understand and modify

### Where SF Should Borrow from RA.Aid

1. **Explicit stage boundaries**: Add `/research`, `/plan`, `/implement` commands that mirror RA.Aid's agent pipeline
2. **Expert consultation**: Add optional "expert model" for reasoning assistance before complex operations
3. **Cost CLI**: Add `sf cost --session`, `sf cost --all` commands
4. **Repository pattern**: Formalize data access with repository classes
5. **Token limiting**: Add context window management
6. ~~Fallback handler~~: SF already has model fallback on 429/rate-limit errors

### Where RA.Aid Should Borrow from SF

1. **Mode system**: Add work modes, permission profiles, model modes
2. **Subagent system**: Add delegation for parallel work
3. **Execution policy**: Replace cowboy_mode with granular profiles
4. **Skills system**: Add structured skill framework
5. **MCP integration**: Add MCP client support
6. **UOK gates**: Add safety checkpoints between stages
7. **Observability**: Add Prometheus metrics

---

## Conclusion

SF and RA.Aid are complementary rather than competitive:

- **SF** is a **platform**: modular, multi-surface, safety-first, designed for complex multi-agent workflows
- **RA.Aid** is a **tool**: focused, simple, explicit, designed for single-agent coding tasks

The ideal system would combine:
- SF's mode system + subagent system + skills system
- RA.Aid's explicit pipeline + expert consultation + cost tracking
- Both projects' DB-first state philosophy