singularity-forge/docs/records/2026-05-07-cli-agent-code-survey.md

SF + ACE Full-Stack Reference Survey — 2026-05-07

This record compares local coding-agent, orchestration, retrieval, model, and platform-engineering references under /home/mhugo/code/ plus selected indexed public references against the intended SF+ACE full-stack flow. It is planning evidence, not an instruction to copy another product's architecture.

Product Boundary

Forge remains the local product/runtime surface, ACE remains the higher-level workflow/control-plane layer, and UOK remains the internal execution safety kernel. External systems are reference implementations used to sharpen the unified SF+ACE flow, not destination architectures.

Hard boundary: Forge must stay an MCP client only. Do not add, restore, or plan an SF MCP server. External control belongs in daemon, RPC, and headless interfaces.

Local Checkouts Inspected

Primary references:

  • singularity-forge
  • codex
  • claude-code
  • ace-coder

Additional coder references:

  • aider
  • Agentless
  • RA.Aid
  • plandex
  • goose
  • gemini-cli
  • qwen-code
  • opencode
  • crush
  • amazon-q-developer-cli
  • open-codex
  • letta-code
  • neovate-code
  • symphony
  • singularity/machine (codemachine)

Spec-system references:

  • spec-kit
  • OpenSpec
  • spec-kitty
  • cc-sdd

Indexed-only references to include in future passes:

  • kimi-cli / Kimi Code
  • upstream CodeMachine CLI (moazbuilds/CodeMachine-CLI)

Claude Code references in this survey are limited to public documentation and observed product behavior. Private or unverified mirrors are out of scope.

SF + ACE Full-Stack Reference Map

The long-term target is a unified SF+ACE autonomous software flow, not a collection of unrelated coding assistants. Compare each repo at the layer where it is strongest.

| Repo / Tool | Full-Stack Layer | Pattern To Study | Evidence Mode | Safe sift Scope |
| --- | --- | --- | --- | --- |
| singularity-forge | Local product/runtime | UOK, DB-first state, CLI/TUI/headless, extension tools, MCP-client-only guardrails | local source + sift | docs/, src/resources/extensions/sf/, packages/*/src/, tests |
| ace-coder | Workflow/control plane | HTDAG/YAML workflow DAGs, reviewers, quality gates, deployment governance, multi-repo memory | local source + sift only | AGENTS.md, CLAUDE.md, docs/, .agents/skills/, python/ai_dev/ first-party modules |
| symphony | Work orchestration | Linear polling, isolated per-issue workspaces, WORKFLOW.md, Codex app-server, retries, PR review/landing | local source + Context7 /openai/symphony | README.md, SPEC.md, elixir/WORKFLOW.md, elixir/AGENTS.md, .codex/skills/ |
| codemachine | Multi-agent workflow engine | Engine matrix, SmartRouter, spec-to-code workflow templates, feature flags, tool health | local fork/source + web upstream | README.md, docs/architecture/, templates/workflows/, prompts/agents/, prompts/moderator/ |
| Amplication | Platform/golden paths | Live templates, service catalog, plugin codegen, generated service lifecycle, compliance/drift | web/GitHub; clone before local planning | docs/, packages/*/src/, plugin/codegen packages if cloned |
| spec-kit | Spec-driven artifacts | Constitution, scenarios, FR/SC IDs, spec -> plan -> tasks -> analyze -> implement, generated checklists | local source + Context7 /github/spec-kit | templates/commands/, templates/*-template.md, scripts/, src/specify_cli/ |
| OpenSpec | Change/spec artifact graph | Current specs vs proposed changes, artifact dependency chain, workspace links, JSON/no-interactive command behavior | local source | docs/, openspec/specs/, openspec/changes/, CLI source |
| spec-kitty | Spec runtime governance | machine/project boundary, manifest-last generated artifacts, package-bundled template source, command-owned action context | local source | architecture/adrs/, src/, .kittify/, kitty-specs/, docs/ |
| cc-sdd | Kiro-style phase-gated SDD | steering vs specs, discovery routing, approvals, boundary-first tasks, per-task implement/review/debug roles | local source | AGENTS.md, CLAUDE.md, docs/guides/, tools/cc-sdd/src/ |
| plandex | Large-task implementation | Cumulative diff sandbox, plan versioning, context loading, apply/debug loop | local source + Context7 | README.md, app/cli/lib/, app/server/db/, first-party docs |
| aider | Edit loop/context map | Repo-map ranking, edit formats, lint/test repair, benchmark metadata | local source + Context7 | aider/, benchmark/, tests/, docs; avoid generated website data unless needed |
| Agentless | Bug repair/evals | Localization -> repair -> patch validation, reproduction tests, reranking | local source | agentless/fl/, agentless/repair/, agentless/test/, benchmark docs |
| SWE-agent/OpenHands | Bug repair/runtime research | issue-to-patch loops, sandbox/runtime harnesses, SWE-bench evaluation | Context7/web or local clone if added | source/docs/evals only when cloned |
| codex | Execution substrate | Sandbox profiles, approval policy, app-server protocol, typed events, AGENTS scope | local source + Context7 /openai/codex | docs/, codex-rs/protocol/src/, codex-rs/exec/src/, codex-rs/linux-sandbox/; avoid vendor/ |
| Claude Code | UX reference | Permissions, commands, plugins, MCP client UX, subagent UX | public docs + observed behavior | public docs, command UX, transcript/output behavior |
| qwen-code | Terminal workflow | trusted folders, subagent fork design, terminal-capture tests, provider config | local source + Context7 | docs/, packages/*/src/, integration-tests/terminal-capture/ |
| Kimi Code | Model-specific coding agent | long-context coding, Kimi CLI/IDE flow, model-plan comparison | Context7 /moonshotai/kimi-cli | docs/source if cloned |
| CodeGeeX2 | Model capability | multilingual code model, HumanEval-X/DS1000, local deployment/quantization | web/GitHub | benchmark/evaluation/docs if cloned |
| gemini-cli | Provider CLI/testing | release channels, generated schemas/docs, eval promotion, perf/memory tests | local source + Context7 if needed | docs/, evals/, perf-tests/, memory-tests/, packages/*/src/ |
| opencode | Mode/schema boundary | plan/build modes, client/server, project-local commands/tools, canonical schema | local source + Context7 | README.md, .opencode/, specs/, packages/opencode/specs/, packages/opencode/src/ |
| crush | Local runtime/TUI | SQLite/sqlc, hooks, permissions, LSP, MCP client status, Bubble Tea UI | local source | internal/db/, internal/hooks/, internal/permission/, internal/agent/tools/, internal/ui/ |
| goose | Desktop/CLI/API agent | diagnostics, API embedding, provider/extension breadth, MCP client lifecycle | local source | crates/, documentation/, ui/desktop/; do not copy server posture |
| letta-code | Long-lived memory | persistent agent memory, approval recovery, skills, channel/remote UX | local source | src/agent/, src/permissions/, src/cli/, src/tests/ |
| OpenAgents | Full-stack multi-agent platform | backend/frontend/agent split, one-agent-one-folder, plugin/data/web agents, adapters | web/GitHub; clone before local planning | backend/, frontend/, real_agents/ if cloned |
| Claude Context / Context+ | Code context retrieval | vector-backed semantic code search, MCP-client integration, context cost reduction | Context7/web | code search/indexing packages if cloned |
| amazon-q-developer-cli | Rust auth/security | auth, security, workspace patterns, Rust CLI lessons | local source; lower priority | crates/chat-cli/, crates/agent/, docs |

Comparison Matrix

| Reference | Strongest Fit For Forge | Borrow | Avoid |
| --- | --- | --- | --- |
| plandex | Large task planning and review workflow | Cumulative diff sandbox, plan versioning, explicit chat-to-plan-to-apply flow, context indexing for big repos | Server/product coupling and any cloud-hosting assumptions |
| codex | Execution hardening and protocol boundaries | Typed non-interactive event stream, sandbox permission profiles, app-server protocol shape, Rust crate boundaries, config schema rigor, plugin/skill manager discipline | Treating MCP server code as a Forge product direction |
| Claude Code | Interactive ergonomics | Permission UX, command discoverability, plugin surfaces, subagent UX, memory/context commands, MCP client config flows | Copying private implementation details or making it an upstream dependency |
| ace-coder | Owned multi-repo governance | Reviewer roles, hard quality gates, skill/subagent routing policy, explicit MCP-client contract style | Collapsing ACE and Forge into one product surface |
| aider | Tight edit loop, repo maps, and benchmark culture | Token-budgeted repo-map ranking, reproducible benchmark reports with model, edit format, commit hash, dirty state, pass rates, malformed output counts | Early auto-commit posture before validation and commit gates |
| Agentless | Bug-fix eval pipeline | Localization, candidate repair, regression-test selection, reproduction-test generation, validation-based patch reranking | SWE-bench-specific harness assumptions |
| RA.Aid | Stage boundaries and trajectory records | Explicit research/planning/implementation phases, research-only mode, durable session/tool trajectory records | Broad autonomous shell posture and external Aider outsourcing |
| goose | Desktop/CLI/API distribution and diagnostics | Provider breadth, diagnostics/reporting, API embedding, extension packaging, MCP client lifecycle patterns | Built-in/re-exported MCP servers or broad general-agent scope |
| gemini-cli | Release/test/docs automation | Release channels, generated settings schema/docs, behavioral eval incubation, sandbox integration tests, perf/memory baselines, GitHub Action workflows | Provider-specific product assumptions or unstable evals as hard CI gates |
| qwen-code | Claude-like terminal workflow and machine I/O | Skills/subagents, forked subagent design, trust-gated workspace config, bidirectional stream-json, JSON fd/file side channels, terminal-capture regressions, flexible provider config | OAuth/provider policy coupling, ungated project-local config, or mixing channel names with surface/protocol names |
| opencode | Mode split and schema boundary | Read-only plan mode vs full-access build mode, client/server framing, LSP opt-in, project-local commands/tools, schema-first domain boundary | Bun-specific implementation style for Forge |
| crush | SQLite state, hooks, permissions, TUI | TUI as client over backend/session services, SQLite migrations/query discipline, hook engine, permission layering, session DB, tool markdown descriptions, LSP, pub/sub, MCP client status UX | Replacing Forge's TypeScript extension architecture with Go or hiding machine protocol behind env vars |
| letta-code | Long-lived memory-agent UX | Memory lifecycle, skill learning, approval recovery tests, channel/remote control ideas, MCP OAuth/connect UX | Treating memory as unstructured product magic instead of DB-backed state |
| neovate-code | Design-doc and terminal UX iteration | Small design records, queued-message designs, JSONL session replay, high-risk command classification, command/terminal UX records | Quiet/headless auto-approval, global-only session memory, provider-specific branding, or immature UX churn |
| amazon-q-developer-cli | Declarative agent and UI event reference | Agent manifest schema, hooks/resources/tools, AG-UI-like lifecycle/text/tool/state event taxonomy, auth/security/workspace patterns | Product direction, recursive delegate subprocesses, legacy raw passthrough as protocol, trust-all delegate defaults |
| open-codex | Older/forked approval-mode comparison | Approval-mode vocabulary and provider abstraction history | Fork-specific Chat Completions direction as a primary architecture |
| symphony | Work orchestration above individual agents | Issue-tracker polling, per-issue isolated workspaces, repo-owned WORKFLOW.md, Codex app-server lifecycle, retries, operator state, CI/PR review and landing loops | High-trust unattended defaults without Forge's UOK gates and DB-first runtime evidence |
| codemachine | Multi-agent spec-to-code orchestration | Engine matrix, SmartRouter routing, heterogeneous agents, spec-to-code templates, feature flags, tool health, local workflow examples, upstream repeatable long-running workflow model | Optional MCP-server/tooling posture and Bun-specific implementation assumptions |
| Kimi Code | Long-context model-specific coding agent | Kimi CLI/IDE workflow, long-context coding, subagent-oriented terminal automation, model-plan comparison | Treating provider-specific subscription/API behavior as a Forge architecture |
| spec-kit | Spec-driven development workflow | Constitution, prioritized user scenarios, acceptance criteria, functional requirements, measurable success criteria, spec -> plan -> tasks -> analyze -> implement loop | Replacing Forge PDD/UOK with a generic spec template instead of mapping useful pieces into PDD fields |
| OpenSpec | Brownfield change planning | Clear split between current behavior specs and proposed changes, dependency-aware artifact continuation, workspace link/local mapping | Treating docs/specs as Forge's operational source of truth instead of generated/reviewed exports from .sf/SQLite |
| spec-kitty | Runtime and generated-artifact governance | Global runtime vs project overlay, package-bundled template source, manifest-last promotion, command-owned context resolver | Per-project full runtime copies, hidden alternate template sources, or prompt-level context discovery |
| cc-sdd | Agentic SDD operating loop | Steering/spec split, discovery router, approval gates, boundary annotations, per-task implementer/reviewer/debugger, implementation-note propagation | Markdown-only operational state for SF, or requiring heavy gates for trivial direct fixes |

Forge Already Has

  • DB-backed workflow state and project-local planning artifacts.
  • Headless/RPC surfaces for automation.
  • UOK safety and recovery concepts.
  • Extension loading and bundled tool surfaces.
  • Purpose-first TDD and PDD field contracts.
  • Provider abstraction through pi-ai.

Those are the center of gravity. Borrowed patterns should strengthen these surfaces instead of adding parallel state systems.

Gaps Worth Pulling Into The Roadmap

  1. Execution and permission hardening

    • Use Codex and Crush as the references.
    • Target Forge surfaces: exec-sandbox, production mutation approval, command permissions, headless/RPC mutation gates, DB-recorded tool-call evidence, and permission profiles that specify filesystem, network, .git, metadata, writable-root, and denied-path behavior.
  2. Plan/build mode separation

    • Use OpenCode, Plandex, and Qwen Code as the references.
    • Target Forge surfaces: explicit read-only planning mode, full-access build mode, and clearer mode transitions in auto/headless.
  3. Typed headless event stream

    • Use Codex, Gemini CLI, Qwen Code, Amazon Q, and OpenCode as the references.
    • Target Forge surfaces: stable machine-readable events such as thread.started, turn.started, turn.completed, turn.failed, item.started, item.updated, and item.completed, with typed payloads for commands, patches, MCP calls, web/context lookups, todos, and UOK evidence.
    • Qwen-style bidirectional machine contracts are especially relevant: stream-json input, stream-json output, JSON fd/file side channels, and long-lived session control.
  4. Reviewable cumulative diffs

    • Use Plandex and Aider as the references.
    • Target Forge surfaces: cumulative patch review, apply/reject/revise workflow, conflict analysis before apply/rewind, and commit metadata tied to model, prompt, dirty state, and evidence.
  5. Eval and bug-fix pipeline

    • Use Aider and Agentless as the references.
    • Target Forge surfaces: reproducible eval reports, localization -> repair -> validation cases, candidate patch sampling, reproduction-test generation, and validation-based patch reranking.
  6. Memory lifecycle and recovery

    • Use Letta Code and ACE as references, while keeping Forge DB-first.
    • Target Forge surfaces: durable memory extraction, turn recovery policy, approval recovery, stale-state reconciliation, typed memory records, and per-tool trajectory records for auto-mode postmortems.
  7. Terminal UX and command discoverability

    • Use Claude Code, Crush, OpenCode, and Neovate as references.
    • Target Forge surfaces: command catalog, permission prompts, status line, queued-message behavior, and compact TUI/headless diagnostics.
  8. Config and schema generation

    • Use Gemini CLI, Codex, Qwen Code, and Crush as references.
    • Target Forge surfaces: typed settings, generated docs, environment schema, DB migrations, and strict versioned JSON projections when JSON is only a compatibility/export format.
  9. MCP client lifecycle

    • Use Crush, Amazon Q, Claude Code, Letta Code, and Neovate as references.
    • Target Forge surfaces: explicit client states (disabled, starting, connected, error), reconnect behavior, scoped project/global/managed config, atomic config writes, tool namespacing such as mcp__server__tool, schema cleanup, resource list/read commands, OAuth connect UX, status counts, and evidence logging.
    • Stop rule: do not implement any SF MCP server, MCP worker backend, or bundled/re-exported MCP server.
  10. Work orchestration above single agent sessions

    • Use OpenAI Symphony and CodeMachine as references.
    • Target Forge surfaces: durable queue/roadmap dispatch, isolated working directories, issue/task lifecycle state, retry/backoff, per-run observability, proof-of-work handoff, and CI/PR review/landing loops.
    • Stop rule: orchestration must feed UOK and DB-backed state instead of bypassing Forge's safety gates.
  11. Spec-driven artifact pipeline

    • Use Spec Kit, OpenSpec, spec-kitty, cc-sdd, and CodeMachine as references.
    • Target Forge surfaces: convert intent into PDD fields, prioritized slices, acceptance criteria, functional requirements, measurable success criteria, task generation, and consistency analysis before implementation.
  12. Generated human exports and drift checks

    • Use spec-kitty and Spec Kit as references, but keep Forge database-first.
    • Target Forge surfaces: generated docs/specs/ exports, check commands that fail on stale projections, manifest-backed generated artifacts where promotion needs auditability, and command-owned context resolution rather than prompt heuristics.
    • Stop rule: generated docs may be reviewed and tracked by Git, but SF-owned operational history and future-use knowledge stay in .sf/SQLite.
  13. Batch input and project-state relocation

    • Use Aider and RA.Aid as references.
    • Target Forge surfaces: prompt-file batch input, dry-run previews, explicit --yes/confirmation gates, and an overrideable project-state directory for CI/sandboxes or migrated workspaces.
    • Stop rule: history/prompt buffers are convenience, not cross-repo memory or operational authority.
  14. Decomposed autonomy and high-risk command classification

    • Use Plandex, Goose, Gemini CLI, Qwen Code, and Neovate Code as references.
    • Target Forge surfaces: separate choices for context loading, apply, execution, commit behavior, run control, and permission profile; keep static high-risk command classification as a first-pass guard; fail closed when a non-interactive run reaches approval-required tool use without an explicit permission profile that allows it.
    • Stop rule: no broad always-yes mode and no destructive Git cleanup as a default recovery path.
  15. Declarative agent/run manifests

    • Use Amazon Q and Goose recipes as references.
    • Target Forge surfaces: reviewed agent/run manifests with prompt context, file resources, hooks, visible/allowed tools, model policy, extension requirements, parameters, and expected response/evidence contracts.
    • Stop rule: manifests feed UOK and .sf/sf.db; they do not bypass SF permission or evidence gates.
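
As an illustration of gap 9, the sketch below models the four client states and the `mcp__server__tool` naming shape in Python. The survey names the states and the naming pattern only; the transition table, the function names, and the rejection of `__` inside a server or tool name are assumptions made for this example, not Forge behavior.

```python
from enum import Enum


class McpClientState(Enum):
    """Client states listed in gap 9."""
    DISABLED = "disabled"
    STARTING = "starting"
    CONNECTED = "connected"
    ERROR = "error"


# Hypothetical transition table: the survey names the states, not the edges.
ALLOWED_TRANSITIONS = {
    McpClientState.DISABLED: {McpClientState.STARTING},
    McpClientState.STARTING: {
        McpClientState.CONNECTED,
        McpClientState.ERROR,
        McpClientState.DISABLED,
    },
    McpClientState.CONNECTED: {McpClientState.ERROR, McpClientState.DISABLED},
    # From ERROR a client may retry (reconnect) or be switched off.
    McpClientState.ERROR: {McpClientState.STARTING, McpClientState.DISABLED},
}


def transition(current: McpClientState, target: McpClientState) -> McpClientState:
    """Return the new state, or raise if the edge is not in the table."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def namespaced_tool(server: str, tool: str) -> str:
    """Build an `mcp__server__tool` style name."""
    for part in (server, tool):
        # Assumption: forbid the separator inside a part so names split unambiguously.
        if not part or "__" in part:
            raise ValueError(f"invalid name part: {part!r}")
    return f"mcp__{server}__{tool}"


def split_tool(name: str) -> tuple[str, str]:
    """Recover (server, tool) from a namespaced tool name."""
    prefix, server, tool = name.split("__", 2)
    if prefix != "mcp":
        raise ValueError(f"not an MCP tool name: {name!r}")
    return server, tool
```

Keeping the edges in one table means status counts and evidence logging can hang off a single `transition` call instead of being scattered across connection code.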

Priority Order

P0:

  • Keep Forge MCP-client-only; reject any MCP-server plan.
  • Harden command/tool execution policy and mutation gates.
  • Add typed headless event DTOs for auto/headless consumers.
  • Make DB-backed state the structured source of truth for planner/runtime records, with JSON/Markdown only as projections, imports, exports, or promoted human docs.
  • Add trust gating for project-local config, hooks, tools, .env, and automatic memory loading before expanding those surfaces.
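
For the typed headless event DTOs in this list, a minimal Python sketch of a JSON Lines encoding might look like the following. The event names come from gap 3; the `HeadlessEvent` class, the `seq` and `payload` fields, and the helper names are hypothetical and do not describe Forge's actual wire format.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Event names taken from gap 3; everything else in this block is hypothetical.
EVENT_TYPES = frozenset({
    "thread.started",
    "turn.started", "turn.completed", "turn.failed",
    "item.started", "item.updated", "item.completed",
})


@dataclass(frozen=True)
class HeadlessEvent:
    """One machine-readable event on a headless stream (illustrative shape)."""
    type: str
    seq: int  # hypothetical per-stream sequence number
    payload: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject unknown event names at construction time.
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type!r}")


def encode_event(event: HeadlessEvent) -> str:
    """Serialize one event as a single JSON line (no trailing newline)."""
    return json.dumps(
        {"type": event.type, "seq": event.seq, "payload": event.payload},
        sort_keys=True,
    )


def decode_event(line: str) -> HeadlessEvent:
    """Parse one JSON line; unknown event types raise ValueError."""
    raw = json.loads(line)
    return HeadlessEvent(
        type=raw["type"],
        seq=raw["seq"],
        payload=raw.get("payload", {}),
    )
```

Validating the event name in the constructor means a consumer never has to handle an event type that the producer could not have legally emitted.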

P1:

  • Add explicit plan/build mode semantics.
  • Add cumulative diff review and evidence metadata.
  • Expand UOK evals with Agentless-style localization/repair/validation cases.
  • Add MCP client state/status/config hardening without adding any MCP server.
  • Add durable orchestration contracts for issue/task queues, isolated workspaces, retry policy, proof-of-work, and review/landing loops.

P2:

  • Improve terminal command discovery and permission UX.
  • Generate settings/environment docs from typed schemas.
  • Compare memory lifecycle/recovery against Letta and ACE.
  • Map Spec Kit scenario/requirement/success-criteria templates into Forge PDD fields without replacing PDD.

Evidence Pointers

The follow-up subagent pass inspected these concrete local paths:

  • aider/aider/repomap.py, aider/aider/coders/base_coder.py, aider/aider/linter.py, aider/aider/args.py, aider/aider/io.py, and aider/benchmark/README.md.
  • Agentless/agentless/fl/localize.py, Agentless/agentless/repair/rerank.py, Agentless/agentless/repair/repair.py, Agentless/agentless/test/generate_reproduction_tests.py, and Agentless/agentless/test/run_tests.py.
  • RA.Aid/ra_aid/agents/, RA.Aid/ra_aid/tools/programmer.py, RA.Aid/ra_aid/database/models.py, RA.Aid/ra_aid/config.py, RA.Aid/ra_aid/database/connection.py, and RA.Aid/ra_aid/tools/shell.py.
  • plandex/app/cli/lib/apply.go, plandex/app/cli/lib/rewind.go, plandex/app/cli/lib/git.go, plandex/app/cli/lib/repl.go, plandex/app/cli/cmd/plan_exec_helpers.go, plandex/app/cli/cmd/plan_start_helpers.go, plandex/app/server/db/diff_helpers.go, and plandex/app/server/db/plan_config_helpers.go.
  • codex/codex-rs/exec/src/exec_events.rs, codex/codex-rs/linux-sandbox/README.md, codex/codex-rs/linux-sandbox/src/linux_run_main.rs.
  • gemini-cli/evals/README.md, gemini-cli/perf-tests/README.md, gemini-cli/memory-tests/, gemini-cli/packages/cli/src/config/config.ts, gemini-cli/packages/cli/src/nonInteractiveCli.ts, gemini-cli/packages/core/src/output/types.ts, gemini-cli/packages/core/src/policy/types.ts, and gemini-cli/packages/core/src/config/storage.ts.
  • qwen-code/docs/users/configuration/trusted-folders.md, qwen-code/docs/design/fork-subagent/fork-subagent-design.md, qwen-code/integration-tests/terminal-capture/, qwen-code/packages/cli/src/config/config.ts, qwen-code/packages/cli/src/nonInteractiveCli.ts, qwen-code/packages/cli/src/nonInteractive/session.ts, qwen-code/packages/channels/base/README.md, and qwen-code/packages/core/src/permissions/types.ts.
  • opencode/.opencode/, opencode/specs/v2/session.md, opencode/packages/opencode/specs/effect/schema.md, opencode/packages/opencode/src/session/schema.ts.
  • crush/internal/db/, crush/internal/hooks/, crush/internal/permission/, crush/internal/agent/tools/mcp/.
  • Claude Code public documentation and observed command/transcript behavior.
  • letta-code/src/cli/components/McpConnectFlow.tsx, letta-code/src/cli/helpers/mcpOauth.ts, letta-code/src/agent/approval-recovery.ts.
  • neovate-code/src/mcp.ts, neovate-code/src/commands/mcp.ts, neovate-code/src/slash-commands/builtin/mcp.tsx, neovate-code/src/session.ts, neovate-code/src/config.ts, neovate-code/src/tools/bash.ts, and neovate-code/src/ui/ApprovalModal.tsx.
  • amazon-q-developer-cli/crates/agent/src/agent/mcp/, amazon-q-developer-cli/crates/chat-cli/src/cli/mcp.rs, amazon-q-developer-cli/schemas/agent-v1.json, amazon-q-developer-cli/crates/chat-cli-ui/src/protocol.rs, and amazon-q-developer-cli/crates/chat-cli/src/cli/chat/tools/delegate.rs.
  • goose/crates/goose/src/config/goose_mode.rs, goose/crates/goose/src/config/permission.rs, goose/crates/goose-cli/src/session/mod.rs, goose/crates/goose-cli/src/cli.rs, goose/crates/goose/src/session/session_manager.rs, and goose/crates/goose/src/recipe/mod.rs.
  • crush/internal/cmd/root.go, crush/internal/cmd/run.go, crush/internal/proto/proto.go, crush/internal/backend/permission.go, crush/internal/db/migrations/20250424200609_initial.sql, crush/internal/cmd/session.go, and crush/internal/config/config.go.
  • ace-coder/docs/MCP_SERVER.md, ace-coder/docs/plans/2026-04-05-mcp-daemon-refactor.md, ace-coder/python/ai_dev/mcp/.
  • symphony/README.md, symphony/SPEC.md, symphony/elixir/WORKFLOW.md, symphony/elixir/AGENTS.md, and .codex/skills/land/SKILL.md.
  • singularity/machine/README.md, package.json, templates/workflows/, docs/architecture/engine-matrix.md, and docs/OPENAI_SPECS_DOWNLOAD.md.
  • spec-kit/README.md, templates/commands/specify.md, templates/commands/plan.md, templates/commands/tasks.md, and scripts/bash/common.sh.
  • OpenSpec/docs/concepts.md, OpenSpec/docs/commands.md, OpenSpec/openspec/specs/, and OpenSpec/openspec/changes/.
  • spec-kitty/architecture/adrs/2026-04-08-1-global-kittify-machine-level-runtime.md, spec-kitty/architecture/adrs/2026-04-08-2-package-bundled-templates-sole-source.md, and spec-kitty/architecture/adrs/2026-03-09-1-prompts-do-not-discover-context-commands-do.md.
  • cc-sdd/AGENTS.md, cc-sdd/README.md, cc-sdd/docs/guides/spec-driven.md, and cc-sdd/docs/guides/skill-reference.md.

Context7 Cross-Check

Context7 was used after the local-source pass as a secondary check for indexed public references. Local source remains the evidence of record because it is the snapshot available on this machine.

  • /openai/codex confirmed the relevant Codex public patterns: interactive and non-interactive CLI modes, app-server, AGENTS.md project guidance, approval policy, and sandbox modes (read-only, workspace-write, danger-full-access) with writable roots and network controls.
  • /plandex-ai/plandex confirmed the relevant Plandex public patterns: semi/full run-control levels, smart context loading, cumulative diff review sandbox, review/apply/debug workflow, and large multi-file task focus.
  • /ai-christianson/ra.aid confirmed the relevant RA.Aid public patterns: research-only and research-and-plan-only modes, the research -> planning -> implementation workflow, logging/cost visibility, and the risky --cowboy-mode shell approval bypass that Forge should not copy.
  • Context7 also resolves these remaining comparison targets for later deeper checks:
    • Aider: /websites/aider_chat and /aider-ai/aider.
    • Qwen Code: /qwenlm/qwen-code, /websites/qwenlm_github_io_qwen-code-docs, and /websites/qwenlm_github_io_qwen-code-docs_en.
    • OpenCode: /anomalyco/opencode.
    • OpenAI Symphony: /openai/symphony.
    • Kimi Code: /moonshotai/kimi-cli, /websites/moonshotai_github_io_kimi-cli_en, and /websites/kimi_code.
    • Spec Kit: /github/spec-kit and /websites/github_github_io_spec-kit.
  • Upstream CodeMachine CLI did not resolve by name in Context7 during this pass, but GitHub confirms https://github.com/moazbuilds/CodeMachine-CLI as the public upstream-style repo for CodeMachine CLI. The local checkout inspected is https://github.com/singularity-ng/machine.git, so treat it as local fork/mirror evidence rather than exact upstream state.

Local Sift Cross-Check

ACE is private/local and should not be treated as Context7-indexed. Use sift for ACE and Forge when checking private or machine-local architecture.

For dependency hygiene, do not run broad sift search over repo roots that may contain vendored dependencies, package caches, build output, or generated blobs. This sift install does not expose an exclude flag, so scope searches to first-party paths such as docs/, src/, packages/*/src/, specs/, AGENTS.md, CLAUDE.md, and known design files. Avoid node_modules/, vendor/, dist/, build/, target/, .venv/, caches, fixture dumps, and generated lock/schema/output directories unless the dependency surface itself is the subject of the question.
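
Since scoping has to happen before the search runs, one way to build the target list is sketched below in Python. It does not invoke sift or assume any of its flags; `first_party_targets`, `DEFAULT_SCOPES`, and `EXCLUDED_DIRS` are hypothetical names, and the scope and exclusion values are a subset of the ones named above.

```python
from pathlib import Path

# Directory names the paragraph above says to avoid.
EXCLUDED_DIRS = frozenset({"node_modules", "vendor", "dist", "build", "target", ".venv"})

# First-party scopes, expressed as globs relative to the repo root.
DEFAULT_SCOPES = ("docs", "src", "packages/*/src", "specs", "AGENTS.md", "CLAUDE.md")


def first_party_targets(repo_root: Path, scopes=DEFAULT_SCOPES) -> list[Path]:
    """Expand scope globs under repo_root and drop matches inside excluded directories.

    Only existing paths are returned. This narrows the search roots; an
    excluded directory nested deeper inside a returned scope is not pruned.
    """
    targets = []
    for pattern in scopes:
        for match in sorted(repo_root.glob(pattern)):
            relative_parts = match.relative_to(repo_root).parts
            if EXCLUDED_DIRS.isdisjoint(relative_parts):
                targets.append(match)
    return targets
```

The returned paths can then be passed one by one to whatever search command is in use, which keeps the exclusion policy in a single reviewable place.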

The targeted sift pass found:

  • Codex codex-rs/protocol/src/config_types.rs and protocol.rs: confirm first-party typed approval policy and sandbox mode surfaces without searching codex-rs/vendor/.
  • OpenCode packages/opencode/specs/effect/schema.md: confirms the schema-first rule to prefer one canonical schema definition and derive compatibility schemas instead of maintaining parallel sources of truth.
  • Aider first-party docs/tests: confirm local repo-map/edit-format/lint/test and commit behavior surfaces.
  • Plandex README.md, changelog, and first-party app model files: confirm the cumulative diff sandbox, controlled command execution, rollback/debug loop, and planning phases.
  • Qwen Code docs/: confirms terminal-capture integration tests, trusted folders documentation, and provider configuration docs.
  • RA.Aid first-party docs/source: confirm shell command approval bypass via --cowboy-mode, research/planning agents, and session/logging surfaces.
  • Symphony first-party spec/workflow files: confirm issue-tracker polling, per-issue workspace isolation, repo-owned WORKFLOW.md, Codex app-server lifecycle, max turns/concurrency, retry/backoff, state snapshots, token/rate observability, PR feedback sweeps, and land-loop skills.
  • CodeMachine first-party docs/templates: confirm local multi-agent orchestration, heterogeneous engine routing, spec-to-code workflow templates, feature-flag governance, health/status commands, and optional MCP tooling. GitHub upstream moazbuilds/CodeMachine-CLI confirms the public product framing: repeatable long-running workflows, multi-agent orchestration, parallel execution, context engineering, and headless scripting of coding engines such as Claude Code, Codex, Cursor, and others.
  • ACE AGENTS.md: confirms the repo-local Claude MCP client contract, hard stops, skills, reviewer workflow, quality gate, and the warning that ACE's autonomous system uses its own code/YAML workflow DAGs rather than AGENTS.md.
  • ACE docs/designs/SPEC_TO_BUILD_PIPELINE_DESIGN.md: confirms the spec -> feature graph -> ADR detection -> implementation planning/review pipeline.
  • Forge docs/dev/ADR-008-sf-tools-over-mcp-for-provider-parity.md: confirms the durable SF boundary: if external control is needed, use daemon/RPC/headless contracts; MCP remains SF-as-client only.
  • Forge src/resources/extensions/sf/tests/no-sf-mcp-server.test.mjs: confirms there is executable guard coverage preventing recreation of SF MCP server paths.

Local Surface/State Cross-Check

The detailed coder-agent pass supports Forge's five-axis operating model.

  • Codex has the cleanest split: TUI flags, non-interactive exec, app-server protocol, sandbox mode, approval policy, persistent state, and input history are separate code paths and types. Forge should copy the typed separation, not the exact crate structure.
  • OpenCode treats the TUI as one client of a server, exposes HTTP/OpenAPI, and bridges ACP as a protocol adapter over its server. Forge should adapt the client/server clarity and generated SDK idea without making HTTP the only mental model for machine access.
  • Claude Code is strong on structured headless output, rich JSONL transcripts, entrypoint metadata, permission modes, and sandbox schemas. Forge should adapt the stream discipline and transcript fields while avoiding hidden feature-flag sprawl.
  • The older open-codex fork is the cautionary example: autonomy, approval policy, output, and sandbox behavior collapse into a small set of flags. Forge should keep run control, output format, protocol, and permission profile independent.

The second coder-agent pass adds state/history guardrails:

  • Aider's prompt-file, dry-run, and explicit yes primitives are useful batch affordances, but its flat history files should remain convenience only.
  • RA.Aid's overrideable project state directory is useful for CI/sandboxes, but broad shell-approval bypasses are not.
  • Plandex's decomposed context/apply/execution/commit modes map well to Forge's need to keep run control separate from permission profile and Git policy.
  • Agentless's stage JSONL artifacts are good eval/evidence inputs; import them into DB-backed contracts instead of making JSONL the live state model.
  • Neovate's JSONL session replay and high-risk bash classifier are useful; its quiet-mode auto-approval is not.
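
The "import, do not adopt as live state" guidance for stage JSONL can be sketched as below, using Python's `sqlite3`. The table name `eval_stage_records`, its columns, and the assumption that each JSON object carries an `instance_id` key are hypothetical; they are not taken from the Forge schema or from a verified Agentless output format.

```python
import json
import sqlite3

# Hypothetical table; the verbatim JSON payload is kept as evidence.
SCHEMA = """
CREATE TABLE IF NOT EXISTS eval_stage_records (
    run_id      TEXT NOT NULL,
    stage       TEXT NOT NULL,
    instance_id TEXT NOT NULL,
    payload     TEXT NOT NULL,
    PRIMARY KEY (run_id, stage, instance_id)
)
"""


def import_stage_jsonl(conn: sqlite3.Connection, run_id: str, stage: str, lines) -> int:
    """Import one stage's JSONL lines and return the number of rows written.

    Blank lines are skipped. Re-importing the same file replaces rows rather
    than duplicating them, so the import is safe to repeat.
    """
    conn.execute(SCHEMA)
    rows = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        rows.append(
            (run_id, stage, record["instance_id"], json.dumps(record, sort_keys=True))
        )
    # One transaction: either every parsed row lands or none does.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO eval_stage_records VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Parsing every line before opening the transaction means a malformed line aborts the import without leaving a half-written stage in the database.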

The terminal-agent pass adds concrete machine/API patterns:

  • Gemini and Qwen both expose text, json, and stream-json output formats. Qwen goes further with stream-json input plus JSON fd/file side channels. Forge should adopt the bidirectional contract shape for parent processes.
  • Goose cleanly names run-control and permission behavior, and its non-interactive path refuses approval-required tool calls unless the run was explicitly allowed to proceed. Forge should copy that fail-closed posture.
  • Crush shows the right architecture shape for TUI-over-backend/session-store while keeping session/message/file history in SQLite. Forge already wants this DB-first boundary; the lesson is to avoid making the machine protocol hidden or text-only.
  • Amazon Q has the richest declarative agent schema and useful lifecycle/text/tool/state event taxonomy. Forge should adapt manifests and event taxonomy, not recursive delegate subprocesses or raw passthrough protocol events.
  • GitHub Copilot CLI's autopilot documentation is a useful naming cross-check: autopilot is the continuation behavior, --allow-all/--yolo are permission expansion, and --no-ask-user is question suppression. Forge should keep the same separation but use SF's own terms. Run control is manual | assisted | autonomous. Permission profile is restricted | normal | trusted | unrestricted. Headless/machine output is a surface/format concern.
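The separation described in this pass can be sketched as independent typed fields plus a fail-closed approval rule in the spirit of the Goose bullet. The enum values come from the terms listed above. The RunConfig type, the interactive field, and the resolve_approval function are hypothetical names for illustration, and how a permission profile decides that a call needs approval is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class RunControl(Enum):
    MANUAL = "manual"
    ASSISTED = "assisted"
    AUTONOMOUS = "autonomous"


class PermissionProfile(Enum):
    RESTRICTED = "restricted"
    NORMAL = "normal"
    TRUSTED = "trusted"
    UNRESTRICTED = "unrestricted"


class OutputFormat(Enum):
    TEXT = "text"
    JSON = "json"
    STREAM_JSON = "stream-json"


@dataclass(frozen=True)
class RunConfig:
    # Each axis is its own field, so no single flag can imply another.
    run_control: RunControl
    permission_profile: PermissionProfile
    output_format: OutputFormat
    interactive: bool  # hypothetical: whether a human can answer prompts


def resolve_approval(config: RunConfig, needs_approval: bool, pre_approved: bool = False) -> str:
    """Return "allow", "ask", or "deny" for a single tool call."""
    if not needs_approval or pre_approved:
        return "allow"
    if config.interactive:
        return "ask"
    # Fail closed: a non-interactive run has nobody to ask, so the call is
    # refused even when run control is autonomous.
    return "deny"
```

In this sketch an autonomous, non-interactive run still returns "deny" for an approval-required call unless it was explicitly pre-approved, which keeps continuation behavior from silently becoming permission expansion.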

Local Spec-System Cross-Check

The spec-system pass reinforces the current Forge direction: specs and docs are valuable contracts for humans, but command/runtime state must own execution.

  • Spec Kit creates specs/<feature>/spec.md, stores the active feature pointer in .specify/feature.json, runs plan/task setup scripts that return JSON, and generates quality checklists before planning. The useful pattern is explicit generated artifacts plus machine-readable path discovery, not a second SF planning database. Its analyze command is a useful read-only consistency shape for comparing spec, plan, tasks, and constitution; in Forge that should compare .sf/DB state, generated docs, evidence, and code diffs.
  • OpenSpec separates openspec/specs/ as current behavior from openspec/changes/ as proposed modifications, then uses commands like propose, continue, ff, verify, sync, and archive to move artifacts through a schema-defined dependency graph. Its best pattern is deterministic ready/blocked artifact queries, apply requirements, and delta validation. Forge should keep the change/spec distinction as a human review model while storing operational order/gates in .sf/sf.db.
  • spec-kitty is strongest on runtime boundaries: one global machine runtime, thin project overlay, package-bundled templates as the sole end-user source, manifest-last generated artifact promotion, and command-owned action context. Its newer runtime pattern is also important: append-only status events, materialized status, expected-artifact manifests with blocking semantics, step contracts, explicit write scopes, review evidence, and commit hooks that keep planning artifacts out of lane branches. Forge should copy the boundary discipline and avoid prompt-level discovery for any context a command can resolve.
  • cc-sdd is strongest on phase gates and role separation: steering vs feature specs, discovery routing, requirements/design/tasks approvals, boundary-first task contracts, and per-task implementer/reviewer/debugger contexts. Forge should adapt the contract discipline into PDD fields and UOK gates without making markdown checkboxes the operational state store. The most reusable pieces are boundary/dependency annotations, observable completion, approval booleans as explicit state, implementation-note propagation, and a manifest planner for generated agent/startup artifacts with dry-run/conflict policy.
  • Forge docs/records/2026-05-07-cli-agent-code-survey.md: now records the MCP-client-only product boundary and roadmap pull-through.
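The deterministic ready/blocked query credited to OpenSpec above can be illustrated with a small dependency check. This is a sketch under stated assumptions, not OpenSpec's or Forge's code: the function classify_artifacts and the artifact names in the usage example are hypothetical, and a real implementation would read the graph and completion state from the database rather than from in-memory dictionaries.

```python
from typing import Dict, List, Set


def classify_artifacts(requires: Dict[str, List[str]], done: Set[str]) -> Dict[str, List[str]]:
    """Split unfinished artifacts into ready and blocked lists.

    An artifact is ready when every dependency is done. Iterating in sorted
    order makes the answer stable across runs for the same inputs.
    """
    ready: List[str] = []
    blocked: List[str] = []
    for artifact in sorted(requires):
        if artifact in done:
            continue
        missing = [dep for dep in requires[artifact] if dep not in done]
        if missing:
            blocked.append(artifact)
        else:
            ready.append(artifact)
    return {"ready": ready, "blocked": blocked}


# Hypothetical graph: tasks need both specs and design, which need the proposal.
graph = {
    "proposal": [],
    "specs": ["proposal"],
    "design": ["proposal"],
    "tasks": ["specs", "design"],
}
status = classify_artifacts(graph, done={"proposal"})
```

With only the proposal done, specs and design are ready and tasks is blocked. Because the result depends only on the graph and the done set, a command can compute it without any prompt-level discovery.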

Implementation Follow-Up

The first DB-backed retrieval slice landed with schema v41:

  • retrieval_evidence records backend, source kind, query, strategy, scope, project root, git head/branch, worktree dirty flag, freshness, status, hit count, elapsed time, cache path, error, result metadata, and timestamp.
  • sift_search and codebase_search write retrieval evidence for successful and failed searches.
  • Native Context7 resolve_library and get_library_docs write docs retrieval evidence with freshness=external-index.
  • search-the-web writes web retrieval evidence with freshness=external-live for success, cache hits, missing-provider errors, duplicate-loop stops, budget exhaustion, aborts, and provider failures.
  • sf_retrieval_evidence exposes the rows through the SF read-only DB tool surface so agents do not query .sf/sf.db directly.
  • Sift telemetry now uses the no-op debug logger; telemetry failures no longer turn successful searches into failed tool calls.

Next slices should wrap search_and_read and fetch_page results in the same evidence contract before using them for planning.
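One way to picture the shared evidence contract is a wrapper that records a row for both successful and failed retrievals. This is a minimal sketch only. The RetrievalEvidence fields are a subset of the columns listed above, while the run_with_evidence function, the "ok"/"error" status values, and the record callback are assumptions rather than the actual SF API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class RetrievalEvidence:
    backend: str
    source_kind: str
    query: str
    freshness: str
    status: str            # assumed values: "ok" or "error"
    hit_count: int
    elapsed_ms: float
    error: Optional[str]
    timestamp: float


def run_with_evidence(
    backend: str,
    source_kind: str,
    freshness: str,
    query: str,
    search: Callable[[str], Iterable[str]],
    record: Callable[[RetrievalEvidence], None],
) -> Tuple[List[str], RetrievalEvidence]:
    """Run a retrieval call and record evidence whether it succeeds or fails."""
    started = time.monotonic()
    hits: List[str] = []
    error: Optional[str] = None
    try:
        hits = list(search(query))
    except Exception as exc:  # the failure becomes evidence instead of vanishing
        error = f"{type(exc).__name__}: {exc}"
    evidence = RetrievalEvidence(
        backend=backend,
        source_kind=source_kind,
        query=query,
        freshness=freshness,
        status="ok" if error is None else "error",
        hit_count=len(hits),
        elapsed_ms=(time.monotonic() - started) * 1000.0,
        error=error,
        timestamp=time.time(),
    )
    try:
        record(evidence)
    except Exception:
        pass  # a failed evidence write must not fail the retrieval itself
    return hits, evidence
```

Swallowing errors from the record callback mirrors the telemetry lesson above: bookkeeping failures should not turn a successful search into a failed tool call.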

The first execution-policy vocabulary slice also landed:

  • execution-policy.js defines named plan, build, trusted, and unrestricted profiles with filesystem, network, git, and mutation posture.
  • The plan profile reuses the existing queue-mode write gate, so read-only commands and .sf/ planning artifacts are allowed while source mutations are denied.
  • The build profile records destructive bash risk labels from the existing destructive-command classifier without changing runtime enforcement yet.
  • Auto-mode now writes execution-policy-decision journal events for tool calls, recording the profile, allow/deny result, risk, destructive labels, tool name, call id, and policy-relevant command/path only.

Next slices should project these profile decisions into UOK evidence and the machine-surface JSON/JSONL projections before broad enforcement.
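The profile behavior described in this slice can be sketched as a pure decision function that returns a journal-style event. This is an illustration under loud assumptions, not the contents of execution-policy.js. The decide function, the toy pattern table, the "high"/"low" risk values, and the event key names are all hypothetical. Only the profile names, the .sf/ planning exception, and the execution-policy-decision event name come from the text above.

```python
import posixpath
from typing import Dict, List, Optional

# Toy stand-in for the destructive-command classifier (assumption).
DESTRUCTIVE_PATTERNS: Dict[str, str] = {
    "rm -rf": "recursive-delete",
    "git reset --hard": "history-discard",
}


def _is_planning_path(path: str) -> bool:
    """True for project-relative paths that stay inside .sf/ after normalization."""
    normalized = posixpath.normpath(path)
    return normalized == ".sf" or normalized.startswith(".sf/")


def decide(
    profile: str,
    tool_name: str,
    call_id: str,
    command: Optional[str] = None,
    write_path: Optional[str] = None,
) -> dict:
    """Build one policy decision event for a tool call."""
    labels: List[str] = sorted(
        label for pattern, label in DESTRUCTIVE_PATTERNS.items()
        if command and pattern in command
    )
    allowed = True
    if profile == "plan" and write_path is not None:
        # The plan profile only permits writes to planning artifacts.
        allowed = _is_planning_path(write_path)
    # Other profiles record labels here without changing enforcement.
    return {
        "type": "execution-policy-decision",
        "profile": profile,
        "result": "allow" if allowed else "deny",
        "risk": "high" if labels else "low",
        "destructive_labels": labels,
        "tool_name": tool_name,
        "call_id": call_id,
        "command": command,
        "path": write_path,
    }
```

Keeping the decision as plain data makes it straightforward to append the same event to a journal and later project it into other evidence or JSONL surfaces.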

Resulting Direction

Forge should absorb proven patterns into UOK and the existing DB-first runtime: structured state, explicit modes, stronger permissions, reproducible evidence, and better review UX. The goal is not feature parity with every coder. The goal is a purpose-to-software compiler whose run control and permission profile are inspectable, recoverable, and safe enough to run repeatedly.