Mikael Hugo 6fc054e7c3 sf snapshot: uncommitted changes after 49m inactivity

2026-05-08 01:07:24 +02:00

SF + ACE Full-Stack Reference Survey — 2026-05-07

This record compares local coding-agent, orchestration, retrieval, model, and platform-engineering references under /home/mhugo/code/ plus selected indexed public references against the intended SF+ACE full-stack flow. It is planning evidence, not an instruction to copy another product's architecture.

Product Boundary

Forge remains the local product/runtime surface, ACE remains the higher-level workflow/control-plane layer, and UOK remains the internal execution safety kernel. External systems are reference implementations used to sharpen the unified SF+ACE flow, not destination architectures.

Hard boundary: Forge must stay an MCP client only. Do not add, restore, or plan an SF MCP server. External control belongs in daemon, RPC, and headless interfaces.

Local Checkouts Inspected

Primary references:

singularity-forge
codex
claude-code
ace-coder

Additional coder references:

aider
Agentless
RA.Aid
plandex
goose
gemini-cli
qwen-code
opencode
crush
amazon-q-developer-cli
open-codex
letta-code
neovate-code
symphony
singularity/machine (codemachine)

Spec-system references:

spec-kit
OpenSpec
spec-kitty
cc-sdd

Indexed-only references to include in future passes:

kimi-cli / Kimi Code
upstream CodeMachine CLI (moazbuilds/CodeMachine-CLI)

Claude Code references in this survey are limited to public documentation and observed product behavior. Private or unverified mirrors are out of scope.

SF + ACE Full-Stack Reference Map

The long-term target is a unified SF+ACE autonomous software flow, not a collection of unrelated coding assistants. Compare each repo at the layer where it is strongest.

Repo / Tool	Full-Stack Layer	Pattern To Study	Evidence Mode	Safe `sift` Scope
`singularity-forge`	Local product/runtime	UOK, DB-first state, CLI/TUI/headless, extension tools, MCP-client-only guardrails	local source + `sift`	`docs/`, `src/resources/extensions/sf/`, `packages/*/src/`, tests
`ace-coder`	Workflow/control plane	HTDAG/YAML workflow DAGs, reviewers, quality gates, deployment governance, multi-repo memory	local source + `sift` only	`AGENTS.md`, `CLAUDE.md`, `docs/`, `.agents/skills/`, `python/ai_dev/` first-party modules
`symphony`	Work orchestration	Linear polling, isolated per-issue workspaces, `WORKFLOW.md`, Codex app-server, retries, PR review/landing	local source + Context7 `/openai/symphony`	`README.md`, `SPEC.md`, `elixir/WORKFLOW.md`, `elixir/AGENTS.md`, `.codex/skills/`
`codemachine`	Multi-agent workflow engine	Engine matrix, SmartRouter, spec-to-code workflow templates, feature flags, tool health	local fork/source + web upstream	`README.md`, `docs/architecture/`, `templates/workflows/`, `prompts/agents/`, `prompts/moderator/`
Amplication	Platform/golden paths	Live templates, service catalog, plugin codegen, generated service lifecycle, compliance/drift	web/GitHub; clone before local planning	`docs/`, `packages/*/src/`, plugin/codegen packages if cloned
`spec-kit`	Spec-driven artifacts	Constitution, scenarios, FR/SC IDs, spec -> plan -> tasks -> analyze -> implement, generated checklists	local source + Context7 `/github/spec-kit`	`templates/commands/`, `templates/*-template.md`, `scripts/`, `src/specify_cli/`
`OpenSpec`	Change/spec artifact graph	Current specs vs proposed changes, artifact dependency chain, workspace links, JSON/no-interactive command behavior	local source	`docs/`, `openspec/specs/`, `openspec/changes/`, CLI source
`spec-kitty`	Spec runtime governance	machine/project boundary, manifest-last generated artifacts, package-bundled template source, command-owned action context	local source	`architecture/adrs/`, `src/`, `.kittify/`, `kitty-specs/`, `docs/`
`cc-sdd`	Kiro-style phase-gated SDD	steering vs specs, discovery routing, approvals, boundary-first tasks, per-task implement/review/debug roles	local source	`AGENTS.md`, `CLAUDE.md`, `docs/guides/`, `tools/cc-sdd/src/`
`plandex`	Large-task implementation	Cumulative diff sandbox, plan versioning, context loading, apply/debug loop	local source + Context7	`README.md`, `app/cli/lib/`, `app/server/db/`, first-party docs
`aider`	Edit loop/context map	Repo-map ranking, edit formats, lint/test repair, benchmark metadata	local source + Context7	`aider/`, `benchmark/`, `tests/`, docs; avoid generated website data unless needed
`Agentless`	Bug repair/evals	Localization -> repair -> patch validation, reproduction tests, reranking	local source	`agentless/fl/`, `agentless/repair/`, `agentless/test/`, benchmark docs
SWE-agent/OpenHands	Bug repair/runtime research	issue-to-patch loops, sandbox/runtime harnesses, SWE-bench evaluation	Context7/web or local clone if added	source/docs/evals only when cloned
`codex`	Execution substrate	Sandbox profiles, approval policy, app-server protocol, typed events, AGENTS scope	local source + Context7 `/openai/codex`	`docs/`, `codex-rs/protocol/src/`, `codex-rs/exec/src/`, `codex-rs/linux-sandbox/`; avoid `vendor/`
Claude Code	UX reference	Permissions, commands, plugins, MCP client UX, subagent UX	public docs + observed behavior	public docs, command UX, transcript/output behavior
`qwen-code`	Terminal workflow	trusted folders, subagent fork design, terminal-capture tests, provider config	local source + Context7	`docs/`, `packages/*/src/`, `integration-tests/terminal-capture/`
Kimi Code	Model-specific coding agent	long-context coding, Kimi CLI/IDE flow, model-plan comparison	Context7 `/moonshotai/kimi-cli`	docs/source if cloned
CodeGeeX2	Model capability	multilingual code model, HumanEval-X/DS1000, local deployment/quantization	web/GitHub	benchmark/evaluation/docs if cloned
`gemini-cli`	Provider CLI/testing	release channels, generated schemas/docs, eval promotion, perf/memory tests	local source + Context7 if needed	`docs/`, `evals/`, `perf-tests/`, `memory-tests/`, `packages/*/src/`
`opencode`	Mode/schema boundary	plan/build modes, client/server, project-local commands/tools, canonical schema	local source + Context7	`README.md`, `.opencode/`, `specs/`, `packages/opencode/specs/`, `packages/opencode/src/`
`crush`	Local runtime/TUI	SQLite/sqlc, hooks, permissions, LSP, MCP client status, Bubble Tea UI	local source	`internal/db/`, `internal/hooks/`, `internal/permission/`, `internal/agent/tools/`, `internal/ui/`
`goose`	Desktop/CLI/API agent	diagnostics, API embedding, provider/extension breadth, MCP client lifecycle	local source	`crates/`, `documentation/`, `ui/desktop/`; do not copy server posture
`letta-code`	Long-lived memory	persistent agent memory, approval recovery, skills, channel/remote UX	local source	`src/agent/`, `src/permissions/`, `src/cli/`, `src/tests/`
`OpenAgents`	Full-stack multi-agent platform	backend/frontend/agent split, one-agent-one-folder, plugin/data/web agents, adapters	web/GitHub; clone before local planning	`backend/`, `frontend/`, `real_agents/` if cloned
Claude Context / Context+	Code context retrieval	vector-backed semantic code search, MCP-client integration, context cost reduction	Context7/web	code search/indexing packages if cloned
`amazon-q-developer-cli`	Rust auth/security	auth, security, workspace patterns, Rust CLI lessons	local source; lower priority	`crates/chat-cli/`, `crates/agent/`, docs

Comparison Matrix

Reference	Strongest Fit For Forge	Borrow	Avoid
`plandex`	Large task planning and review workflow	Cumulative diff sandbox, plan versioning, explicit chat-to-plan-to-apply flow, context indexing for big repos	Server/product coupling and any cloud-hosting assumptions
`codex`	Execution hardening and protocol boundaries	Typed non-interactive event stream, sandbox permission profiles, app-server protocol shape, Rust crate boundaries, config schema rigor, plugin/skill manager discipline	Treating MCP server code as a Forge product direction
Claude Code	Interactive ergonomics	Permission UX, command discoverability, plugin surfaces, subagent UX, memory/context commands, MCP client config flows	Copying private implementation details or making it an upstream dependency
`ace-coder`	Owned multi-repo governance	Reviewer roles, hard quality gates, skill/subagent routing policy, explicit MCP-client contract style	Collapsing ACE and Forge into one product surface
`aider`	Tight edit loop, repo maps, and benchmark culture	Token-budgeted repo-map ranking, reproducible benchmark reports with model, edit format, commit hash, dirty state, pass rates, malformed output counts	Early auto-commit posture before validation and commit gates
`Agentless`	Bug-fix eval pipeline	Localization, candidate repair, regression-test selection, reproduction-test generation, validation-based patch reranking	SWE-bench-specific harness assumptions
`RA.Aid`	Stage boundaries and trajectory records	Explicit research/planning/implementation phases, research-only mode, durable session/tool trajectory records	Broad autonomous shell posture and external Aider outsourcing
`goose`	Desktop/CLI/API distribution and diagnostics	Provider breadth, diagnostics/reporting, API embedding, extension packaging, MCP client lifecycle patterns	Built-in/re-exported MCP servers or broad general-agent scope
`gemini-cli`	Release/test/docs automation	Release channels, generated settings schema/docs, behavioral eval incubation, sandbox integration tests, perf/memory baselines, GitHub Action workflows	Provider-specific product assumptions or unstable evals as hard CI gates
`qwen-code`	Claude-like terminal workflow and machine I/O	Skills/subagents, forked subagent design, trust-gated workspace config, bidirectional stream-json, JSON fd/file side channels, terminal-capture regressions, flexible provider config	OAuth/provider policy coupling, ungated project-local config, or mixing channel names with surface/protocol names
`opencode`	Mode split and schema boundary	Read-only `plan` mode vs full-access `build` mode, client/server framing, LSP opt-in, project-local commands/tools, schema-first domain boundary	Bun-specific implementation style for Forge
`crush`	SQLite state, hooks, permissions, TUI	TUI as client over backend/session services, SQLite migrations/query discipline, hook engine, permission layering, session DB, tool markdown descriptions, LSP, pub/sub, MCP client status UX	Replacing Forge's TypeScript extension architecture with Go or hiding machine protocol behind env vars
`letta-code`	Long-lived memory-agent UX	Memory lifecycle, skill learning, approval recovery tests, channel/remote control ideas, MCP OAuth/connect UX	Treating memory as unstructured product magic instead of DB-backed state
`neovate-code`	Design-doc and terminal UX iteration	Small design records, queued-message designs, JSONL session replay, high-risk command classification, command/terminal UX records	Quiet/headless auto-approval, global-only session memory, provider-specific branding, or immature UX churn
`amazon-q-developer-cli`	Declarative agent and UI event reference	Agent manifest schema, hooks/resources/tools, AG-UI-like lifecycle/text/tool/state event taxonomy, auth/security/workspace patterns	Product direction, recursive delegate subprocesses, legacy raw passthrough as protocol, trust-all delegate defaults
`open-codex`	Older/forked approval-mode comparison	Approval-mode vocabulary and provider abstraction history	Fork-specific Chat Completions direction as a primary architecture
`symphony`	Work orchestration above individual agents	Issue-tracker polling, per-issue isolated workspaces, repo-owned `WORKFLOW.md`, Codex app-server lifecycle, retries, operator state, CI/PR review and landing loops	High-trust unattended defaults without Forge's UOK gates and DB-first runtime evidence
`codemachine`	Multi-agent spec-to-code orchestration	Engine matrix, SmartRouter routing, heterogeneous agents, spec-to-code templates, feature flags, tool health, local workflow examples, upstream repeatable long-running workflow model	Optional MCP-server/tooling posture and Bun-specific implementation assumptions
Kimi Code	Long-context model-specific coding agent	Kimi CLI/IDE workflow, long-context coding, subagent-oriented terminal automation, model-plan comparison	Treating provider-specific subscription/API behavior as a Forge architecture
`spec-kit`	Spec-driven development workflow	Constitution, prioritized user scenarios, acceptance criteria, functional requirements, measurable success criteria, spec -> plan -> tasks -> implement -> analyze loop	Replacing Forge PDD/UOK with a generic spec template instead of mapping useful pieces into PDD fields
`OpenSpec`	Brownfield change planning	Clear split between current behavior specs and proposed changes, dependency-aware artifact continuation, workspace link/local mapping	Treating docs/specs as Forge's operational source of truth instead of generated/reviewed exports from `.sf`/SQLite
`spec-kitty`	Runtime and generated-artifact governance	Global runtime vs project overlay, package-bundled template source, manifest-last promotion, command-owned context resolver	Per-project full runtime copies, hidden alternate template sources, or prompt-level context discovery
`cc-sdd`	Agentic SDD operating loop	Steering/spec split, discovery router, approval gates, boundary annotations, per-task implementer/reviewer/debugger, implementation-note propagation	Markdown-only operational state for SF, or requiring heavy gates for trivial direct fixes

Forge Already Has

DB-backed workflow state and project-local planning artifacts.
Headless/RPC surfaces for automation.
UOK safety and recovery concepts.
Extension loading and bundled tool surfaces.
Purpose-first TDD and PDD field contracts.
Provider abstraction through pi-ai.

Those are the center of gravity. Borrowed patterns should strengthen these surfaces instead of adding parallel state systems.

Gaps Worth Pulling Into The Roadmap

Execution and permission hardening
- Use Codex and Crush as the references.
- Target Forge surfaces: exec-sandbox, production mutation approval, command permissions, headless/RPC mutation gates, DB-recorded tool-call evidence, and permission profiles that specify filesystem, network, .git, metadata, writable-root, and denied-path behavior.
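The following is a minimal TypeScript sketch of the path rules a permission profile might encode. `PermissionProfile`, `checkWrite`, `checkNetwork`, and the decision strings are hypothetical names for illustration, not existing Forge or Codex APIs. The sketch assumes absolute POSIX paths, and it gives denied paths precedence over writable roots.

```typescript
// Hypothetical permission-profile shape; not an existing Forge or Codex type.
interface PermissionProfile {
  name: string;
  network: boolean;
  allowGitMetadataWrites: boolean;
  writableRoots: string[];
  deniedPaths: string[];
}

type WriteDecision = "allow" | "denied-path" | "git-metadata" | "outside-writable-roots";

// Resolve "." and ".." lexically; assumes absolute POSIX paths (no symlink resolution).
function normalizePath(p: string): string {
  const parts: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  return "/" + parts.join("/");
}

function isUnder(path: string, root: string): boolean {
  const p = normalizePath(path);
  const r = normalizePath(root);
  return r === "/" || p === r || p.startsWith(r + "/");
}

// Order of checks: denied paths first, then .git metadata protection, then writable roots.
function checkWrite(profile: PermissionProfile, path: string): WriteDecision {
  if (profile.deniedPaths.some((d) => isUnder(path, d))) return "denied-path";
  const segments = normalizePath(path).split("/");
  if (segments.indexOf(".git") !== -1 && !profile.allowGitMetadataWrites) return "git-metadata";
  if (!profile.writableRoots.some((r) => isUnder(path, r))) return "outside-writable-roots";
  return "allow";
}

function checkNetwork(profile: PermissionProfile): "allow" | "network-disabled" {
  return profile.network ? "allow" : "network-disabled";
}
```

Resolving `..` before the root check matters. For example, `/repo/src/../../etc/passwd` normalizes to `/etc/passwd` and is rejected as outside the writable roots. A real implementation would also have to resolve symlinks, which this lexical sketch does not do.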
Plan/build mode separation
- Use OpenCode, Plandex, and Qwen Code as the references.
- Target Forge surfaces: explicit read-only planning mode, full-access build mode, and clearer mode transitions in auto/headless.
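One way to make the plan/build split explicit is a small tool gate plus a recorded transition. The sketch below uses hypothetical `Mode` and `ToolKind` names. In this model, build mode lifts only the mode restriction; every tool call would still pass through the permission profile as a separate check.

```typescript
// Hypothetical names; "build" lifts only the mode restriction, not permission checks.
type Mode = "plan" | "build";
type ToolKind = "read" | "search" | "write" | "exec" | "network";

// Plan mode is read-only: only non-mutating tool kinds are allowed.
const PLAN_MODE_TOOLS: ToolKind[] = ["read", "search"];

function toolAllowedInMode(mode: Mode, kind: ToolKind): boolean {
  return mode === "build" || PLAN_MODE_TOOLS.indexOf(kind) !== -1;
}

// Transitions are explicit records, so auto/headless runs can show why the mode changed.
interface ModeTransition {
  from: Mode;
  to: Mode;
  reason: string;
  at: string;
}

function transitionMode(from: Mode, to: Mode, reason: string, now: Date = new Date()): ModeTransition {
  if (from === to) throw new Error(`already in ${to} mode`);
  if (reason.trim() === "") throw new Error("a mode transition needs a recorded reason");
  return { from, to, reason, at: now.toISOString() };
}
```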
Typed headless event stream
- Use Codex, Gemini CLI, Qwen Code, Amazon Q, and OpenCode as the references.
- Target Forge surfaces: stable machine-readable events such as thread.started, turn.started, turn.completed, turn.failed, item.started, item.updated, and item.completed, with typed payloads for commands, patches, MCP calls, web/context lookups, todos, and UOK evidence.
- Qwen-style bidirectional machine contracts are especially relevant: stream-json input, stream-json output, JSON fd/file side channels, and long-lived session control.
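A discriminated union keeps these events type-checked for consumers. The event `type` strings below are the names listed above. The payload fields (`threadId`, `usage`, `item`) and the JSONL helpers are assumptions for illustration, not Codex's or Forge's actual wire format.

```typescript
// Event `type` names follow the survey; payload fields are hypothetical.
type ItemKind = "command" | "patch" | "mcp_call" | "context_lookup" | "todo" | "uok_evidence";

interface Item {
  id: string;
  kind: ItemKind;
  status: "in_progress" | "completed" | "failed";
  detail: Record<string, unknown>;
}

type HeadlessEvent =
  | { type: "thread.started"; threadId: string }
  | { type: "turn.started" }
  | { type: "turn.completed"; usage: { inputTokens: number; outputTokens: number } }
  | { type: "turn.failed"; error: string }
  | { type: "item.started"; item: Item }
  | { type: "item.updated"; item: Item }
  | { type: "item.completed"; item: Item };

const EVENT_TYPES: string[] = [
  "thread.started", "turn.started", "turn.completed", "turn.failed",
  "item.started", "item.updated", "item.completed",
];

// One JSON object per line (JSONL), so parent processes can parse the stream incrementally.
function encodeEvent(event: HeadlessEvent): string {
  return JSON.stringify(event) + "\n";
}

// Rejects unknown event types instead of passing them through untyped.
// Only the `type` tag is checked here; a real decoder would validate payloads too.
function decodeEvents(jsonl: string): HeadlessEvent[] {
  const events: HeadlessEvent[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const value: unknown = JSON.parse(line);
    const type = typeof value === "object" && value !== null ? (value as { type?: unknown }).type : undefined;
    if (typeof type !== "string" || EVENT_TYPES.indexOf(type) === -1) {
      throw new Error(`unknown headless event: ${line}`);
    }
    events.push(value as HeadlessEvent);
  }
  return events;
}

function summarize(event: HeadlessEvent): string {
  switch (event.type) {
    case "thread.started":
      return `thread ${event.threadId}`;
    case "turn.started":
      return "turn started";
    case "turn.completed":
      return `turn done (${event.usage.inputTokens}+${event.usage.outputTokens} tokens)`;
    case "turn.failed":
      return `turn failed: ${event.error}`;
    case "item.started":
    case "item.updated":
    case "item.completed":
      return `${event.type} ${event.item.kind} ${event.item.id}`;
  }
}
```

Because `summarize` switches over the closed union, adding a new event type without handling it becomes a compile error rather than a silently dropped event.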
Reviewable cumulative diffs
- Use Plandex and Aider as the references.
- Target Forge surfaces: cumulative patch review, apply/reject/revise workflow, conflict analysis before apply/rewind, and commit metadata tied to model, prompt, dirty state, and evidence.
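The following sketches a cumulative-diff sandbox with apply, reject, and revise plus conflict analysis. `DiffSandbox`, `PendingChange`, and `CommitMetadata` are hypothetical names. File hashes are assumed to be supplied by the caller, and "revise" is modeled as proposing the same path again.

```typescript
// Hypothetical cumulative-diff sandbox: proposed file contents accumulate here,
// outside the working tree, until a reviewer applies or rejects them.
interface PendingChange {
  path: string;
  baseHash: string; // hash of the file when the first change to it was proposed
  newContent: string;
}

interface CommitMetadata {
  model: string;
  promptId: string;
  dirtyBefore: boolean;
  evidenceIds: string[];
}

class DiffSandbox {
  private pending = new Map<string, PendingChange>();

  // Proposing again revises the change but keeps the original base,
  // so review always shows one cumulative diff per file.
  propose(path: string, baseHash: string, newContent: string): void {
    const existing = this.pending.get(path);
    this.pending.set(path, { path, baseHash: existing ? existing.baseHash : baseHash, newContent });
  }

  reject(path: string): boolean {
    return this.pending.delete(path);
  }

  // A pending change conflicts when the working tree moved away from its base.
  conflicts(currentHashes: Map<string, string>): string[] {
    return Array.from(this.pending.values())
      .filter((c) => currentHashes.get(c.path) !== c.baseHash)
      .map((c) => c.path)
      .sort();
  }

  // Refuses to apply anything while conflicts exist; returns files plus commit metadata.
  apply(currentHashes: Map<string, string>, meta: CommitMetadata): { files: Map<string, string>; meta: CommitMetadata } {
    const conflicting = this.conflicts(currentHashes);
    if (conflicting.length > 0) throw new Error(`conflicts: ${conflicting.join(", ")}`);
    const files = new Map<string, string>();
    Array.from(this.pending.values()).forEach((c) => files.set(c.path, c.newContent));
    this.pending.clear();
    return { files, meta };
  }
}
```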
Eval and bug-fix pipeline
- Use Aider and Agentless as the references.
- Target Forge surfaces: reproducible eval reports, localization -> repair -> validation cases, candidate patch sampling, reproduction-test generation, and validation-based failure reranking.
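The following sketches validation-based reranking in the spirit of the localization, repair, and validation flow described above. Candidates that pass the reproduction test and break fewer regression tests rank first. Ties go to the patch that the most samples produced after normalization. The `Candidate` fields are hypothetical, and the whitespace-only normalization is a simplification; this is not Agentless's implementation.

```typescript
// Hypothetical candidate record; the test results would come from actually running tests.
interface Candidate {
  id: string;
  patch: string;
  reproductionPassed: boolean; // did the generated reproduction test pass after the patch?
  regressionFailures: number; // previously passing tests that now fail
}

// Whitespace-only normalization (a simplification) so trivially different patches vote together.
function normalizePatch(patch: string): string {
  return patch
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .filter((line) => line.trim() !== "")
    .join("\n");
}

// Rank by: reproduction passed, then fewest regressions, then votes for the same
// normalized patch, then id for a deterministic order.
function rerank(candidates: Candidate[]): Candidate[] {
  const votes = new Map<string, number>();
  for (const c of candidates) {
    const key = normalizePatch(c.patch);
    votes.set(key, (votes.get(key) || 0) + 1);
  }
  return candidates.slice().sort((a, b) => {
    if (a.reproductionPassed !== b.reproductionPassed) return a.reproductionPassed ? -1 : 1;
    if (a.regressionFailures !== b.regressionFailures) return a.regressionFailures - b.regressionFailures;
    const va = votes.get(normalizePatch(a.patch)) || 0;
    const vb = votes.get(normalizePatch(b.patch)) || 0;
    if (va !== vb) return vb - va;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
}
```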
Memory lifecycle and recovery
- Use Letta Code and ACE as references, while keeping Forge DB-first.
- Target Forge surfaces: durable memory extraction, turn recovery policy, approval recovery, stale-state reconciliation, typed memory records, and per-tool trajectory records for auto-mode postmortems.
Terminal UX and command discoverability
- Use Claude Code, Crush, OpenCode, and Neovate as references.
- Target Forge surfaces: command catalog, permission prompts, status line, queued-message behavior, and compact TUI/headless diagnostics.
Config and schema generation
- Use Gemini CLI, Codex, Qwen Code, and Crush as references.
- Target Forge surfaces: typed settings, generated docs, environment schema, DB migrations, and strict versioned JSON projections when JSON is only a compatibility/export format.
MCP client lifecycle
- Use Crush, Amazon Q, Claude Code, Letta Code, and Neovate as references.
- Target Forge surfaces: explicit client states (disabled, starting, connected, error), reconnect behavior, scoped project/global/managed config, atomic config writes, tool namespacing such as mcp__server__tool, schema cleanup, resource list/read commands, OAuth connect UX, status counts, and evidence logging.
- Stop rule: do not implement any SF MCP server, MCP worker backend, or bundled/re-exported MCP server.
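The following is a minimal sketch of the client-side lifecycle, naming, and status rules listed above. The transition table, `namespaceTool`, and the sanitization rule are assumptions for illustration; the `mcp__server__tool` shape follows the naming convention cited in the target list. Everything here is client-side only, with no server component.

```typescript
type McpClientState = "disabled" | "starting" | "connected" | "error";

// Hypothetical transition table; reconnects always pass back through "starting".
const TRANSITIONS: Record<McpClientState, McpClientState[]> = {
  disabled: ["starting"],
  starting: ["connected", "error", "disabled"],
  connected: ["error", "disabled"],
  error: ["starting", "disabled"],
};

function transitionClient(from: McpClientState, to: McpClientState): McpClientState {
  if (TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`illegal MCP client transition: ${from} -> ${to}`);
  }
  return to;
}

// Replace characters outside [A-Za-z0-9_-] so namespaced names stay tool-name safe.
function sanitize(part: string): string {
  return part.replace(/[^A-Za-z0-9_-]/g, "_");
}

function namespaceTool(server: string, tool: string): string {
  return `mcp__${sanitize(server)}__${sanitize(tool)}`;
}

interface ClientStatus {
  name: string;
  state: McpClientState;
}

// Status counts for a status line or headless diagnostics.
function statusCounts(clients: ClientStatus[]): Record<McpClientState, number> {
  const counts: Record<McpClientState, number> = { disabled: 0, starting: 0, connected: 0, error: 0 };
  for (const c of clients) counts[c.state] += 1;
  return counts;
}
```

Forbidding a direct `error -> connected` jump makes every reconnect attempt visible as a `starting` state, which keeps status counts and evidence logs honest.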
Work orchestration above single agent sessions
- Use OpenAI Symphony and CodeMachine as references.
- Target Forge surfaces: durable queue/roadmap dispatch, isolated working directories, issue/task lifecycle state, retry/backoff, per-run observability, proof-of-work handoff, and CI/PR review/landing loops.
- Stop rule: orchestration must feed UOK and DB-backed state instead of bypassing Forge's safety gates.
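Retry/backoff for dispatched runs could be as small as the functions below. They use exponential backoff with "full jitter" (a uniformly random delay up to the capped exponential value). That common technique is swapped in here for illustration; the survey does not attribute it to Symphony or CodeMachine. `RetryPolicy`, `backoffDelay`, and `retrySchedule` are hypothetical names. Returning `null` stands for handing the failure back to UOK instead of retrying silently.

```typescript
interface RetryPolicy {
  maxRetries: number; // retries allowed after the first attempt
  baseDelayMs: number;
  maxDelayMs: number;
}

// `retry` is 1 for the first retry. `jitter` should be in [0, 1); it is a parameter
// so tests can pin it instead of relying on Math.random().
function backoffDelay(policy: RetryPolicy, retry: number, jitter: number = Math.random()): number | null {
  if (retry < 1 || retry > policy.maxRetries) return null;
  const exponential = policy.baseDelayMs * Math.pow(2, retry - 1);
  const capped = Math.min(exponential, policy.maxDelayMs);
  return Math.floor(capped * jitter);
}

// The full schedule for a fixed jitter value, e.g. for per-run observability output.
function retrySchedule(policy: RetryPolicy, jitter: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry <= policy.maxRetries; retry++) {
    const delay = backoffDelay(policy, retry, jitter);
    if (delay !== null) delays.push(delay);
  }
  return delays;
}
```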
Spec-driven artifact pipeline
- Use Spec Kit, OpenSpec, spec-kitty, cc-sdd, and CodeMachine as references.
- Target Forge surfaces: convert intent into PDD fields, prioritized slices, acceptance criteria, functional requirements, measurable success criteria, task generation, and consistency analysis before implementation.
Generated human exports and drift checks
- Use spec-kitty and Spec Kit as references, but keep Forge database-first.
- Target Forge surfaces: generated docs/specs/ exports, check commands that fail on stale projections, manifest-backed generated artifacts where promotion needs auditability, and command-owned context resolution rather than prompt heuristics.
- Stop rule: generated docs may be reviewed and tracked by Git, but SF-owned operational history and future-use knowledge stay in .sf/SQLite.
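A drift check along these lines could regenerate each projection, compare it with the tracked file, and confirm the manifest was updated last. In this sketch, `render` and `readTracked` are hypothetical callbacks standing in for the DB export and the file read. FNV-1a serves only as a compact, dependency-free fingerprint; a real check would more likely store SHA-256 digests.

```typescript
// Hypothetical manifest entry written when a generated doc is promoted.
interface ManifestEntry {
  path: string;
  fingerprint: string; // fingerprint of the promoted content
}

// 32-bit FNV-1a over UTF-16 code units: a non-cryptographic fingerprint.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

type DriftReason = "missing" | "stale-projection" | "manifest-mismatch";

// Returns one finding per drifted path; an empty list means the check passes.
function checkDrift(
  manifest: ManifestEntry[],
  readTracked: (path: string) => string | undefined,
  render: (path: string) => string,
): Array<{ path: string; reason: DriftReason }> {
  const findings: Array<{ path: string; reason: DriftReason }> = [];
  for (const entry of manifest) {
    const tracked = readTracked(entry.path);
    if (tracked === undefined) {
      findings.push({ path: entry.path, reason: "missing" });
    } else if (tracked !== render(entry.path)) {
      findings.push({ path: entry.path, reason: "stale-projection" });
    } else if (fnv1a(tracked) !== entry.fingerprint) {
      findings.push({ path: entry.path, reason: "manifest-mismatch" });
    }
  }
  return findings;
}
```

In CI, a non-empty result would map to a failing exit status. That gives the "fail on stale projections" behavior while the operational data itself stays in `.sf`/SQLite.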
Batch input and project-state relocation
- Use Aider and RA.Aid as references.
- Target Forge surfaces: prompt-file batch input, dry-run previews, explicit --yes/confirmation gates, and an overrideable project-state directory for CI/sandboxes or migrated workspaces.
- Stop rule: history/prompt buffers are convenience, not cross-repo memory or operational authority.
Decomposed autonomy and high-risk command classification
- Use Plandex, Goose, Gemini CLI, Qwen Code, and Neovate Code as references.
- Target Forge surfaces: separate choices for context loading, apply, execution, commit behavior, run control, and permission profile; keep static high-risk command classification as a first-pass guard; fail closed when a non-interactive run reaches approval-required tool use without an explicit permission profile that allows it.
- Stop rule: no broad always-yes mode and no destructive Git cleanup as a default recovery path.
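The sketch below shows a static first-pass classifier and a fail-closed decision. The pattern list is illustrative and deliberately incomplete; for example, it does not catch `rm -r -f` with split flags. `classifyCommand`, `decide`, and the option names are hypothetical. The point is that a non-interactive run without an explicit profile allowance denies rather than proceeds.

```typescript
type Risk = "high" | "normal";
type Decision = "run" | "ask" | "deny";

// Illustrative first-pass patterns only; not an exhaustive or reviewed list.
const HIGH_RISK_PATTERNS: RegExp[] = [
  /\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r/, // rm -rf / rm -fr
  /\bgit\s+reset\s+--hard\b/,
  /\bgit\s+clean\s+-[a-zA-Z]*f/,
  /\bgit\s+push\b.*(--force\b|\s-f\b)/,
  /\b(mkfs|dd)\b/,
  /\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b/, // pipe-to-shell installs
];

function classifyCommand(command: string): Risk {
  return HIGH_RISK_PATTERNS.some((pattern) => pattern.test(command)) ? "high" : "normal";
}

// Fail closed: a non-interactive run cannot ask, so an approval-required command is
// denied unless the permission profile explicitly allows it.
function decide(command: string, opts: { interactive: boolean; profileAllowsHighRisk: boolean }): Decision {
  if (classifyCommand(command) === "normal") return "run";
  if (opts.profileAllowsHighRisk) return "run";
  return opts.interactive ? "ask" : "deny";
}
```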
Declarative agent/run manifests
- Use Amazon Q and Goose recipes as references.
- Target Forge surfaces: reviewed agent/run manifests with prompt context, file resources, hooks, visible/allowed tools, model policy, extension requirements, parameters, and expected response/evidence contracts.
- Stop rule: manifests feed UOK and .sf/sf.db; they do not bypass SF permission or evidence gates.
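A reviewed run manifest could be validated before it ever reaches UOK. The field names below mirror the list above but are hypothetical. The sketch applies three checks:

- Allowed tools must also be visible.
- A manifest may not request a broader permission profile than the session grants.
- The manifest must declare at least one evidence contract.

The profile names reuse SF's `restricted | normal | trusted | unrestricted` terms.

```typescript
type PermissionProfile = "restricted" | "normal" | "trusted" | "unrestricted";

const PROFILE_RANK: Record<PermissionProfile, number> = {
  restricted: 0,
  normal: 1,
  trusted: 2,
  unrestricted: 3,
};

// Hypothetical manifest shape mirroring the fields named in the survey.
interface RunManifest {
  name: string;
  promptContext: string;
  resources: string[]; // file resources loaded into context
  hooks: Array<{ event: string; command: string }>;
  visibleTools: string[];
  allowedTools: string[]; // tools that may run without an extra prompt
  model: { provider: string; id: string };
  requiredExtensions: string[];
  parameters: Record<string, string>;
  requestedProfile: PermissionProfile;
  expectedEvidence: string[];
}

// Returns validation errors; an empty list means the manifest may be handed to UOK.
function validateManifest(m: RunManifest, sessionProfile: PermissionProfile): string[] {
  const errors: string[] = [];
  for (const tool of m.allowedTools) {
    if (m.visibleTools.indexOf(tool) === -1) errors.push(`allowed tool is not visible: ${tool}`);
  }
  if (PROFILE_RANK[m.requestedProfile] > PROFILE_RANK[sessionProfile]) {
    errors.push(`manifest requests ${m.requestedProfile} but session grants ${sessionProfile}`);
  }
  if (m.expectedEvidence.length === 0) errors.push("manifest declares no evidence contract");
  return errors;
}
```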

Priority Order

P0:

Keep Forge MCP-client-only; reject any MCP-server plan.
Harden command/tool execution policy and mutation gates.
Add typed headless event DTOs for auto/headless consumers.
Make DB-backed state the structured source of truth for planner/runtime records, with JSON/Markdown only as projections, imports, exports, or promoted human docs.
Add trust gating for project-local config, hooks, tools, .env, and automatic memory loading before expanding those surfaces.
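Trust gating can be expressed as a load plan computed before any project-local surface is read. This sketch assumes canonical absolute paths and a hypothetical list of trusted roots, similar in spirit to Qwen Code's trusted folders. `LoadPlan` and `planProjectLoad` are illustrative names. Untrusted projects still get global config but none of the project-local surfaces listed above.

```typescript
interface LoadPlan {
  globalConfig: boolean;
  projectConfig: boolean;
  projectHooks: boolean;
  projectTools: boolean;
  dotenv: boolean;
  autoMemory: boolean;
  reason: string;
}

// Assumes both paths are already canonical (absolute, symlinks resolved).
function isWithin(dir: string, root: string): boolean {
  return dir === root || dir.startsWith(root.endsWith("/") ? root : root + "/");
}

function planProjectLoad(projectDir: string, trustedRoots: string[]): LoadPlan {
  const root = trustedRoots.find((r) => isWithin(projectDir, r));
  const trusted = root !== undefined;
  return {
    globalConfig: true,
    projectConfig: trusted,
    projectHooks: trusted,
    projectTools: trusted,
    dotenv: trusted,
    autoMemory: trusted,
    reason: trusted ? `trusted via ${root}` : "project is not in a trusted folder",
  };
}
```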

P1:

Add explicit plan/build mode semantics.
Add cumulative diff review and evidence metadata.
Expand UOK evals with Agentless-style localization/repair/validation cases.
Add MCP client state/status/config hardening without adding any MCP server.
Add durable orchestration contracts for issue/task queues, isolated workspaces, retry policy, proof-of-work, and review/landing loops.

P2:

Improve terminal command discovery and permission UX.
Generate settings/environment docs from typed schemas.
Compare memory lifecycle/recovery against Letta and ACE.
Map Spec Kit scenario/requirement/success-criteria templates into Forge PDD fields without replacing PDD.

Evidence Pointers

The follow-up subagent pass inspected these concrete local paths:

aider/aider/repomap.py, aider/aider/coders/base_coder.py, aider/aider/linter.py, aider/aider/args.py, aider/aider/io.py, and aider/benchmark/README.md.
Agentless/agentless/fl/localize.py, Agentless/agentless/repair/rerank.py, Agentless/agentless/repair/repair.py, Agentless/agentless/test/generate_reproduction_tests.py, and Agentless/agentless/test/run_tests.py.
RA.Aid/ra_aid/agents/, RA.Aid/ra_aid/tools/programmer.py, RA.Aid/ra_aid/database/models.py, RA.Aid/ra_aid/config.py, RA.Aid/ra_aid/database/connection.py, and RA.Aid/ra_aid/tools/shell.py.
plandex/app/cli/lib/apply.go, plandex/app/cli/lib/rewind.go, plandex/app/cli/lib/git.go, plandex/app/cli/lib/repl.go, plandex/app/cli/cmd/plan_exec_helpers.go, plandex/app/cli/cmd/plan_start_helpers.go, plandex/app/server/db/diff_helpers.go, and plandex/app/server/db/plan_config_helpers.go.
codex/codex-rs/exec/src/exec_events.rs, codex/codex-rs/linux-sandbox/README.md, and codex/codex-rs/linux-sandbox/src/linux_run_main.rs.
gemini-cli/evals/README.md, gemini-cli/perf-tests/README.md, gemini-cli/memory-tests/, gemini-cli/packages/cli/src/config/config.ts, gemini-cli/packages/cli/src/nonInteractiveCli.ts, gemini-cli/packages/core/src/output/types.ts, gemini-cli/packages/core/src/policy/types.ts, and gemini-cli/packages/core/src/config/storage.ts.
qwen-code/docs/users/configuration/trusted-folders.md, qwen-code/docs/design/fork-subagent/fork-subagent-design.md, qwen-code/integration-tests/terminal-capture/, qwen-code/packages/cli/src/config/config.ts, qwen-code/packages/cli/src/nonInteractiveCli.ts, qwen-code/packages/cli/src/nonInteractive/session.ts, qwen-code/packages/channels/base/README.md, and qwen-code/packages/core/src/permissions/types.ts.
opencode/.opencode/, opencode/specs/v2/session.md, opencode/packages/opencode/specs/effect/schema.md, and opencode/packages/opencode/src/session/schema.ts.
crush/internal/db/, crush/internal/hooks/, crush/internal/permission/, and crush/internal/agent/tools/mcp/.
Claude Code public documentation and observed command/transcript behavior.
letta-code/src/cli/components/McpConnectFlow.tsx, letta-code/src/cli/helpers/mcpOauth.ts, and letta-code/src/agent/approval-recovery.ts.
neovate-code/src/mcp.ts, neovate-code/src/commands/mcp.ts, neovate-code/src/slash-commands/builtin/mcp.tsx, neovate-code/src/session.ts, neovate-code/src/config.ts, neovate-code/src/tools/bash.ts, and neovate-code/src/ui/ApprovalModal.tsx.
amazon-q-developer-cli/crates/agent/src/agent/mcp/, amazon-q-developer-cli/crates/chat-cli/src/cli/mcp.rs, amazon-q-developer-cli/schemas/agent-v1.json, amazon-q-developer-cli/crates/chat-cli-ui/src/protocol.rs, and amazon-q-developer-cli/crates/chat-cli/src/cli/chat/tools/delegate.rs.
goose/crates/goose/src/config/goose_mode.rs, goose/crates/goose/src/config/permission.rs, goose/crates/goose-cli/src/session/mod.rs, goose/crates/goose-cli/src/cli.rs, goose/crates/goose/src/session/session_manager.rs, and goose/crates/goose/src/recipe/mod.rs.
crush/internal/cmd/root.go, crush/internal/cmd/run.go, crush/internal/proto/proto.go, crush/internal/backend/permission.go, crush/internal/db/migrations/20250424200609_initial.sql, crush/internal/cmd/session.go, and crush/internal/config/config.go.
ace-coder/docs/MCP_SERVER.md, ace-coder/docs/plans/2026-04-05-mcp-daemon-refactor.md, and ace-coder/python/ai_dev/mcp/.
symphony/README.md, symphony/SPEC.md, symphony/elixir/WORKFLOW.md, symphony/elixir/AGENTS.md, and .codex/skills/land/SKILL.md.
singularity/machine/README.md, package.json, templates/workflows/, docs/architecture/engine-matrix.md, and docs/OPENAI_SPECS_DOWNLOAD.md.
spec-kit/README.md, templates/commands/specify.md, templates/commands/plan.md, templates/commands/tasks.md, and scripts/bash/common.sh.
OpenSpec/docs/concepts.md, OpenSpec/docs/commands.md, OpenSpec/openspec/specs/, and OpenSpec/openspec/changes/.
spec-kitty/architecture/adrs/2026-04-08-1-global-kittify-machine-level-runtime.md, spec-kitty/architecture/adrs/2026-04-08-2-package-bundled-templates-sole-source.md, and spec-kitty/architecture/adrs/2026-03-09-1-prompts-do-not-discover-context-commands-do.md.
cc-sdd/AGENTS.md, cc-sdd/README.md, cc-sdd/docs/guides/spec-driven.md, and cc-sdd/docs/guides/skill-reference.md.

Context7 Cross-Check

Context7 was used after the local-source pass as a secondary check for indexed public references. Local source remains the evidence of record because it is the snapshot available on this machine.

/openai/codex confirmed the relevant Codex public patterns: interactive and non-interactive CLI modes, app-server, AGENTS.md project guidance, approval policy, and sandbox modes (read-only, workspace-write, danger-full-access) with writable roots and network controls.
/plandex-ai/plandex confirmed the relevant Plandex public patterns: semi/full run-control levels, smart context loading, cumulative diff review sandbox, review/apply/debug workflow, and large multi-file task focus.
/ai-christianson/ra.aid confirmed the relevant RA.Aid public patterns: research-only and research-and-plan-only modes, the research -> planning -> implementation workflow, logging/cost visibility, and the risky --cowboy-mode shell approval bypass that Forge should not copy.
Context7 also resolves these remaining comparison targets for later deeper checks:
- Aider: /websites/aider_chat and /aider-ai/aider.
- Qwen Code: /qwenlm/qwen-code, /websites/qwenlm_github_io_qwen-code-docs, and /websites/qwenlm_github_io_qwen-code-docs_en.
- OpenCode: /anomalyco/opencode.
- OpenAI Symphony: /openai/symphony.
- Kimi Code: /moonshotai/kimi-cli, /websites/moonshotai_github_io_kimi-cli_en, and /websites/kimi_code.
- Spec Kit: /github/spec-kit and /websites/github_github_io_spec-kit.
- Upstream CodeMachine CLI did not resolve by name in Context7 during this pass, but GitHub confirms https://github.com/moazbuilds/CodeMachine-CLI as the public upstream-style repo for CodeMachine CLI. The local checkout inspected is https://github.com/singularity-ng/machine.git, so treat it as local fork/mirror evidence rather than exact upstream state.

Local Sift Cross-Check

ACE is private/local and should not be treated as Context7-indexed. Use sift for ACE and Forge when checking private or machine-local architecture.

For dependency hygiene, do not run broad sift searches over repo roots that may contain vendored dependencies, package caches, build output, or generated blobs. This sift install does not expose an exclude flag, so scope searches to first-party paths such as docs/, src/, packages/*/src/, specs/, AGENTS.md, CLAUDE.md, and known design files. Avoid node_modules/, vendor/, dist/, build/, target/, .venv/, caches, fixture dumps, and generated lock/schema/output directories unless the dependency surface itself is the subject of the question.

The targeted sift pass found:

Codex codex-rs/protocol/src/config_types.rs and protocol.rs: confirms first-party typed approval policy and sandbox mode surfaces without searching codex-rs/vendor/.
OpenCode packages/opencode/specs/effect/schema.md: confirms the schema-first rule to prefer one canonical schema definition and derive compatibility schemas instead of maintaining parallel sources of truth.
Aider first-party docs/tests: confirm local repo-map/edit-format/lint/test and commit behavior surfaces.
Plandex README.md, changelog, and first-party app model files: confirm the cumulative diff sandbox, controlled command execution, rollback/debug loop, and planning phases.
Qwen Code docs/: confirms terminal-capture integration tests, trusted folders documentation, and provider configuration docs.
RA.Aid first-party docs/source: confirm the shell command approval bypass via --cowboy-mode, research/planning agents, and session/logging surfaces.
Symphony first-party spec/workflow files: confirm issue-tracker polling, per-issue workspace isolation, repo-owned WORKFLOW.md, Codex app-server lifecycle, max turns/concurrency, retry/backoff, state snapshots, token/rate observability, PR feedback sweeps, and land-loop skills.
CodeMachine first-party docs/templates: confirm local multi-agent orchestration, heterogeneous engine routing, spec-to-code workflow templates, feature-flag governance, health/status commands, and optional MCP tooling. GitHub upstream moazbuilds/CodeMachine-CLI confirms the public product framing: repeatable long-running workflows, multi-agent orchestration, parallel execution, context engineering, and headless scripting of coding engines such as Claude Code, Codex, Cursor, and others.
ACE AGENTS.md: confirms the repo-local Claude MCP client contract, hard stops, skills, reviewer workflow, quality gate, and the warning that ACE's autonomous system uses its own code/YAML workflow DAGs rather than AGENTS.md.
ACE docs/designs/SPEC_TO_BUILD_PIPELINE_DESIGN.md: confirms the spec -> feature graph -> ADR detection -> implementation planning/review pipeline.
Forge docs/dev/ADR-008-sf-tools-over-mcp-for-provider-parity.md: confirms the durable SF boundary: if external control is needed, use daemon/RPC/headless contracts; MCP remains SF-as-client only.
Forge src/resources/extensions/sf/tests/no-sf-mcp-server.test.mjs: confirms there is executable guard coverage preventing recreation of SF MCP server paths.

Local Surface/State Cross-Check

The detailed coder-agent pass supports Forge's five-axis operating model.

Codex has the cleanest split: TUI flags, non-interactive exec, app-server protocol, sandbox mode, approval policy, persistent state, and input history are separate code paths and types. Forge should copy the typed separation, not the exact crate structure.
OpenCode treats the TUI as one client of a server, exposes HTTP/OpenAPI, and bridges ACP as a protocol adapter over its server. Forge should adapt the client/server clarity and generated SDK idea without making HTTP the only mental model for machine access.
Claude Code is strong on structured headless output, rich JSONL transcripts, entrypoint metadata, permission modes, and sandbox schemas. Forge should adapt the stream discipline and transcript fields while avoiding hidden feature-flag sprawl.
The older open-codex fork is the cautionary example: autonomy, approval policy, output, and sandbox behavior collapse into a small set of flags. Forge should keep run control, output format, protocol, and permission profile independent.

The second coder-agent pass adds state/history guardrails:

Aider's prompt-file, dry-run, and explicit yes primitives are useful batch affordances, but its flat history files should remain convenience only.
RA.Aid's overrideable project state directory is useful for CI/sandboxes, but broad shell-approval bypasses are not.
Plandex's decomposed context/apply/execution/commit modes map well to Forge's need to keep run control separate from permission profile and Git policy.
Agentless's stage JSONL artifacts are good eval/evidence inputs; import them into DB-backed contracts instead of making JSONL the live state model.
Neovate's JSONL session replay and high-risk bash classifier are useful; its quiet-mode auto-approval is not.

The terminal-agent pass adds concrete machine/API patterns:

Gemini and Qwen both expose text, json, and stream-json output formats. Qwen goes further with stream-json input plus JSON fd/file side channels. Forge should adopt the bidirectional contract shape for parent processes.
Goose cleanly names run-control and permission behavior, and its non-interactive path refuses approval-required tool calls unless the run was explicitly allowed to proceed. Forge should copy that fail-closed posture.
Crush shows the right architectural shape: a TUI over a backend and session store, with session, message, and file history kept in SQLite. Forge already wants this DB-first boundary; the lesson is to avoid making the machine protocol hidden or text-only.
Amazon Q has the richest declarative agent schema and a useful lifecycle/text/tool/state event taxonomy. Forge should adapt the manifests and event taxonomy, not recursive delegate subprocesses or raw passthrough protocol events.
GitHub Copilot CLI's autopilot documentation is a useful naming cross-check: autopilot is the continuation behavior, --allow-all/--yolo expand permissions, and --no-ask-user suppresses questions. Forge should keep the same separation but use SF's own terms: run control is manual | assisted | autonomous, permission profile is restricted | normal | trusted | unrestricted, and headless/machine output is a surface/format concern.
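The fail-closed posture attributed to Goose reduces to a small decision function. This is a minimal TypeScript sketch with three hypothetical inputs: whether the active permission profile requires approval for the call, whether a human can be prompted (attached and not suppressed by a no-ask-user style setting), and whether the run was explicitly pre-approved for that call.

```typescript
type ApprovalDecision = "allow" | "ask" | "deny";

interface ApprovalContext {
  requiresApproval: boolean; // the permission profile says this call needs a human yes
  canPrompt: boolean;        // a human is attached and questions are not suppressed
  preapproved: boolean;      // the run was explicitly started with this call class allowed
}

// Fail closed: with nobody to ask and no explicit pre-approval, refuse the call.
function decideToolCall(ctx: ApprovalContext): ApprovalDecision {
  if (!ctx.requiresApproval || ctx.preapproved) return "allow";
  return ctx.canPrompt ? "ask" : "deny";
}
```

Run control deliberately does not appear in the inputs: an autonomous run without pre-approval still gets `deny`, never a silent `allow`.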

Local Spec-System Cross-Check

The spec-system pass reinforces the current Forge direction: specs and docs are valuable contracts for humans, but command/runtime state must own execution.

Spec Kit creates specs/<feature>/spec.md, stores the active feature pointer in .specify/feature.json, runs plan/task setup scripts that return JSON, and generates quality checklists before planning. The useful pattern is explicit generated artifacts plus machine-readable path discovery, not a second SF planning database. Its analyze command is a useful read-only consistency shape for comparing spec, plan, tasks, and constitution; in Forge that should compare .sf/DB state, generated docs, evidence, and code diffs.
OpenSpec separates openspec/specs/ as current behavior from openspec/changes/ as proposed modifications, then uses commands like propose, continue, ff, verify, sync, and archive to move artifacts through a schema-defined dependency graph. Its strongest patterns are deterministic ready/blocked artifact queries, apply requirements, and delta validation. Forge should keep the change/spec distinction as a human review model while storing operational order and gates in .sf/sf.db.
spec-kitty is strongest on runtime boundaries: one global machine runtime, thin project overlay, package-bundled templates as the sole end-user source, manifest-last generated artifact promotion, and command-owned action context. Its newer runtime pattern is also important: append-only status events, materialized status, expected-artifact manifests with blocking semantics, step contracts, explicit write scopes, review evidence, and commit hooks that keep planning artifacts out of lane branches. Forge should copy the boundary discipline and avoid prompt-level discovery for any context a command can resolve.
cc-sdd is strongest on phase gates and role separation: steering vs feature specs, discovery routing, requirements/design/tasks approvals, boundary-first task contracts, and per-task implementer/reviewer/debugger contexts. Forge should adapt the contract discipline into PDD fields and UOK gates without making markdown checkboxes the operational state store. The most reusable pieces are boundary/dependency annotations, observable completion, approval booleans as explicit state, implementation-note propagation, and a manifest planner for generated agent/startup artifacts with dry-run/conflict policy.
Forge's docs/records/2026-05-07-cli-agent-code-survey.md now records the MCP-client-only product boundary and roadmap pull-through.
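OpenSpec's deterministic ready/blocked query can be expressed over plain data. In this TypeScript sketch the artifact ids and edges are illustrative and `queryArtifacts` is a hypothetical name; in Forge the graph and completion state would come from .sf/sf.db rather than from markdown files.

```typescript
interface ArtifactNode {
  id: string;
  dependsOn: string[];
}

interface ArtifactQuery {
  done: string[];
  ready: string[];
  blocked: Record<string, string[]>; // artifact id -> dependencies still missing
}

// Classify every artifact as done, ready (all deps done), or blocked (deps missing).
function queryArtifacts(graph: readonly ArtifactNode[], completed: readonly string[]): ArtifactQuery {
  const result: ArtifactQuery = { done: [], ready: [], blocked: {} };
  for (const node of graph) {
    if (completed.indexOf(node.id) !== -1) {
      result.done.push(node.id);
      continue;
    }
    // An unknown or unfinished dependency counts as missing, so typos block rather than pass.
    const missing = node.dependsOn.filter((dep) => completed.indexOf(dep) === -1);
    if (missing.length === 0) {
      result.ready.push(node.id);
    } else {
      result.blocked[node.id] = missing;
    }
  }
  return result;
}
```

Because the result depends only on the graph and the completion list, the same query gives the same answer in a TUI, a headless run, or a review tool.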

Implementation Follow-Up

The first DB-backed retrieval slice landed with schema v41:

retrieval_evidence records backend, source kind, query, strategy, scope, project root, git head/branch, worktree dirty flag, freshness, status, hit count, elapsed time, cache path, error, result metadata, and timestamp.
sift_search and codebase_search write retrieval evidence for successful and failed searches.
Native Context7 resolve_library and get_library_docs write docs retrieval evidence with freshness=external-index.
search-the-web writes web retrieval evidence with freshness=external-live for success, cache hits, missing-provider errors, duplicate-loop stops, budget exhaustion, aborts, and provider failures.
sf_retrieval_evidence exposes the rows through the SF read-only DB tool surface so agents do not query .sf/sf.db directly.
Sift telemetry now uses the no-op debug logger; telemetry failures no longer turn successful searches into failed tool calls.

Next slices should wrap search_and_read and fetch_page results in the same evidence contract before using them for planning.
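A shared evidence contract is what makes wrapping further tools such as search_and_read and fetch_page cheap. This is a minimal TypeScript sketch, assuming a synchronous search callback for brevity (real retrieval calls would be async) and a subset of the retrieval_evidence columns listed above; `withRetrievalEvidence`, `EvidenceMeta`, and the camelCase field names are hypothetical.

```typescript
interface RetrievalEvidence {
  backend: string;
  sourceKind: string;
  query: string;
  freshness: string; // e.g. "external-index" or "external-live"
  status: "ok" | "error";
  hitCount: number;
  elapsedMs: number;
  error: string | null;
  timestamp: string;
}

type EvidenceMeta = Pick<RetrievalEvidence, "backend" | "sourceKind" | "query" | "freshness">;
type EvidenceSink = (row: RetrievalEvidence) => void;

// Hypothetical helper: run a search, always record one evidence row,
// and never let a failing sink change the search outcome.
function withRetrievalEvidence<T>(meta: EvidenceMeta, search: () => T[], sink: EvidenceSink): T[] {
  const started = Date.now();
  let hits: T[] = [];
  let failed = false;
  let failure: unknown = undefined;
  try {
    hits = search();
  } catch (e) {
    failed = true;
    failure = e;
  }
  const row: RetrievalEvidence = {
    ...meta,
    status: failed ? "error" : "ok",
    hitCount: hits.length,
    elapsedMs: Date.now() - started,
    error: failed ? String(failure) : null,
    timestamp: new Date().toISOString(),
  };
  try {
    sink(row);
  } catch (sinkError) {
    // Evidence is best-effort: a broken sink must not fail a successful search.
  }
  if (failed) throw failure; // failed searches still fail, but only after being recorded
  return hits;
}
```

The wrapper records evidence on both paths before returning or rethrowing, which is the property the retrieval slice relies on for successful and failed searches alike.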

The first execution-policy vocabulary slice also landed:

execution-policy.js defines named plan, build, trusted, and unrestricted profiles with filesystem, network, git, and mutation posture.
The plan profile reuses the existing queue-mode write gate, so read-only commands and .sf/ planning artifacts are allowed while source mutations are denied.
The build profile records destructive bash risk labels from the existing destructive-command classifier without changing runtime enforcement yet.
Auto-mode now writes execution-policy-decision journal events for tool calls, recording the profile, allow/deny result, risk, destructive labels, tool name, call id, and policy-relevant command/path only.

Next slices should project these profile decisions into UOK evidence and the machine-surface JSON/JSONL projections before broad enforcement.
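A profile decision and its journal event can be sketched in TypeScript as follows. The profile names and the execution-policy-decision event type come from this record; the tool names, the regex classifier, the read-only command allowlist, and the risk values are toy assumptions standing in for the existing queue-mode write gate and destructive-command classifier.

```typescript
type Profile = "plan" | "build" | "trusted" | "unrestricted";

interface ToolCall {
  callId: string;
  toolName: string; // hypothetical names: "read", "write", "edit", "bash"
  path?: string;
  command?: string;
}

interface PolicyDecisionEvent {
  type: "execution-policy-decision";
  profile: Profile;
  decision: "allow" | "deny";
  risk: "low" | "high";
  destructiveLabels: string[];
  toolName: string;
  callId: string;
  target: string | null; // only the policy-relevant command or path, never file contents
}

// Toy stand-in for the existing destructive-command classifier.
const DESTRUCTIVE: Array<[RegExp, string]> = [
  [/\brm\s+-\w*r/, "recursive-delete"],
  [/\bgit\s+reset\s+--hard\b/, "git-reset-hard"],
  [/\bgit\s+push\b.*--force\b/, "force-push"],
];

// Toy read-only allowlist; a real gate would reuse the queue-mode write gate.
const READ_ONLY_COMMANDS = ["ls", "cat", "grep", "rg", "head", "tail", "wc"];
const MUTATING_TOOLS = ["write", "edit"];

function labelsFor(command: string): string[] {
  return DESTRUCTIVE.filter(([re]) => re.test(command)).map(([, label]) => label);
}

// Allow only paths under .sf/ that do not escape it with "..".
function isPlanningPath(path: string): boolean {
  return path.indexOf(".sf/") === 0 && path.split("/").indexOf("..") === -1;
}

// Reject redirection, pipes, and chaining outright; then check the first word.
function isReadOnlyCommand(command: string): boolean {
  if (/[>|;&]/.test(command)) return false;
  const first = command.trim().split(/\s+/)[0] || "";
  return READ_ONLY_COMMANDS.indexOf(first) !== -1;
}

function evaluateToolCall(profile: Profile, call: ToolCall): PolicyDecisionEvent {
  const labels = call.command !== undefined ? labelsFor(call.command) : [];
  let decision: "allow" | "deny" = "allow";
  if (profile === "plan") {
    if (MUTATING_TOOLS.indexOf(call.toolName) !== -1) {
      decision = call.path !== undefined && isPlanningPath(call.path) ? "allow" : "deny";
    } else if (call.toolName === "bash") {
      decision = call.command !== undefined && isReadOnlyCommand(call.command) ? "allow" : "deny";
    }
  }
  // "build" only attaches labels here; enforcement is deliberately unchanged.
  return {
    type: "execution-policy-decision",
    profile,
    decision,
    risk: labels.length > 0 ? "high" : "low",
    destructiveLabels: labels,
    toolName: call.toolName,
    callId: call.callId,
    target: call.command ?? call.path ?? null,
  };
}
```

Serializing each event with `JSON.stringify` yields one JSONL line, the shape the machine-surface projections would consume. Only `plan` denies in this sketch; `build` records labels without enforcing them, matching the current rollout.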

Resulting Direction

Forge should absorb proven patterns into UOK and the existing DB-first runtime: structured state, explicit modes, stronger permissions, reproducible evidence, and better review UX. The goal is not feature parity with every coding agent. The goal is a purpose-to-software compiler whose run control and permission profile are inspectable, recoverable, and safe enough to run repeatedly.
