# ADR-014: Singularity Knowledge + Agent Platform stack
**Date**: 2026-04-29
**Status**: proposed (deferred — capture for staged execution)
**Revised**: 2026-05-02 — Phase 4 cancelled, see [ADR-019](./ADR-019-workspace-vm-convergence.md)
## Context
Older SPEC notes define a cross-instance knowledge layer (Singularity Memory) and sketch persistent agents plus inter-agent messaging. Treat those notes as historical input, not current SF source of truth. sf instances today carry their own local memory store (`memory-store.ts`); persistent agents are not implemented at all.
Two trajectories converge:
- **Knowledge federates** — anti-patterns, learnings, contracts should be reachable across sf instances and across other agent products on the tailnet (Hermes, OpenClaw, Claude Code, Cursor).
- **Persistent agents centralise** — long-lived cross-project agents (code-reviewer with cross-project memory, memory-curator, security-auditor, build-watch) are too heavy and too cross-cutting to live per-project.
These two needs collapse into one service: the **Singularity Knowledge + Agent Platform**, a single Go server hosting the federated memory store *and* the central persistent-agent runtime. *(Note: the persistent-agent runtime portion, Phase 4, has since been cancelled by [ADR-019](./ADR-019-workspace-vm-convergence.md). This ADR's active scope is the knowledge layer only, Phases 0–3.)*
This ADR fixes the stack.
The implementation arm of this ADR lives in [`singularity-memory/MIGRATION.md`](https://github.com/singularity-ng/singularity-memory/blob/main/MIGRATION.md).
## Decision
- **Language: Go.**
- **Storage backbone: Postgres + vchord** (existing) — accessed from Go via `pgx`. No data migration; same schema, same vchord index.
- **Identity / auth / sync layer: `charmbracelet/charm`-server patterns** — SSH-key identity, JWT issuance, encrypted KV for user-level prefs and config. Adopted as ported library code; not run as a sidecar.
- **Agent runtime: `charmbracelet/fantasy`** — multi-provider LLM access (Anthropic, OpenAI, Google, Bedrock, OpenRouter, etc. via `catwalk`). Used for embeddings/summarisation today. *(The original plan to grow this into a full central persistent-agent runtime — Phase 4 — is cancelled by [ADR-019](./ADR-019-workspace-vm-convergence.md). `fantasy` is retained for embeddings/summarisation within the knowledge layer only.)*
- **HTTP API: Go `net/http` + chi or echo router**, serving the *exact* current OpenAPI contract.
- **HTTP API compatibility:** preserve the current OpenAPI contract. SF remains an HTTP/RPC client of the knowledge layer and does not expose its workflow as an MCP server.
- **CLI scaffolding: `charmbracelet/fang`.**
- **Observability: `promwish`-style Prometheus metrics**, scraped from a shared metrics endpoint.
- **Admin UI (Phase 3): `pony` + `ultraviolet`** for the view layer (reversed from earlier deferral; now adopted as a deliberate foundation bet — admin UI tolerates churn better than user-facing surfaces). Served over SSH via `wish`.
## Alternatives Considered
### Stack
- **Stay Python + FastAPI + Postgres.** Status quo. Works today.
  - *Rejected:* misses the foundation bet for central persistent agents from the older SPEC notes. Building those on Python + raw OpenAI/Anthropic SDK calls means retrofitting fantasy-style agent semantics later, which is a real refactor cost. The trigger to migrate isn't pain in the current server; it's laying the foundation for what comes next.
- **Rust + axum + Postgres.** Uniformly fast, but Charm's agentic ecosystem (fantasy, catwalk, wish, charm-server, the entire Bubble Tea family) is Go-native. Rust on the server side would mean reimplementing those abstractions or shelling out. Rejected — wrong ecosystem.
- **TypeScript + Node + Postgres.** Keeps language alignment with sf core. But sf is moving toward parallel-build (ADR-016): TS in sf core, Go in new services. The Node ecosystem doesn't have an equivalent to fantasy + charm-server + Wish. Rejected.
### Storage backbone
- **Replace Postgres + vchord with `charm-server`'s native KV.** `charm-server` is a personal/team encrypted KV; it's not a vector DB or BM25 index. We'd lose retrieval sophistication. Rejected.
- **Replace Postgres with `sqlite-vec`.** Embeddable single-binary deployment is appealing, but BM25 quality on `tsvector` is hard to match without a full re-tune, and we'd be redoing data migration on top. Rejected for v1; revisit in a v2 retrieval ADR if the Go server needs to ship without Postgres.
- **Keep Postgres + vchord, connect via Go `pgx`.** ← chosen. Battle-tested retrieval, zero data migration, focus the migration on language/runtime/agent-platform changes only.
### Agent runtime
- **Direct SDK calls (`anthropic-sdk-go`, `openai-go`, `go-genai`).** Simplest for today's narrow LLM use (embeddings + summarisation). But future central persistent agents need agent-loop semantics (multi-turn, tool calls); building those on raw SDKs reinvents fantasy's abstractions. Rejected — foundation bet. *(Phase 4 is now cancelled by [ADR-019](./ADR-019-workspace-vm-convergence.md), so the persistent-agent motivation no longer applies; however `fantasy` is still chosen for its clean multi-provider API for embeddings/summarisation.)*
- **Build our own agent runtime in Go.** Pure NIH. Rejected.
- **`charmbracelet/fantasy`.** ← chosen. 730 stars, actively developed, clean API, multi-provider via `catwalk`.
## Consequences
**Positive**
- **Foundation is right** for the knowledge layer. *(The original "foundation for central persistent agents" rationale is superseded — Phase 4 is cancelled by [ADR-019](./ADR-019-workspace-vm-convergence.md). Persistent agents now live as Firecracker VM snapshots managed by ACE.)*
- **Single static Go binary** is operationally simpler than Python uv/venv + Alembic + worker on each deployment host.
- **Charm ecosystem alignment** with sf-worker (ADR-013), flight recorder (ADR-015), Charm TUI client (ADR-017). One language for the new-services tier.
- **Wire contract preserved** — clients are zero-touch.
**Negative**
- **Migration is a real undertaking** — ~12 weeks total, with the recall endpoint as the critical parity gate. See `MIGRATION.md`.
- **Polyglot deployment grows** — Python (during transition) + Go (new) + TS (sf core) + Rust (sf native). Bounded; once Python retires, three languages with clear boundaries.
- **`fantasy` and `pony` are pre-1.0** — API churn is real.
**Risks and mitigations**
- *Risk:* recall quality regression between Python and Go.
  - *Mitigation:* held-out evaluation set; ±2% recall@k threshold enforced in CI before flipping traffic.
- *Risk:* `pgx` + vchord custom-type decoder edge cases.
  - *Mitigation:* prove out in Phase 1 against a small endpoint; engage vchord author if blocked.
- *Risk:* `fantasy` API churn during the migration.
  - *Mitigation:* pin a version; one planned upgrade midway through the migration.
- *Risk:* central agents prove unworkable as a model and we've over-built the foundation.
  - *Mitigation:* the foundation cost is incremental (fantasy ≈ raw SDK + a thin abstraction). Worst case we use fantasy for embeddings only and never grow it. No wasted bet. *(Moot: Phase 4 is cancelled by [ADR-019](./ADR-019-workspace-vm-convergence.md), and fantasy stays scoped to the knowledge layer.)*
## Out of Scope
- **Cross-tenant Singularity Memory** — single trust domain per deployment.
- **Retrieval-pipeline redesign** — BM25 + vector + RRF + reranker semantics are preserved exactly.
- **DB migration** — Postgres + vchord stay.
- **Public-internet endpoint** — tailnet only per ADR-013.
## Sequencing
| Phase | What | Cost |
|---|---|---|
| 0 | Prep: commit OpenAPI spec, build test suite, set up CI (per existing `TODO.md`) | 1–2 weeks |
| 1 | Greenfield Go scaffold parallel to Python; first endpoint (`GET /v1/banks`) | 2–3 weeks |
| 2 | Endpoint parity (recall is the critical gate) | 4–8 weeks |
| 3 | Worker + admin UI (`pony` + `ultraviolet` on `wish`) | 2–3 weeks |
| ~~4~~ | ~~Central persistent-agent host~~ | ~~variable~~ |
| 5 | Python deprecation | 1 week |
Total: ~12 weeks for Phases 0–3 plus Phase 5. Phase 4 is cancelled; see the section below.
## Phase 4 — Cancelled (See [ADR-019](./ADR-019-workspace-vm-convergence.md))
Phase 4 was originally planned as a "central persistent-agent runtime" built on `charmbracelet/fantasy` inside singularity-memory's Go server. [ADR-019](./ADR-019-workspace-vm-convergence.md) (Workspace VM Convergence, 2026-05-01) supersedes this plan entirely.
**What replaced it:** Persistent agents now live as **Firecracker VM snapshots managed by ACE**'s orchestration layer. A "persistent agent" is a named VM snapshot: restore it, and the agent wakes with its full memory and context intact. singularity-memory's scope is now strictly the knowledge layer (Phases 0–3). See ADR-019 § "ADR-014 Phase 4 is reassigned" for the authoritative statement.
### Historical: Original Phase 4 Plan
> *The content below is the original Phase 4 design, preserved as a historical record. It is **not** the current plan.*
The original Phase 4 called for singularity-memory's Go server to host a central persistent-agent runtime using `charmbracelet/fantasy`. Long-lived cross-project agents (code-reviewer, memory-curator, security-auditor, build-watch) would run there, with their state managed by the same Postgres store. This depended on the older persistent-agent notes being fully scoped ("status NEW" at ADR-014's writing date).
The rationale for building this in singularity-memory was ecosystem alignment with `fantasy` + `charm-server` + `wish` and avoiding per-project agent redundancy. The timeline was listed as "variable" because persistent-agent scope had not been fully defined.
ADR-019 made this moot by choosing a cleaner isolation model (hypervisor-level VM snapshots) that is language-agnostic inside the VM, multi-tenant by construction, and owned by ACE rather than a shared Go server.
## References
- `MIGRATION.md` (singularity-memory repo) — implementation arm.
- Older SPEC notes §16 — Knowledge Layer historical input.
- Older SPEC notes §17–18 — Persistent Agents and Inter-Agent Messaging historical input.
- `ADR-012` — Multi-instance federation (this is one of its surfaces).
- `ADR-013` — Network and remote-execution (deployment substrate).
- `ADR-016` — Charm AI stack adoption (frames the polyglot decision).
- `charmbracelet/charm` — KV with sync (auth/identity patterns ported here).
- `charmbracelet/fantasy` — agent runtime.
- `charmbracelet/catwalk` — provider/model registry.