singularity-forge/docs/dev/ADR-016-charm-ai-stack-adoption.md

# ADR-016: Charm AI stack adoption strategy
**Date**: 2026-04-29
**Status**: accepted (strategic frame; concrete decisions in ADR-013/014/015/017)
## Context
Older SPEC notes retargeted sf v3 from Go-on-Crush to TypeScript-on-pi-mono. That decision still stands for sf core. But over the past year the Charm ecosystem has matured to a point that *adjacent* services would be better served by it:
- **`fantasy`** (730 stars, pushed today) — multi-provider AI agent SDK in Go. Equivalent of `pi-ai`. It was not this complete at retarget time.
- **`catwalk`** (688 stars) — provider/model registry used by `crush`.
- **`crush`** (23,641 stars) — Charm's agentic coding CLI.
- **`charm`** (2,491 stars) — encrypted KV with sync, self-hostable as `charm-server`. Foundation for cross-instance state.
- **`wish`** (5,158 stars) + `wishlist` + `promwish` — full SSH-served service stack with built-in metrics.
- **`bubbletea`** (41,946 stars) + `bubbles` + `lipgloss` + `glamour` + `huh` + `harmonica` — production TUI stack.
- **`x/vt`, `x/ansi`, `x/cellbuf`, `x/mosaic`, `x/vcr`, `x/xpty`, `x/conpty`, `x/editor`, `x/sshkey`, `x/term`** — bleeding-edge primitives, all actively maintained.
- **`pony` + `ultraviolet`** — next-gen declarative TUI markup. Pre-1.0 / experimental.
- **`anthropic-sdk-go`, `openai-go`, `go-genai`** — Charm-maintained Go LLM SDKs.
The question the historical retarget didn't have to answer: **now that this much is here, do we migrate?**
## Decision
**Option A — Parallel build, no core migration.**
- **sf core (TypeScript on pi-mono): unchanged.** The historical retarget rationale stands. Pi-mono SDK alignment, the MCP client boundary, ~200+ TS files, and real production users all argue against a 36-month rewrite.
- **New services: Go on Charm, comprehensively.** sf-worker (ADR-013), Singularity Knowledge + Agent Platform (ADR-014), flight recorder (ADR-015), Charm TUI client (ADR-017) — all in Go using the Charm ecosystem.
- **Native engine (Rust): permanent.** ~11k LOC in `rust-engine/` (git, text, forge_parser, grep, highlight, ast, diff, etc.) is best-of-breed and not re-implementable in Go without losing performance. Bindings (napi-rs from TS today; cgo from Go for new services if needed) flex per consumer.
- **Pony adoption: now, not deferred.** Reversed from initial conservative stance. Adopting pony from day one in Phase-3 admin surfaces (Singularity Memory admin UI, future audit dashboards) — admin tolerates churn better than user-facing surfaces, and the foundation bet pays back if pony stabilises.
- **Other `charmbracelet/x/*` packages: adopted comprehensively.** When a new Go service needs a primitive (image rendering, session recording, pty, editor, input handling), use the `x/*` package. Don't reinvent.
- **Re-evaluation trigger: 12 months from first Go service in production.** If >50% of *new* sf code lands in Go services, the question of consolidating sf core becomes worth re-asking. Until then, polyglot is the right cost shape.
## Alternatives Considered
- **Option B — Soft migration, gradual rewrite.** Use Charm for new code AND opportunistically rewrite TS modules in Go when they need substantial work anyway. Eventually sf core drifts to Go.
  - *Rejected:* rolling polyglot across the same logical layer is harder to reason about than per-service polyglot at clean boundaries. Some PRs would bridge languages mid-feature; CI complexity grows.
- **Option C — Big-bang migration.** Re-fork from `crush`, port sf's auto-loop, gates, planner, harness, skills doctrine into Go.
  - *Rejected:* 36 months of no feature shipping; production users disrupted; loss of pi-mono SDK upstream alignment. The retarget rationale is not invalidated: the one argument that has lost some force is "70% of Crush is duplicated in pi-mono", and even that remains true, while the cost of rewriting outweighs the duplication tax.
- **Status quo** — keep everything in TS, including new services.
  - *Rejected:* the Node ecosystem has no equivalents for `wish`, `fantasy`, `charm-server`, or the comprehensive TUI/SSH/AI stack Charm provides. Building these in TS would mean reinventing more mature Go libraries. New services are simply *easier* to build on Charm.
## Architectural Picture
```
[Charm TUI client]                        Go    ← ADR-017: pony + ultraviolet + bubbles + lipgloss +
                                                  glamour + huh + harmonica + x/mosaic
[Singularity Knowledge + Agent Platform]  Go    ← ADR-014: charm-server + fantasy +
                                                  Postgres+vchord + pony admin
[sf-worker SSH host]                      Go    ← ADR-013: wish + xpty/conpty + promwish
[Flight recorder]                         Go    ← ADR-015: x/vcr
        │ RPC / MCP / SSH / HTTP
[sf daemon + core]                        TS    ← unchanged, pi-mono SDK aligned
        │ napi-rs
[native engine]                           Rust  ← permanent, ~11k LOC
```
Three languages with clean boundaries between them. Each layer uses the stack that fits.
## Consequences
**Positive**
- **No 36-month feature freeze** — sf core ships normally during the new-service build-out.
- **Right tool for each layer** — Go's ecosystem advantages (Wish, fantasy, charm-server) accrue without disrupting what already works in TS.
- **Strategic optionality** — pony and ultraviolet bets are localised to admin surfaces; if they fail, only those views need swapping.
- **Comprehensive adoption** beats piecemeal — using Charm's stack across multiple new services means we develop deep ecosystem familiarity, can share patterns across services, and contribute back upstream where useful.
**Negative**
- **Polyglot deployment** — TS + Go + Rust + (transitional Python during Singularity Memory migration). Three or four runtimes on a single host. Operationally manageable; not free.
- **Pi-mono SDK alignment is one-way** — Charm's stack improvements don't flow to sf core. sf core keeps receiving pi-mono updates from upstream, but it gains nothing from `fantasy` updates.
- **Cross-language refactors** are harder — when an interface between TS and Go needs to change, both sides need a coordinated PR. Mitigated by stable RPC/MCP/SSH-stdio contracts.
**Risks and mitigations**
- *Risk:* `fantasy` or `pony` API churn breaks builds repeatedly.
  - *Mitigation:* pin versions; planned upgrade windows; pony swappable via clean view-layer separation.
- *Risk:* Charm pivots away from one of these libraries.
  - *Mitigation:* Charm's stack is large and self-reinforcing; abandonment of a single piece (e.g., pony, which is experimental) is recoverable. Foundation libs (`bubbletea`, `wish`, `lipgloss`, `glamour`) are mature with strong commit cadence and unlikely to be abandoned.
- *Risk:* Re-evaluation in 12 months says "actually we should consolidate to Go" and we've now got 12 months of TS-only work to throw out.
  - *Mitigation:* sf core code from now until then stays useful even if a future migration happens — it documents requirements and behaviour. Worst case it becomes the *spec* the Go rewrite implements.
## Out of Scope (explicit non-decisions to keep them from re-emerging)
- **Migrating pi-mono SDK to Go.** No.
- **Replacing `pi-tui` in sf core with a Charm TUI in-process.** No — Charm TUI is a separate client (ADR-017), pi-tui stays in core until that client reaches parity, then deprecates.
- **Adopting Crush as the agent loop.** No — pi-coding-agent stays.
- **Migrating native Rust to Go.** No — Rust is best-of-breed for what it does.
- **Self-hosting a Charm Cloud account / `charm-server` as a separate sidecar.** No — port `charm-server` patterns (auth/identity) as library code into our Go services.
## Sequencing
| When | Action |
|---|---|
| Now | This ADR captures the strategic frame. Concrete service builds tracked in 013/014/015/017. |
| 12 months from first Go service in production | Re-evaluate. Audit polyglot deployment costs vs. consolidation benefit. If >50% of new sf code is Go AND ops cost of polyglot is non-trivial AND TS sf core has shrunk substantially (post-pi-tui-deprecation), open a successor ADR proposing Option C (big-bang). |
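
The re-evaluation rule in the last row is a three-way AND, which can be written down as a small predicate. This is only a sketch under assumptions: the `Audit` type and its field names are hypothetical, measuring "new sf code" in lines is one possible metric among several, and the two judgement calls (ops cost, TS core shrinkage) are modelled as booleans supplied by whoever runs the audit.

```go
package main

import "fmt"

// Audit holds the inputs to the 12-month re-evaluation. All names are hypothetical.
type Audit struct {
	NewGoLines        int  // new sf code that landed in Go services during the window
	NewTotalLines     int  // all new sf code during the window
	PolyglotOpsCostly bool // auditor's judgement: polyglot ops cost is non-trivial
	TSCoreShrunk      bool // auditor's judgement: TS sf core has shrunk substantially
}

// GoShare returns the fraction of new code written in Go, or 0 if nothing was written.
func (a Audit) GoShare() float64 {
	if a.NewTotalLines <= 0 {
		return 0
	}
	return float64(a.NewGoLines) / float64(a.NewTotalLines)
}

// ShouldOpenSuccessorADR reports whether all three conditions hold:
// more than 50% of new code is Go, ops cost is non-trivial, and TS core has shrunk.
func (a Audit) ShouldOpenSuccessorADR() bool {
	return a.GoShare() > 0.5 && a.PolyglotOpsCostly && a.TSCoreShrunk
}

func main() {
	audit := Audit{NewGoLines: 6000, NewTotalLines: 10000, PolyglotOpsCostly: true, TSCoreShrunk: false}
	fmt.Printf("Go share: %.0f%%, open successor ADR: %v\n",
		audit.GoShare()*100, audit.ShouldOpenSuccessorADR())
}
```

Because the conditions are joined with AND, a high Go share alone never reopens the question; in the example above the TS core has not shrunk, so the predicate is false.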
## References
- Older SPEC notes §1 — original retarget rationale (TS-on-pi-mono over Go-on-Crush).
- `ADR-013` — Network + remote execution (concrete: sf-worker).
- `ADR-014` — Singularity Knowledge + Agent Platform (concrete: SM rewrite).
- `ADR-015` — Flight recorder (concrete: x/vcr-based).
- `ADR-017` — Charm TUI client (concrete: pi-tui replacement).
- `BUILD_PLAN.md` — tier-based execution tracking.
- Charm org: https://github.com/orgs/charmbracelet — full ecosystem inventory.