singularity-forge/docs/dev/ADR-016-charm-ai-stack-adoption.md

110 lines
8.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# ADR-016: Charm AI stack adoption strategy
**Date**: 2026-04-29
**Status**: accepted (strategic frame; concrete decisions in ADR-013/014/015/017)
## Context
`SPEC.md` §1 retargeted sf v3 from Go-on-Crush to TypeScript-on-pi-mono. That decision still stands for sf core. But over the past year the Charm ecosystem has matured to a point that *adjacent* services would be better served by it:
- **`fantasy`** (730 stars, pushed today) — multi-provider AI agent SDK in Go. Equivalent of `pi-ai`. Wasn't this complete at retarget time.
- **`catwalk`** (688 stars) — provider/model registry used by `crush`.
- **`crush`** (23,641 stars) — Charm's agentic coding CLI.
- **`charm`** (2,491 stars) — encrypted KV with sync, self-hostable as `charm-server`. Foundation for cross-instance state.
- **`wish`** (5,158 stars) + `wishlist` + `promwish` — full SSH-served service stack with built-in metrics.
- **`bubbletea`** (41,946 stars) + `bubbles` + `lipgloss` + `glamour` + `huh` + `harmonica` — production TUI stack.
- **`x/vt`, `x/ansi`, `x/cellbuf`, `x/mosaic`, `x/vcr`, `x/xpty`, `x/conpty`, `x/editor`, `x/sshkey`, `x/term`** — bleeding-edge primitives, all actively maintained.
- **`pony` + `ultraviolet`** — next-gen declarative TUI markup. Pre-1.0 / experimental.
- **`anthropic-sdk-go`, `openai-go`, `go-genai`** — Charm-maintained Go LLM SDKs.
The question the SPEC retarget didn't have to answer: **now that this much is here, do we migrate?**
## Decision
**Option A — Parallel build, no core migration.**
- **sf core (TypeScript on pi-mono): unchanged.** SPEC §1 retarget rationale stands. Pi-mono SDK alignment, MCP-server story, ~200+ TS files, real production users — none of it justifies a 36 month rewrite.
- **New services: Go on Charm, comprehensively.** sf-worker (ADR-013), Singularity Knowledge + Agent Platform (ADR-014), flight recorder (ADR-015), Charm TUI client (ADR-017) — all in Go using the Charm ecosystem.
- **Native engine (Rust): permanent.** ~11k LOC in `rust-engine/` (git, text, forge_parser, grep, highlight, ast, diff, etc.) is best-of-breed and not re-implementable in Go without losing performance. Bindings (napi-rs from TS today; cgo from Go for new services if needed) flex per consumer.
- **Pony adoption: now, not deferred.** Reversed from initial conservative stance. Adopting pony from day one in Phase-3 admin surfaces (Singularity Memory admin UI, future audit dashboards) — admin tolerates churn better than user-facing surfaces, and the foundation bet pays back if pony stabilises.
- **Other `charmbracelet/x/*` packages: adopted comprehensively.** When a new Go service needs a primitive (image rendering, session recording, pty, editor, input handling), use the `x/*` package. Don't reinvent.
- **Re-evaluation trigger: 12 months from first Go service in production.** If >50% of *new* sf code lands in Go services, the question of consolidating sf core becomes worth re-asking. Until then, polyglot is the right cost shape.
## Alternatives Considered
- **Option B — Soft migration, gradual rewrite.** Use Charm for new code AND opportunistically rewrite TS modules in Go when they need substantial work anyway. Eventually sf core drifts to Go.
- *Rejected:* rolling polyglot across the same logical layer is harder to reason about than per-service polyglot at clean boundaries. Some PRs would bridge languages mid-feature; CI complexity grows.
- **Option C — Big-bang migration.** Re-fork from `crush`, port sf's auto-loop, gates, planner, harness, skills doctrine into Go.
- *Rejected:* 36 months of no feature shipping; production users disrupted; loss of pi-mono SDK upstream alignment. The retarget rationale isn't fully invalidated — the only argument it relied on that's weakened is "70% of Crush is duplicated in pi-mono", and even that's still true *and* the cost of rewriting outweighs the duplication tax.
- **Status quo** — keep everything in TS, including new services.
- *Rejected:* Node ecosystem doesn't have equivalents for `wish`, `fantasy`, `charm`-server, the comprehensive TUI/SSH/AI stack Charm provides. Building these in TS would be reinventing maturer Go libraries. New services in Charm are just *easier*.
## Architectural Picture
```
[Charm TUI client] Go ← ADR-017: pony + ultraviolet + bubbles + lipgloss +
glamour + huh + harmonica + x/mosaic
[Singularity Knowledge + Agent Platform] Go ← ADR-014: charm-server + fantasy +
Postgres+vchord + pony admin
[sf-worker SSH host] Go ← ADR-013: wish + xpty/conpty + promwish
[Flight recorder] Go ← ADR-015: x/vcr
│ RPC / MCP / SSH / HTTP
[sf daemon + core] TS ← unchanged, pi-mono SDK aligned
│ napi-rs
[native engine] Rust ← permanent, ~11k LOC
```
Three languages, three clean boundaries. Each layer using the stack that fits.
## Consequences
**Positive**
- **No 36 month feature freeze** — sf core ships normally during the new-service build-out.
- **Right tool for each layer** — Go's ecosystem advantages (Wish, fantasy, charm-server) accrue without disrupting what already works in TS.
- **Strategic optionality** — pony and ultraviolet bets are localised to admin surfaces; if they fail, only those views need swapping.
- **Comprehensive adoption** beats piecemeal — using Charm's stack across multiple new services means we develop deep ecosystem familiarity, can share patterns across services, and contribute back upstream where useful.
**Negative**
- **Polyglot deployment** — TS + Go + Rust + (transitional Python during Singularity Memory migration). Three or four runtimes on a single host. Operationally manageable; not free.
- **Pi-mono SDK alignment is one-way** — Charm's stack improvements don't flow to sf core. We get pi-mono updates upstream; we don't get fantasy updates upstream-of-sf.
- **Cross-language refactors** are harder — when an interface between TS and Go needs to change, both sides need a coordinated PR. Mitigated by stable RPC/MCP/SSH-stdio contracts.
**Risks and mitigations**
- *Risk:* `fantasy` or `pony` API churn breaks builds repeatedly.
- *Mitigation:* pin versions; planned upgrade windows; pony swappable via clean view-layer separation.
- *Risk:* Charm pivots away from one of these libraries.
- *Mitigation:* Charm's stack is large and self-reinforcing; abandonment of a single piece (e.g., pony, which is experimental) is recoverable. Foundation libs (`bubbletea`, `wish`, `lipgloss`, `glamour`) are mature with strong commit cadence and unlikely to be abandoned.
- *Risk:* Re-evaluation in 12 months says "actually we should consolidate to Go" and we've now got 12 months of TS-only work to throw out.
- *Mitigation:* sf core code from now until then stays useful even if a future migration happens — it documents requirements and behaviour. Worst case it becomes the *spec* the Go rewrite implements.
## Out of Scope (explicit non-decisions to keep them from re-emerging)
- **Migrating pi-mono SDK to Go.** No.
- **Replacing `pi-tui` in sf core with a Charm TUI in-process.** No — Charm TUI is a separate client (ADR-017), pi-tui stays in core until that client reaches parity, then deprecates.
- **Adopting Crush as the agent loop.** No — pi-coding-agent stays.
- **Migrating native Rust to Go.** No — Rust is best-of-breed for what it does.
- **Self-hosting a Charm Cloud account / `charm-server` as a separate sidecar.** No — port `charm-server` patterns (auth/identity) as library code into our Go services.
## Sequencing
| When | Action |
|---|---|
| Now | This ADR captures the strategic frame. Concrete service builds tracked in 013/014/015/017. |
| 12 months from first Go service in production | Re-evaluate. Audit polyglot deployment costs vs. consolidation benefit. If >50% of new sf code is Go AND ops cost of polyglot is non-trivial AND TS sf core has shrunk substantially (post-pi-tui-deprecation), open a successor ADR proposing Option C (big-bang). |
## References
- `SPEC.md` §1 — original retarget rationale (TS-on-pi-mono over Go-on-Crush).
- `ADR-013` — Network + remote execution (concrete: sf-worker).
- `ADR-014` — Singularity Knowledge + Agent Platform (concrete: SM rewrite).
- `ADR-015` — Flight recorder (concrete: x/vcr-based).
- `ADR-017` — Charm TUI client (concrete: pi-tui replacement).
- `BUILD_PLAN.md` — tier-based execution tracking.
- Charm org: https://github.com/orgs/charmbracelet — full ecosystem inventory.