# ADR-015: Flight recorder via `charmbracelet/x/vcr`
**Date**: 2026-04-29

**Status**: proposed (deferred; capture for staged execution)
## Context
sf today writes:
- `.sf/event-log.jsonl` — structured event stream (phase changes, tool calls, errors).
- `.sf/traces/*.jsonl` — per-unit trace spans.
- `.sf/audit/` — historical state snapshots.

All three are *structured event streams*. They are well suited to programmatic analysis, but they do not record what the auto-loop *looked like* on the operator's terminal: the actual TUI frames, the stream of tool output, the agent's thinking, the live progress indicators.

When something goes wrong in production (the auto-loop appears to hang, an agent generates surprising output, a hook misbehaves), the operator wants to **replay the session** and see what was on screen at minute 14, not reconstruct it from JSON.
`charmbracelet/x/vcr` records terminal output as a sequence of frames and replays them deterministically. It's the right substrate for a flight recorder.
## Decision
- **Language: Go.** Standalone service or library; integrates with sf via shared filesystem (writes recordings to `.sf/recordings/`).
- **Recording substrate: `charmbracelet/x/vcr`** — captures ANSI/VT frames into a portable file format with timestamps.
- **Trigger: every auto-loop unit dispatch records by default.** Recording is opt-out per project via `.sf/config.toml` (`[telemetry] flight_recorder = false`).
- **Storage: `.sf/recordings/{unit-id}.vcr`**, with a retention policy (default 30 days, configurable). Old recordings auto-expire on the next sweep.
- **Replay: `sf replay <unit-id>`** — opens the recording in a TUI player; supports pause, scrub, frame-step, search-by-text.
- **Format: vcr-native.** No reinventing.
## Alternatives Considered
- **`asciinema`** — well-known terminal recorder, mature tooling, JSON-based format.
  - *Rejected:* asciinema runs as a subprocess wrapping the shell. Integrating with sf's auto-loop (which is the *driver*, not a child of the recorder) requires inverting the model. `vcr` is library-shaped: sf calls into it.
- **`vhs`** — Charm's CLI video recorder, used for demos.
  - *Rejected:* `vhs` is for scripted demos, not live capture. Wrong tool.
- **Re-render from `.sf/event-log.jsonl`** — replay events through pi-tui to reproduce the frames.
  - *Rejected:* requires keeping pi-tui forever, and rendering depends on terminal geometry that may differ from the original. Frame-accurate replay is not the same as event replay; both have value but they're different products.
- **Build a custom recorder.**
  - *Rejected:* `vcr` exists. NIH-don't.
## Consequences
**Positive**
- **Frame-accurate post-mortem** — when a unit fails or the auto-loop hangs, the operator sees exactly what was on screen, including timing.
- **Onboarding artefact** — recordings of "what does sf do?" become shareable demos without scripting.
- **Audit trail for destructive ops** — admin actions in the future Charm TUI client (ADR-017) and Singularity Memory admin UI (ADR-014) can be recorded for security audit.
- **Light coupling** — `vcr` is a Go library; sf's TS core invokes a small Go recorder process per unit dispatch. No tight integration with the agent loop.

**Negative**
- **Disk usage** — recordings are bigger than event logs (frame data vs. structured records). Mitigated by retention policy. Estimate: ~1MB per 10-minute unit at typical TUI density.
- **Operator-only** — frame replay isn't useful in headless contexts. Headless dispatches should disable recording (`SF_FLIGHT_RECORDER=0` env).
- **Polyglot crosses one more boundary** — sf core (TS) writes recordings via a Go subprocess. Same shape as ADR-013 (TS↔Go via stdio); manageable.
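
The two switches mentioned in this ADR (the `[telemetry] flight_recorder` config key and the `SF_FLIGHT_RECORDER=0` environment variable) could be combined along these lines. This is a sketch only: `RecorderEnabled` is a hypothetical name, and letting the environment variable override the config file is an assumption, since the ADR does not state a precedence.

```go
package main

import (
	"fmt"
	"os"
)

// RecorderEnabled decides whether a unit dispatch should be recorded.
//
// configEnabled is the project's `[telemetry] flight_recorder` value; the
// caller passes true when the key is absent, because recording is opt-out.
// envValue is the raw SF_FLIGHT_RECORDER value; "0" disables recording.
//
// ASSUMPTION: the environment variable wins over the config file, and any
// value other than "0" leaves the config decision untouched.
func RecorderEnabled(configEnabled bool, envValue string) bool {
	if envValue == "0" {
		return false
	}
	return configEnabled
}

func main() {
	// A headless dispatch would export SF_FLIGHT_RECORDER=0 before launch.
	enabled := RecorderEnabled(true, os.Getenv("SF_FLIGHT_RECORDER"))
	fmt.Println("flight recorder enabled:", enabled)
}
```

Keeping the decision in a pure function means the TS core and the Go recorder process can both be tested against the same truth table.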

**Risks and mitigations**
- *Risk:* `vcr` API churn — it's in `charmbracelet/x` (experimental).
  - *Mitigation:* pin a version; abstract recording behind an interface so a future swap is contained.
- *Risk:* Recording overhead measurably slows the auto-loop.
  - *Mitigation:* benchmark before enabling by default. If overhead > 5%, ship as opt-in only.
- *Risk:* Sensitive data (tokens, paths, secrets) leaks into recordings.
  - *Mitigation:* reuse the redaction layer from `event-log.jsonl`, applied at the frame level before write. Enforce it via a redaction filter on the VT stream.
## Out of Scope
- **Audio recording.** Terminal frames only.
- **Cross-host recording** — each host records its own units; flight-recorder doesn't try to stitch SSH-worker output onto orchestrator-side replay. (Each unit attempt has a `worker_host`; replay is per-host.)
- **Live remote viewing** of an in-progress recording — that's a different feature (could be Wish + Bubble Tea showing a "live" view of the auto-loop). Track separately if wanted.
## Sequencing
| When | Action |
|---|---|
| Tier 2/3 — after federation primitives land | Build a thin Go recorder process; sf core spawns one per unit dispatch. |
| Tier 3 | `sf replay <unit-id>` command — TUI player using Bubble Tea. |
| Tier 3 | Redaction filter parity with `event-log.jsonl`. |
| Tier 4 (nice-to-have) | Retention policy auto-sweep; recording bundle export (`sf recording export <unit-id>` producing a `.vcr.tar.gz` for sharing). |
## Out of Scope (continued — feature-creep guardrails)
- AI-assisted summarisation of recordings ("show me what failed in the last 5 unit attempts") — possible later via fantasy + recording metadata, but explicitly not v1.
- Web-based replay UI — server-rendered replay is a separate product surface; v1 is local TUI only.
## References
- `charmbracelet/x/vcr` — terminal recording library.
- `SPEC.md` §19 — Observability (where structured event logs and traces live).
- `ADR-016` — Charm AI stack adoption (frames why Go for new services).
- `ADR-017` — Charm TUI client (future replay UI consumer).