Mikael Hugo d03758d803 feat: replace launchd with systemd user-unit install path
Operator-direction 2026-05-17 "we will never use mac" — no compat
preservation. Single-cutover replacement.

- new packages/daemon/src/systemd.ts: install/uninstall/status using
  systemctl --user + ~/.config/systemd/user/sf-server.service
- new packages/daemon/src/systemd.test.ts: ports launchd tests, same
  shape, mocked systemctl via RunCommandFn injection + SF_SYSTEMD_USER_DIR
  env override for real filesystem tests
- cli-main.ts: switch import + update help text + status messages
- index.ts: re-export systemd module (installSystemdUnit, uninstallSystemdUnit,
  systemdUnitStatus, generateUnit, getServicePath, SystemdStatus, SystemdUnitOptions)
- DELETED: launchd.ts (253 LOC), launchd.test.ts (379 LOC)
- docs/dev/drafts/M053-per-repo-supervisor.md: remove "launchd" mention
- CHANGELOG.md: document systemd-only install path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:34 +02:00


M053 Supervisor And Web Server Shape

M053 uses one real server: the SF web/Next.js process. It is the operator surface and read-model aggregator.

Enrolled repositories do not run their own web server and do not host a shared workflow daemon. Each repo has its own supervised worker boundary for sf autonomous (implemented by the existing non-TUI machine surface). On Linux that boundary is a user-level systemd unit; other platforms can add equivalent supervisor adapters. The worker writes a small repo-local status projection, and the single web server reads those projections.

Process Model

The operator server is singular. It serves the browser UI, exposes /api/swarms, and reads ~/.sf/swarms.json to discover enrolled repositories.

Each repository worker is separate. It runs from that repo's working directory, uses that repo's .sf/sf.db, and writes only that repo's .sf/ runtime files. The worker may write .sf/status.projection.json, but only atomically: write a temp file, fsync it, then rename it over the target. It must not mutate another repo's DB or aggregate another repo's doctor, self-feedback, or ledger data.
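The temp-file, fsync, rename sequence can be sketched in TypeScript for Node.js as follows. The helper name writeFileAtomic and the temp-file naming are hypothetical, not taken from the SF codebase. What matters is the ordering: file data is flushed before the rename, and the rename stays inside one directory so it replaces the target atomically.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: replace `target` with `contents` so that a concurrent reader
// sees either the previous file or the complete new one, never a partial write.
export function writeFileAtomic(target: string, contents: string): void {
  const dir = path.dirname(target);
  // The temp file lives in the same directory, so rename() never crosses filesystems.
  const tmp = path.join(dir, `.${path.basename(target)}.${process.pid}.tmp`);
  try {
    const fd = fs.openSync(tmp, "w", 0o644);
    try {
      fs.writeSync(fd, contents);
      fs.fsyncSync(fd); // flush file data before it becomes visible under the real name
    } finally {
      fs.closeSync(fd);
    }
    fs.renameSync(tmp, target); // atomic replacement on POSIX filesystems
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // do not leave a stray temp file behind
    throw err;
  }
  // fsync the directory so the rename itself survives a crash (Linux behaviour).
  const dirFd = fs.openSync(dir, "r");
  try {
    fs.fsyncSync(dirFd);
  } finally {
    fs.closeSync(dirFd);
  }
}
```

A worker would call it as writeFileAtomic(".sf/status.projection.json", JSON.stringify(projection)). Readers get the all-or-nothing view from the rename alone; the directory fsync only matters for surviving a machine crash.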

The supervisor is an OS/process boundary, not the product brain. A systemd user unit (or equivalent adapter on other platforms) may restart a worker and expose process health, but the planning state remains repo-local and the web server remains the operator surface.

Status Projection

Each repo publishes .sf/status.projection.json with projectionVersion: 1. The projection contains only the read-only fields the web dashboard needs: active milestone, active slice, current unit, next unit, queue depth, last-cycle outcome, writer timestamp, and coarse health.
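A sketch of the version 1 projection as a TypeScript type. The design lists what the projection contains; the camelCase field names and the health and outcome values below are assumptions for illustration only.

```typescript
// Hypothetical field names and enum values: the design fixes what the projection
// carries, not how each field is spelled.
export type ProjectionHealth = "ok" | "degraded" | "failing";

export interface StatusProjectionV1 {
  projectionVersion: 1; // readers treat any other value as unknown
  activeMilestone: string | null;
  activeSlice: string | null;
  currentUnit: string | null;
  nextUnit: string | null;
  queueDepth: number;
  lastCycleOutcome: "success" | "failure" | "none";
  writtenAt: string; // writer timestamp, ISO-8601
  health: ProjectionHealth; // coarse health only, no doctor report
}

// Example value a worker might publish; every field is read-only display data.
export const exampleProjection: StatusProjectionV1 = {
  projectionVersion: 1,
  activeMilestone: "M053",
  activeSlice: null,
  currentUnit: null,
  nextUnit: null,
  queueDepth: 0,
  lastCycleOutcome: "none",
  writtenAt: "2026-01-01T00:00:00.000Z",
  health: "ok",
};
```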

The web reader treats missing, corrupt, or unknown-version projections as a per-repo degraded row. One broken projection must not break the dashboard.
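One way to get per-repo degradation is a reader that never throws and maps each failure mode to a degraded row. This is a sketch: SwarmRow, readProjectionRow, and readSwarmRows are hypothetical names, and the reason codes simply mirror the three cases named above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical row shape for the Swarms view: one row per enrolled repo.
export type SwarmRow =
  | { repo: string; state: "ok"; projection: Record<string, unknown> }
  | { repo: string; state: "degraded"; reason: "missing" | "corrupt" | "unknown-version" };

// Never throws: every failure mode becomes a degraded row for that repo only.
export function readProjectionRow(repo: string): SwarmRow {
  const file = path.join(repo, ".sf", "status.projection.json");
  let raw: string;
  try {
    raw = fs.readFileSync(file, "utf8");
  } catch {
    return { repo, state: "degraded", reason: "missing" }; // absent or unreadable
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { repo, state: "degraded", reason: "corrupt" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { repo, state: "degraded", reason: "corrupt" };
  }
  const projection = parsed as Record<string, unknown>;
  if (projection.projectionVersion !== 1) {
    return { repo, state: "degraded", reason: "unknown-version" };
  }
  return { repo, state: "ok", projection };
}

export function readSwarmRows(repos: string[]): SwarmRow[] {
  return repos.map((repo) => readProjectionRow(repo));
}
```

Because each repo is read independently, a corrupt file in one repo changes only that repo's row.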

The projection excludes self-feedback aggregation, full doctor reports, last-green-ledger details, cross-repo learning, and cross-repo dispatch. Those belong behind a future federation requirement, not M053.

Registry Sync

sf-server owns swarm registry refresh. It scans the configured projects.scan_roots from ~/.sf/daemon.yaml and atomically rewrites ~/.sf/swarms.json on startup and then every swarms.refresh_ms milliseconds while the server is running. Operators can run a one-shot refresh with sf-server --sync-swarms.

This replaces the old script/watchdog shape. There is no repo-local enrollment script and no nested meta-supervisor. The registry is a server-owned read model: repo add/remove/rename is picked up by scan roots, and the web server reads the single registry.
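The refresh loop could look roughly like this. It assumes a repo is enrolled when a direct child of a scan root contains a .sf/ directory, and it uses a made-up registry shape with generatedAt and repos fields; the real swarms.json schema and the daemon.yaml parsing are not shown. In this sketch, syncSwarms plays the role of the one-shot sf-server --sync-swarms refresh, and startRegistrySync the startup-plus-interval behaviour.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical registry shape; the real ~/.sf/swarms.json schema is not specified here.
export interface SwarmsRegistry {
  generatedAt: string;
  repos: string[];
}

// Assumption: a repo counts as enrolled when a direct child of a scan root has a .sf/ dir.
export function scanRoots(roots: string[]): string[] {
  const found = new Set<string>();
  for (const root of roots) {
    let entries: fs.Dirent[];
    try {
      entries = fs.readdirSync(root, { withFileTypes: true });
    } catch {
      continue; // an unreadable or missing scan root contributes nothing
    }
    for (const entry of entries) {
      const repo = path.resolve(root, entry.name);
      if (entry.isDirectory() && fs.existsSync(path.join(repo, ".sf"))) found.add(repo);
    }
  }
  return [...found].sort(); // stable order keeps successive rewrites comparable
}

// One refresh: rebuild the registry from disk and swap it in with a same-directory rename.
export function syncSwarms(roots: string[], registryPath: string): SwarmsRegistry {
  const registry: SwarmsRegistry = { generatedAt: new Date().toISOString(), repos: scanRoots(roots) };
  const tmp = `${registryPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(registry, null, 2) + "\n");
  fs.renameSync(tmp, registryPath);
  return registry;
}

// Sync on startup, then every refreshMs while the server runs; returns a stop function.
export function startRegistrySync(roots: string[], registryPath: string, refreshMs: number): () => void {
  syncSwarms(roots, registryPath);
  const timer = setInterval(() => {
    try {
      syncSwarms(roots, registryPath);
    } catch (err) {
      console.error("swarm registry refresh failed:", err); // previous registry stays in place
    }
  }, refreshMs);
  return () => clearInterval(timer);
}
```

Repo add, remove, and rename need no separate enrollment step here: the next scan simply produces a different repos list.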

Per-repo execution supervision remains a platform adapter. On Linux the target adapter is a user systemd unit that starts sf autonomous from the repo directory and restarts it with backoff. That adapter is owned by server/package code, not by scripts.

The unit for this first Linux adapter is named from a stable hash of the repo path, so a given repo always maps to the same unit name. It sets Restart=always and RestartSec=30s and relies on systemd start-limit settings for crash-loop backoff.
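A sketch of unit naming and rendering under stated assumptions: the sf-worker- prefix, the 12-character SHA-256 prefix, the StartLimitIntervalSec/StartLimitBurst values, and the absolute sfBinary path are all hypothetical. Only Restart=always and RestartSec=30s come from the design above, and workerUnitName and renderWorkerUnit are illustrative names rather than the exports of packages/daemon.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Hypothetical naming scheme: the design specifies "a stable hash of the repo path",
// not this prefix or hash length. Resolving first makes "/a/b" and "/a/b/" agree.
export function workerUnitName(repoDir: string): string {
  const digest = createHash("sha256").update(path.resolve(repoDir)).digest("hex");
  return `sf-worker-${digest.slice(0, 12)}.service`;
}

// Render the user unit. sfBinary is assumed to be an absolute path to the sf CLI.
// The StartLimit* values are illustrative: more than 5 starts in 600s stops restarts.
export function renderWorkerUnit(repoDir: string, sfBinary: string): string {
  const repo = path.resolve(repoDir);
  return [
    "[Unit]",
    `Description=SF autonomous worker for ${repo}`,
    "StartLimitIntervalSec=600",
    "StartLimitBurst=5",
    "",
    "[Service]",
    "Type=simple",
    `WorkingDirectory=${repo}`,
    `ExecStart=${sfBinary} autonomous`,
    "Restart=always",
    "RestartSec=30s",
    "",
    "[Install]",
    "WantedBy=default.target",
    "",
  ].join("\n");
}
```

Installing the result would mean writing it under ~/.config/systemd/user/, then running systemctl --user daemon-reload and systemctl --user enable --now with the unit name. Repo paths containing spaces or % characters would need systemd escaping, which this sketch omits.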

RPC Client Boundary

@singularity-forge/rpc-client is the reusable stdio JSON-RPC adapter. Root headless clients and packages/daemon should import it directly. The coding agent remains the RPC server implementation and still owns interactive/session internals; it should not be the source of reusable client code for web, daemon, or headless orchestration.
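To make the boundary concrete, here is a hypothetical, minimal stdio client of the kind that belongs in a shared package: it assigns request ids, writes one JSON-RPC 2.0 request per line, and matches response lines back to callers by id. Newline-delimited framing is an assumption, and LineRpcClient is not the @singularity-forge/rpc-client API.

```typescript
// Hypothetical types and class; NOT the @singularity-forge/rpc-client API.
export interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

export interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

export class LineRpcClient {
  private nextId = 1;
  private readonly pending = new Map<number, (res: JsonRpcResponse) => void>();
  private readonly writeLine: (line: string) => void;

  // writeLine would typically write to the RPC server process's stdin.
  constructor(writeLine: (line: string) => void) {
    this.writeLine = writeLine;
  }

  // Send one request; onResponse fires when the matching response line arrives.
  call(method: string, params: unknown, onResponse: (res: JsonRpcResponse) => void): number {
    const id = this.nextId++;
    const req: JsonRpcRequest = { jsonrpc: "2.0", id, method, params };
    this.pending.set(id, onResponse);
    this.writeLine(JSON.stringify(req) + "\n");
    return id;
  }

  // Feed each line read from the server's stdout; returns false if it matches no request.
  handleLine(line: string): boolean {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      return false; // not JSON, e.g. stray log output
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    const msg = parsed as JsonRpcResponse;
    if (typeof msg.id !== "number") return false; // notifications carry no id
    const callback = this.pending.get(msg.id);
    if (!callback) return false;
    this.pending.delete(msg.id);
    callback(msg);
    return true;
  }
}
```

In a real client, writeLine would write to the server process's stdin, and each line from its stdout (for example via node:readline) would be passed to handleLine. Keeping this logic in one package is what lets web, daemon, and headless callers share it without importing the coding agent.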

Definition Of Done

M053 is done when:

  • The non-TUI status query writes the versioned atomic status projection.
  • Web has a Swarms view backed by /api/swarms.
  • The web reader survives missing/corrupt projections per repo.
  • sf-server auto-syncs ~/.sf/swarms.json from configured scan roots and can run the same refresh once with sf-server --sync-swarms.
  • Legacy script/watchdog entrypoints are removed from the normal lifecycle.
  • Linux server/package code can create a user-level systemd worker for sf autonomous.
  • Root headless/client utilities use @singularity-forge/rpc-client.

M053 is not done by creating a per-repo server. That is explicitly out of scope.