diff --git a/.gitignore b/.gitignore
index 01149b747..18f76326e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -118,6 +118,8 @@ repowise.db
 # DB backups are local recovery artifacts created by migrations/maintenance.
 .sf/backups/db/
 # Generated SF runtime projections, caches, reports, and recovery evidence.
+.sf/status.projection.json
+.sf/status.projection.json.tmp-*
 .sf/active/
 .sf/graphs/
 .sf/model-catalog/
diff --git a/.sf/REQUIREMENTS.md b/.sf/REQUIREMENTS.md
index b5baf4807..3c5a1e474 100644
--- a/.sf/REQUIREMENTS.md
+++ b/.sf/REQUIREMENTS.md
@@ -865,27 +865,27 @@ ADR-0000 declares SF a **purpose-to-software compiler**. R036–R040 codify that
 - Validation: unmapped
 - Notes: Comparison must include the simpler alternative — a serial unit-queue scheduler with per-file locks — to validate that lanes are the right primitive vs premature parallelism. R067 blocks M033 + M039.
 
-### R068 — Always-on per-repo via systemd
+### R068 — Always-on per-repo supervisor
 - Class: capability
 - Status: active
-- Description: SF runs always-on per-repo via systemd — one user-unit per registered repo wrapping `sf headless autonomous` with crash-restart + crash-loop backoff. Replaces the bash watchdog. No custom always-on daemon, no JSON-RPC server, no shared actors across repos. Each `sf headless` is self-contained (same as today, just supervised by systemd instead of bash).
-- Why it matters: Operator wants 24/7 autonomous operation without manually-tended watchdogs. Per-repo systemd units deliver always-on with smallest blast radius (one bad repo can't take down others), zero new protocol surface, and standard ops tooling (`systemctl status`, `journalctl`).
+- Description: SF runs always-on per repo through a supervisor boundary. The first implementation target is a user-level systemd unit per registered repo wrapping `sf headless autonomous` with crash-restart + crash-loop backoff. The existing `packages/daemon` may remain as an adapter/control package, but it must not become a shared multi-repo workflow owner or a custom always-on JSON-RPC brain. Each repo's execution stays self-contained.
+- Why it matters: Operator wants 24/7 autonomous operation without manually-tended watchdogs. Per-repo supervisors deliver always-on with smallest blast radius (one bad repo cannot take down others), standard ops tooling (`systemctl status`, `journalctl`), and no new cross-repo workflow semantics.
 - Source: user-direction-2026-05-17, reshaped by codex-adversarial-review-2026-05-17
 - Primary owning slice: M053/S10
 - Validation: unmapped
-- Notes: Was originally specified as a custom `sf serve` JSON-RPC daemon hosting multiple swarms. Codex review flagged the multi-swarm-in-one-daemon design as collapsing blast radius and quietly reintroducing federation; reshaped to systemd-per-repo. M028 federation remains the only place for cross-repo coordination.
+- Notes: Was originally specified as a custom `sf serve` JSON-RPC daemon hosting multiple swarms. Codex review flagged the multi-swarm-in-one-daemon design as collapsing blast radius and quietly reintroducing federation; reshaped to per-repo supervision plus read-model projection. M028 federation remains the only place for cross-repo coordination.
 
 ### R069 — Multi-Swarm Isolation [CANCELLED]
 - Class: capability
 - Status: cancelled
-- Description: [CANCELLED 2026-05-17] Was specified for a custom daemon hosting N swarms. With R068 simplified to systemd-per-repo, isolation is automatic — each unit is a separate process with its own `.sf/`. No daemon, no shared actors, no namespace negotiation. Superseded by R068.
+- Description: [CANCELLED 2026-05-17] Was specified for a custom daemon hosting N swarms. With R068 simplified to per-repo supervision, isolation is automatic — each repo has its own process boundary and `.sf/`. No shared workflow actors, no namespace negotiation. Superseded by R068.
 - Source: user-direction-2026-05-17
 - Notes: Cross-repo coordination, if ever needed, is M028 federation.
 
 ### R070 — Web Multi-Repo Dashboard (read-only status projection)
 - Class: capability
 - Status: active
-- Description: Web `/swarms` route lists all registered repos (operator-curated `~/.sf/swarms.json`) with per-repo *status projection*: active milestone, current unit, last-cycle outcome, queue depth, supervisor-health. State sourced from a **dedicated, versioned, atomically-written read-model file** at each repo's `.sf/status.projection.json` — written by the swarm itself via temp+rename, schema-validated on read, with a `projectionVersion` field. Web aggregator watches via fs.watch/chokidar; falls back to 5s poll on platforms without fs notify. Drill-in opens the existing per-repo dashboard. **Excluded from this projection** (defer to M028 federation boundary, per codex review 2026-05-17): self-feedback content aggregation, full doctor reports, last-green-ledger details, any cross-repo learnings. Web shows aggregate counts/health flags from the projection; drill-in into a single repo's own dashboard for detail.
+- Description: Web `/swarms` route lists all registered repos (operator-curated `~/.sf/swarms.json`) with per-repo *status projection*: active milestone, current unit, last-cycle outcome, queue depth, supervisor-health. State is sourced from a **dedicated, versioned, atomically-written read-model file** at each repo's `.sf/status.projection.json` — written by the repo's own SF process via temp+fsync+rename, schema-validated on read, with a `projectionVersion` field. Web aggregator polls every 5s initially; fs.watch/chokidar can be added after the read model is stable. Drill-in opens the existing per-repo dashboard. **Excluded from this projection** (defer to M028 federation boundary, per codex review 2026-05-17): self-feedback content aggregation, full doctor reports, last-green-ledger details, any cross-repo learnings. Web shows aggregate counts/health flags from the projection; drill-in into a single repo's own dashboard for detail.
 - Why it matters: Operator running SF across N repos needs sublinear-attention visibility — one tab to see all swarms at a glance. A dedicated projection file (rather than reading raw mutable `.sf/state.json`/doctor reports) avoids partial-write parsing, schema drift, and cross-repo trust propagation. Per-repo error isolation: a corrupt projection in one repo surfaces as "degraded" status, not a dashboard crash. The projection is one direction (swarm → read-model) and explicitly does NOT cross ADR-012's deferred federation boundary; cross-repo coordination remains M028.
 - Source: user-direction-2026-05-17, reshaped by codex-adversarial-review-2026-05-17 (ADR sweep pass)
 - Primary owning slice: M053/S11
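
Review note on R070: the temp+fsync+rename write and the validated, degrade-on-error read described above could look roughly like the sketch below. It is illustrative only and is not taken from the SF codebase. Python is used for brevity, and the pattern is language-agnostic. The function names, the schema version value, and the JSON field names (`activeMilestone`, `currentUnit`, and so on) are assumptions made for the example; only the file path, the `projectionVersion` field, and the `.tmp-*` naming come from the diff.

```python
import json
import os
import tempfile
from pathlib import Path

PROJECTION_VERSION = 1  # assumed value; the diff only names the field
# Hypothetical field names mirroring the items listed in R070.
REQUIRED_FIELDS = {
    "projectionVersion", "activeMilestone", "currentUnit",
    "lastCycleOutcome", "queueDepth", "supervisorHealth",
}


def write_projection(repo_root: Path, status: dict) -> Path:
    """Write .sf/status.projection.json via temp file + fsync + rename."""
    sf_dir = repo_root / ".sf"
    sf_dir.mkdir(parents=True, exist_ok=True)
    target = sf_dir / "status.projection.json"
    payload = {**status, "projectionVersion": PROJECTION_VERSION}
    # The temp file lives in the same directory so the rename stays on one
    # filesystem; its name matches the new .gitignore pattern
    # `.sf/status.projection.json.tmp-*`.
    fd, tmp_name = tempfile.mkstemp(
        prefix="status.projection.json.tmp-", dir=str(sf_dir)
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(payload, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush file contents to disk before rename
        os.replace(tmp_name, target)  # atomic swap on POSIX filesystems
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    # Note: this sketch does not fsync the containing directory.
    return target


def read_projection(repo_root: Path) -> dict:
    """Read one repo's projection; any problem becomes a 'degraded' result."""
    target = repo_root / ".sf" / "status.projection.json"
    try:
        data = json.loads(target.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # missing, unreadable, or invalid JSON
        return {"health": "degraded", "reason": type(exc).__name__}
    if not isinstance(data, dict):
        return {"health": "degraded", "reason": "projection is not an object"}
    if data.get("projectionVersion") != PROJECTION_VERSION:
        return {"health": "degraded", "reason": "unsupported projectionVersion"}
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        return {"health": "degraded", "reason": "missing: " + ", ".join(sorted(missing))}
    return {"health": "ok", "projection": data}
```

Under these assumptions, a 5s polling aggregator would call `read_projection` once per registered repo on each tick and render whatever comes back. Because the reader never raises, a corrupt or missing file in one repo shows up as a "degraded" row instead of breaking the whole `/swarms` view.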