singularity-forge/docs/specs/sf-self-deploy.md
Mikael Hugo 80d986c046
ci: default sf-server image to Forgejo registry
2026-05-17 23:12:35 +02:00

SF Self-Deploy Contract

SF deploys as a long-running server owned by the deployment platform, not by an interactive TUI session. Forgejo is the build authority: it verifies a source revision, publishes an immutable OCI image, then rolls a test server before prod.

Purpose

The server must be reloadable without humans killing old processes by hand, and the CLI/web surfaces must be able to prove which build they are controlling. The artifact boundary is therefore:

  1. source revision in git
  2. Forgejo build/test result
  3. OCI image tag or digest
  4. dist/sf-release-manifest.json
  5. /api/healthz, /api/ready, and /api/version probes

Build Authority

Forgejo runs .forgejo/workflows/self-deploy.yml on pushes to main and on manual dispatch. The required gates are:

  • npm ci
  • npm --prefix web ci
  • npm run build:core
  • npm run build:web-host
  • npm run typecheck:extensions
  • npm run test:unit
  • build docker/Dockerfile.sf-server
  • generate dist/sf-release-manifest.json

The image builder is Docker/BuildKit. The default Forgejo image repository is registry.infra.centralcloud.com/singularity/sf-server, matching the in-cluster registry host already used by GitOps workloads. The deployment contract starts at the OCI image plus release manifest.
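The gate ordering above can be sketched as a fail-fast loop. This is a minimal TypeScript sketch, not the contents of self-deploy.yml: runGates, GateRunner, and GateResult are hypothetical names, and the runner is injected so the example does not depend on any particular process API. The last two gates are described only in prose, so they appear as labels rather than literal commands.

```typescript
// Gates in the order listed above. The last two entries are labels, not
// literal commands: the spec does not give their exact invocation.
const GATES: readonly string[] = [
  "npm ci",
  "npm --prefix web ci",
  "npm run build:core",
  "npm run build:web-host",
  "npm run typecheck:extensions",
  "npm run test:unit",
  "build docker/Dockerfile.sf-server",
  "generate dist/sf-release-manifest.json",
];

// Hypothetical runner: returns true when the gate succeeds.
type GateRunner = (gate: string) => boolean;

interface GateResult {
  passed: string[];
  failed: string | null; // first failing gate, or null when all passed
}

// Run gates in order and stop at the first failure, so later gates
// (including the image build) never run for a revision that failed earlier.
function runGates(gates: readonly string[], run: GateRunner): GateResult {
  const passed: string[] = [];
  for (const gate of gates) {
    if (!run(gate)) {
      return { passed, failed: gate };
    }
    passed.push(gate);
  }
  return { passed, failed: null };
}
```

With a runner that fails npm run build:core, the result lists the two earlier gates as passed and no later gate is attempted.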

Server Runtime

The server image starts:

node /opt/sf/dist/loader.js server /workspace --host 0.0.0.0 --port 4000

The web host receives SF_RELEASE_MANIFEST, SF_WEB_PROJECT_CWD, SF_WEB_HOST, and SF_WEB_PORT in its environment. Probes are unauthenticated so Kubernetes, Traefik, and Forgejo can verify rollouts without a browser token.
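As an illustration of the environment contract, the following sketch checks that the four variables are present before a web host would start. The variable names come from the paragraph above; missingWebHostEnv and the example values are assumptions, and this document does not say whether the real web host performs such a check.

```typescript
type WebHostEnv = Record<string, string | undefined>;

// Variable names taken from the paragraph above.
const WEB_HOST_ENV = [
  "SF_RELEASE_MANIFEST",
  "SF_WEB_PROJECT_CWD",
  "SF_WEB_HOST",
  "SF_WEB_PORT",
] as const;

// Return the names that are unset or empty, in declaration order.
function missingWebHostEnv(env: WebHostEnv): string[] {
  return WEB_HOST_ENV.filter((name) => !env[name]);
}

// Example values inferred from the start command above. They are
// assumptions, including the manifest path under /opt/sf.
const exampleEnv: WebHostEnv = {
  SF_RELEASE_MANIFEST: "/opt/sf/dist/sf-release-manifest.json",
  SF_WEB_PROJECT_CWD: "/workspace",
  SF_WEB_HOST: "0.0.0.0",
  SF_WEB_PORT: "4000",
};
```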

On vega, the local production server may run from the live checkout while still being containerised:

npm run docker:vega:up

That profile runs one shared SF webserver. It:

  • mounts this SF checkout at /opt/sf
  • mounts the initial controlled repo at /workspace
  • mounts the repo parent at /workspaces, and again at its real host path (/home/mhugo/code on vega)
  • persists ~/.sf
  • binds port 4000 to ${SF_VEGA_BIND:-127.0.0.1}

SF_WORKSPACE_DIR selects the initial repo; it defaults to this checkout for dogfooding. SF_WORKSPACES_DIR selects the parent directory available for repo switching and defaults to the parent of this SF checkout:

SF_WORKSPACE_DIR=/home/mhugo/code/other-repo SF_WORKSPACES_DIR=/home/mhugo/code npm run docker:vega:up

Set SF_VEGA_BIND to the vega Tailscale address when the server should be reachable over Tailscale. Do not bind to 0.0.0.0 (all interfaces) unless a proxy or firewall owns access control.
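The bind rule can be sketched as a small resolver that mirrors the shell default ${SF_VEGA_BIND:-127.0.0.1}. This is an illustration only: resolveVegaBind and VegaBind are hypothetical names, and the publish string assumes the port is published in Docker's host_ip:host_port:container_port form, which this document does not spell out.

```typescript
type VegaEnv = Record<string, string | undefined>;

interface VegaBind {
  address: string;        // host interface that port 4000 is bound to
  publish: string;        // assumed Docker publish value (ip:hostPort:containerPort)
  allInterfaces: boolean; // true when the bind exposes every host interface
}

function resolveVegaBind(env: VegaEnv): VegaBind {
  // Like ${SF_VEGA_BIND:-127.0.0.1}: unset and empty both use the default.
  const address = env.SF_VEGA_BIND ? env.SF_VEGA_BIND : "127.0.0.1";
  return {
    address,
    publish: `${address}:4000:4000`,
    allInterfaces: address === "0.0.0.0",
  };
}
```

A caller could refuse to start, or print a warning, when allInterfaces is true and no proxy or firewall is known to be in front of the host.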

On hosts without the Docker Compose plugin, npm run docker:vega:up uses scripts/run-vega-source-server.mjs to build docker/Dockerfile.source-server and run the equivalent docker run command directly. Either way there is one SF server implementation and one shared webserver process, with repo-scoped worker/session state underneath it. Restarting the runner replaces the shared vega webserver; there is no per-repo container.

Use npm run docker:vega:upgrade for the local blue/green path. It builds the web host, writes the release manifest, starts sf-server-vega-candidate on port 4001, and probes health, readiness, version, and projects. Only after the candidate passes does it replace sf-server-vega on port 4000, probe prod, and remove the candidate. Replacement drains the old container with docker stop -t ${SF_VEGA_DRAIN_STOP_TIME:-610} before falling back to forced removal. The default of 610 seconds leaves a 10-second margin over the RPC child's SF_RPC_SHUTDOWN_GRACE_MS=600000 (600 seconds) queue-drain handler.
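The drain default follows from simple arithmetic: 600000 ms is 600 s, and 600 s plus a 10 s margin is 610 s. The sketch below shows that calculation and the resulting docker stop arguments. drainStopSeconds and dockerStopArgs are hypothetical helper names, not functions from the upgrade script.

```typescript
// Convert the RPC child's grace period (milliseconds) into a docker stop
// timeout in whole seconds, rounding up and adding a safety margin.
function drainStopSeconds(graceMs: number, marginSeconds: number = 10): number {
  return Math.ceil(graceMs / 1000) + marginSeconds;
}

// Arguments for `docker stop`. The -t value is how long Docker waits after
// the stop signal before it kills the container.
function dockerStopArgs(
  container: string,
  env: Record<string, string | undefined>,
): string[] {
  // Like ${SF_VEGA_DRAIN_STOP_TIME:-610}: unset or empty uses the default.
  const stopTime = env.SF_VEGA_DRAIN_STOP_TIME || String(drainStopSeconds(600000));
  return ["stop", "-t", stopTime, container];
}
```

If SF_RPC_SHUTDOWN_GRACE_MS is ever raised, the stop time has to grow with it; otherwise Docker would kill the container while the queue is still draining.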

Forgejo can trigger this source-mounted path automatically after the build job. Set the repository variable SF_VEGA_UPGRADE_URL to the private server base URL, for example http://vega.ts.hugo.dk:4000. If the web server has auth enabled, also set the secret SF_VEGA_UPGRADE_TOKEN; the workflow sends it as a bearer token. The job sends a POST to /api/server-upgrade, then polls /api/ready until the live server reports the pushed GITHUB_SHA.
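A minimal sketch of the trigger-and-poll logic, under stated assumptions. buildUpgradeRequest and waitForSha are hypothetical names. The readSha callback stands in for fetching /api/ready and extracting the reported revision, because the response field that carries the SHA is not specified in this document. A real job would also wait between attempts; the loop here is synchronous to keep the example self-contained.

```typescript
interface UpgradeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
}

// Build the upgrade request from SF_VEGA_UPGRADE_URL and the optional
// SF_VEGA_UPGRADE_TOKEN secret.
function buildUpgradeRequest(baseUrl: string, token?: string): UpgradeRequest {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const headers: Record<string, string> = {};
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return { url: `${base}/api/server-upgrade`, method: "POST", headers };
}

// Poll until the server reports the expected revision.
// Returns the 1-based attempt that matched, or null after maxAttempts.
function waitForSha(
  readSha: () => string | undefined, // hypothetical: reads SHA from /api/ready
  expectedSha: string,
  maxAttempts: number,
): number | null {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (readSha() === expectedSha) {
      return attempt;
    }
  }
  return null;
}
```

Bounding the number of attempts keeps a stuck upgrade from holding the job open indefinitely; the job can fail instead and leave the previous server in place.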

Promotion

Test must roll before prod:

  1. set test deployment image to the new digest
  2. wait for rollout
  3. call /api/healthz
  4. call /api/ready
  5. call /api/version
  6. promote the same image digest to prod
  7. repeat the same probes
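The promotion order above can be sketched as follows. This is an illustration, not the workflow's implementation: Platform, promote, and rollAndProbe are hypothetical names, and the adapter is injected so the ordering can be exercised without a cluster. The properties it demonstrates are that prod receives the same digest that test received, and that prod is not touched when any test step fails.

```typescript
const PROBES: readonly string[] = ["/api/healthz", "/api/ready", "/api/version"];

type Stage = "test" | "prod";

// Hypothetical platform adapter; a real one would talk to the cluster.
interface Platform {
  rollOut(stage: Stage, digest: string): boolean; // set image, wait for rollout
  probe(stage: Stage, path: string): boolean;     // true when the probe passes
}

interface PromotionResult {
  promoted: boolean;
  failedAt: string | null; // for example "test:/api/ready"
}

// Roll one stage to the digest, then run the probes in order.
// Returns a label for the first failing step, or null on success.
function rollAndProbe(stage: Stage, digest: string, platform: Platform): string | null {
  if (!platform.rollOut(stage, digest)) {
    return `${stage}:rollout`;
  }
  for (const path of PROBES) {
    if (!platform.probe(stage, path)) {
      return `${stage}:${path}`;
    }
  }
  return null;
}

// Test first, then prod, always with the same digest.
function promote(digest: string, platform: Platform): PromotionResult {
  for (const stage of ["test", "prod"] as const) {
    const failedAt = rollAndProbe(stage, digest, platform);
    if (failedAt !== null) {
      return { promoted: false, failedAt };
    }
  }
  return { promoted: true, failedAt: null };
}
```

Passing a digest rather than a tag means the image that passed the test probes is byte-for-byte the image that reaches prod.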

Prod must not install latest from npm during rollout. Runtime auto-update means the deployment controller rolls a verified image; it does not mean the running process mutates its own package tree.

Reload Model

For a source-mounted vega container, the foreground process is the staged Next standalone server at dist/web/standalone/server.js. Rebuild or restart the container after changing server/web code. In Kubernetes or k3s, rollout replacement is the reload mechanism. Long term, CLI commands should call the server RPC surface by default when a healthy server owns the project, while local sf server remains the bootstrap and recovery path.

Open Work

  • Wire /api/version into the web footer/admin panel.
  • Add an RPC smoke probe once the stable server RPC endpoint is finalized.
  • Move the Forgejo workflow's deployment target names into /srv/infra GitOps values when the cluster manifests exist.