Bundles the working-tree state into one coherent commit covering the
upgrade-safety glue that complements today's earlier landings
(orphan-recovery, sf-db single-connection, drain-timer-not-unref'd,
forceShutdown drain, shutdown-state.ts, instrumentation.ts,
shutdown-signal.js, gate-deadlock-classifier).
Modified:
docker/Dockerfile.source-server — image build tweaks for the source-
server variant used by the in-container upgrader.
docker/docker-compose.vega.yaml — env passthroughs for host-side dirs
(SF_SOURCE_HOST_ROOT, SF_WORKSPACE_HOST_DIR, SF_WORKSPACES_HOST_DIR,
SF_HOME_HOST_DIR), docker socket mount, group_add for docker GID,
and SF_RPC_SHUTDOWN_GRACE_MS=600000 matching the 10-min drain.
scripts/run-vega-source-server.mjs — substantial rework supporting
the in-container upgrade flow.
scripts/upgrade-vega-source-server.mjs — buildEnv() + dockerBuildEnv()
helpers, probeBind via SF_VEGA_PROBE_HOST, containerExists()
pre-check before drainContainer, stop timeout now matches the
10-min RPC grace via SF_VEGA_DRAIN_STOP_TIME (default 610s).
src/web/project-discovery-service.ts — calls
recoverProjectRuntimeQueues() on each of the 3 discovery paths
(root monorepo, per-entry, nested SF projects). Closes the
cloud-volume mtime-lag window that codex flagged.
web/app/api/ready/route.ts — calls recoverProjectRuntimeQueues() on
every readiness probe, and now also reads shutdown-state so the
probe returns 503 while draining.
web/components/sf/projects-view.tsx — UI wiring for the upgrade
trigger.
web/pages/api/projects.ts — backend API addition for the project
enumeration that feeds projects-view.
docs/specs/sf-self-deploy.md — docs update for the new flow.
package.json — script alias.
Added:
scripts/build-web-host.mjs — new build helper for the standalone web
host artifact consumed by the upgrade flow.
src/resources/extensions/sf/tests/auto-shutdown-signal.test.mjs —
unit test for the cooperative-shutdown signal module (registers /
requests / snapshot).
src/web/project-runtime-recovery.ts — thin wrapper around
recoverOrphanedFeedbackDrains for per-project use from web routes.
web/app/api/drain/route.ts — explicit drain endpoint for operator-
triggered queue flush.
web/app/api/server-upgrade/route.ts — auth-gated endpoint that
spawns the in-container upgrader via docker socket; passes through
host-dir env so the upgrader knows real bind-mount paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SF Self-Deploy Contract
SF deploys as a long-running server owned by the deployment platform, not by an interactive TUI session. Forgejo is the build authority: it verifies a source revision, publishes an immutable OCI image, then rolls a test server before prod.
Purpose
The server must be reloadable without humans killing old processes by hand, and the CLI/web surfaces must be able to prove which build they are controlling. The artifact boundary is therefore:
- source revision in git
- Forgejo build/test result
- OCI image tag or digest
- dist/sf-release-manifest.json
- /api/healthz, /api/ready, and /api/version probes
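A CLI or web surface could prove which build it is controlling by comparing the release manifest it expects with what /api/version reports. The sketch below assumes both carry a source revision and an image digest. The field names sourceRevision and imageDigest are hypothetical, as is the function verifyBuild, because this spec does not define either schema.

```typescript
// Hypothetical manifest shape: this spec does not list the fields of
// dist/sf-release-manifest.json.
interface ReleaseManifest {
  sourceRevision: string; // git commit the image was built from
  imageDigest: string; // OCI digest, e.g. "sha256:..."
}

// Hypothetical /api/version body; fields may be missing on older builds.
interface VersionReport {
  sourceRevision?: string;
  imageDigest?: string;
}

// Returns human-readable mismatches. An empty array means the running server
// reports exactly the build described by the manifest.
function verifyBuild(expected: ReleaseManifest, reported: VersionReport): string[] {
  const problems: string[] = [];
  if (reported.sourceRevision !== expected.sourceRevision) {
    problems.push(
      `revision: expected ${expected.sourceRevision}, got ${reported.sourceRevision ?? "nothing"}`,
    );
  }
  if (reported.imageDigest !== expected.imageDigest) {
    problems.push(
      `digest: expected ${expected.imageDigest}, got ${reported.imageDigest ?? "nothing"}`,
    );
  }
  return problems;
}

// Usage with a hard-coded report; a real caller would fetch /api/version.
const manifest: ReleaseManifest = { sourceRevision: "abc123", imageDigest: "sha256:1111" };
console.log(verifyBuild(manifest, { sourceRevision: "abc123" }));
```

Returning a list instead of a boolean lets the caller show the operator exactly which part of the build identity disagrees.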
Build Authority
Forgejo runs .forgejo/workflows/self-deploy.yml on main and manual dispatch.
The required gates are:
- npm ci
- npm --prefix web ci
- npm run build:core
- npm run build:web-host
- npm run typecheck:extensions
- npm run test:unit
- build docker/Dockerfile.sf-server
- generate dist/sf-release-manifest.json
The image builder is Docker/BuildKit. The deployment contract starts at the OCI image plus release manifest.
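Every gate is required, so one natural reading is fail-fast: nothing after a failing gate runs, and no image or manifest is produced from an unverified revision. The sketch below assumes that ordering. It injects the command runner so the same logic could drive child_process in CI or a fake in tests. The name runGates and the labels for the last two steps are illustrative. They are not the workflow's actual step definitions.

```typescript
// Gates in the order listed above. The last two are descriptive labels: the
// spec names the Dockerfile and manifest path but not the exact commands.
const GATES: readonly string[] = [
  "npm ci",
  "npm --prefix web ci",
  "npm run build:core",
  "npm run build:web-host",
  "npm run typecheck:extensions",
  "npm run test:unit",
  "build docker/Dockerfile.sf-server",
  "generate dist/sf-release-manifest.json",
];

interface GateResult {
  passed: string[]; // gates that succeeded, in order
  failed: string | null; // first failing gate, or null if all passed
}

// Runs gates in order and stops at the first failure (fail-fast assumption).
// `run` returns true on success, e.g. when the command exits with status 0.
function runGates(gates: readonly string[], run: (gate: string) => boolean): GateResult {
  const passed: string[] = [];
  for (const gate of gates) {
    if (!run(gate)) return { passed, failed: gate };
    passed.push(gate);
  }
  return { passed, failed: null };
}
```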
Server Runtime
The server image starts:
node /opt/sf/dist/loader.js server /workspace --host 0.0.0.0 --port 4000
The web host receives SF_RELEASE_MANIFEST, SF_WEB_PROJECT_CWD,
SF_WEB_HOST, and SF_WEB_PORT in its environment. Probes are unauthenticated
so Kubernetes, Traefik, and Forgejo can verify rollouts without a browser token.
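The commit notes above say /api/ready reads shutdown state and returns 503 while draining. The following is a minimal sketch of that decision, assuming a three-phase state. The ShutdownPhase type and the readiness function are hypothetical stand-ins, because the real shutdown-state.ts API is not shown here.

```typescript
// Hypothetical phases; the real shutdown-state.ts may model this differently.
type ShutdownPhase = "running" | "draining" | "stopped";

interface ProbeResult {
  status: number; // HTTP status for the unauthenticated probe
  body: { ready: boolean; phase: ShutdownPhase };
}

// Ready only while running. Any drain or stop yields 503, so probe consumers
// such as Kubernetes can stop sending new work to this instance.
function readiness(phase: ShutdownPhase): ProbeResult {
  const ready = phase === "running";
  return { status: ready ? 200 : 503, body: { ready, phase } };
}
```

Keeping the decision in a pure function leaves the route handler as thin wiring and makes the probe's behavior easy to test.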
On vega, the local production server may run from the live checkout while still being containerised:
npm run docker:vega:up
That profile runs one shared SF webserver. It mounts this SF checkout at
/opt/sf and the initial controlled repo at /workspace. It mounts the repo
parent twice, at /workspaces and at its real host path (/home/mhugo/code on
vega). It also persists ~/.sf and binds port 4000 to ${SF_VEGA_BIND:-127.0.0.1}.
SF_WORKSPACE_DIR selects the initial repo and defaults to this checkout for
dogfooding. SF_WORKSPACES_DIR selects the parent directory available for repo
switching and defaults to the parent of this SF checkout:
SF_WORKSPACE_DIR=/home/mhugo/code/other-repo SF_WORKSPACES_DIR=/home/mhugo/code npm run docker:vega:up
Set SF_VEGA_BIND to the vega Tailscale address when the server should be
reachable over Tailscale; do not bind public 0.0.0.0 unless a proxy/firewall
owns access control.
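The following sketch resolves the three settings in TypeScript. It assumes that all three defaults behave like the ${SF_VEGA_BIND:-127.0.0.1} expansion, where an empty value counts as unset. It also assumes the runner knows its own checkout directory. resolveVegaConfig and the warning text are illustrative. They are not the actual code in scripts/run-vega-source-server.mjs.

```typescript
import { dirname } from "node:path";

interface VegaConfig {
  workspaceDir: string; // initial repo, mounted at /workspace
  workspacesDir: string; // repo parent, mounted at /workspaces and its host path
  bind: string; // host address for port 4000
  warnings: string[];
}

// `||` treats empty strings as unset, matching ${VAR:-default}.
function resolveVegaConfig(
  env: Record<string, string | undefined>,
  checkoutDir: string,
): VegaConfig {
  const workspaceDir = env.SF_WORKSPACE_DIR || checkoutDir;
  const workspacesDir = env.SF_WORKSPACES_DIR || dirname(checkoutDir);
  const bind = env.SF_VEGA_BIND || "127.0.0.1";
  const warnings: string[] = [];
  if (bind === "0.0.0.0") {
    warnings.push("binding 0.0.0.0: make sure a proxy or firewall owns access control");
  }
  return { workspaceDir, workspacesDir, bind, warnings };
}
```

In a Node script this would be called as resolveVegaConfig(process.env, checkoutDir). Surfacing the 0.0.0.0 case as a warning, rather than refusing it, keeps the proxy-fronted setup possible.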
On hosts without the Docker Compose plugin, npm run docker:vega:up uses
scripts/run-vega-source-server.mjs to build docker/Dockerfile.source-server
and run the equivalent docker run command directly. This is one SF server
implementation, one shared webserver process, and repo-scoped worker/session
state underneath it. Restarting the runner replaces the shared vega webserver,
not one container per repo.
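Without the Compose plugin, the runner has to express the same mounts and port binding as a single docker run command. This sketch builds that argument list and assumes the image's default command starts the server. dockerRunArgs and the image tag are hypothetical, as is the in-container path for ~/.sf: the spec says ~/.sf is persisted but not where it is mounted. The Compose file's docker socket mount and group_add are left out.

```typescript
interface RunSpec {
  name: string; // container name, e.g. "sf-server-vega"
  image: string; // tag built from docker/Dockerfile.source-server (hypothetical)
  checkoutDir: string; // this SF checkout -> /opt/sf
  workspaceDir: string; // initial repo -> /workspace
  workspacesDir: string; // repo parent -> /workspaces and its real host path
  sfHomeDir: string; // host ~/.sf
  sfHomeTarget: string; // container path for ~/.sf (assumption)
  bind: string; // SF_VEGA_BIND
}

// Returns argv for `docker`, suitable for child_process.spawn("docker", args)
// without any shell quoting.
function dockerRunArgs(s: RunSpec): string[] {
  const mount = (src: string, dst: string) => ["--volume", `${src}:${dst}`];
  return [
    "run",
    "--detach",
    "--name", s.name,
    ...mount(s.checkoutDir, "/opt/sf"),
    ...mount(s.workspaceDir, "/workspace"),
    ...mount(s.workspacesDir, "/workspaces"),
    // Same parent again at its host path, so host-valid paths also resolve inside.
    ...mount(s.workspacesDir, s.workspacesDir),
    ...mount(s.sfHomeDir, s.sfHomeTarget),
    "--publish", `${s.bind}:4000:4000`,
    s.image,
  ];
}
```

Building an argv array instead of a command string avoids quoting bugs when host paths contain spaces.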
Use npm run docker:vega:upgrade for the local blue/green path. It builds the
web host, writes the release manifest, starts sf-server-vega-candidate on
port 4001, probes health/readiness/version/projects, replaces sf-server-vega
on port 4000 only after the candidate passes, probes prod, then removes the
candidate. Replacement drains the old container with
docker stop --timeout ${SF_VEGA_DRAIN_STOP_TIME:-610} before falling back to
forced removal. The 610-second default leaves a 10-second margin over the RPC
child's 600-second queue-drain grace (SF_RPC_SHUTDOWN_GRACE_MS=600000).
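The key property of the upgrade is its order: prod on port 4000 is touched only after the candidate on 4001 passes its probes. The sketch below encodes that order with injected, synchronous operations so it can be exercised without Docker. UpgradeOps, blueGreenUpgrade, drainStopSeconds and drainMarginSeconds are hypothetical names. Removing the candidate when it fails its probes is an assumption, because the spec does not describe failure handling. The sketch also encodes the drain arithmetic: 610 s minus 600000 ms of RPC grace leaves 10 s.

```typescript
// Mirrors ${SF_VEGA_DRAIN_STOP_TIME:-610}: empty or unset means 610 seconds.
function drainStopSeconds(env: Record<string, string | undefined>): number {
  const raw = env.SF_VEGA_DRAIN_STOP_TIME;
  const seconds = raw ? Number(raw) : 610;
  if (!Number.isInteger(seconds) || seconds <= 0) {
    throw new Error(`invalid SF_VEGA_DRAIN_STOP_TIME: ${raw}`);
  }
  return seconds;
}

// Seconds docker stop waits beyond the RPC child's own drain grace.
function drainMarginSeconds(stopSeconds: number, rpcGraceMs: number): number {
  return stopSeconds - rpcGraceMs / 1000;
}

// Hypothetical operations; real ones would shell out to docker and HTTP.
interface UpgradeOps {
  buildWebHostAndManifest(): void;
  startCandidate(port: number): void; // sf-server-vega-candidate
  probe(port: number): boolean; // health, readiness, version, projects
  replaceProd(stopSeconds: number): void; // docker stop --timeout, forced removal as fallback
  removeCandidate(): void;
}

type UpgradeOutcome = "promoted" | "candidate-failed" | "prod-failed";

function blueGreenUpgrade(ops: UpgradeOps, stopSeconds: number): UpgradeOutcome {
  ops.buildWebHostAndManifest();
  ops.startCandidate(4001);
  if (!ops.probe(4001)) {
    // Prod is never touched when the candidate fails (cleanup is an assumption).
    ops.removeCandidate();
    return "candidate-failed";
  }
  ops.replaceProd(stopSeconds);
  const prodHealthy = ops.probe(4000);
  ops.removeCandidate();
  return prodHealthy ? "promoted" : "prod-failed";
}
```

Real probes would be asynchronous HTTP calls. Synchronous operations keep the sketch short without changing the ordering it demonstrates.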
Promotion
Test must roll before prod:
- set test deployment image to the new digest
- wait for rollout
- call /api/healthz
- call /api/ready
- call /api/version
- promote the same image digest to prod
- repeat the same probes
Prod must not install latest from npm during rollout. Runtime auto-update
means the deployment controller rolls a verified image; it does not mean the
running process mutates its own package tree.
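Promoting the same digest makes the prod rollout reproducible. A tag such as latest can move between the test and prod steps, but a digest cannot. This sketch of the sequence assumes an injected cluster client. Cluster, promote, and the digest-reference regex are illustrative. They are not the Forgejo workflow's actual implementation.

```typescript
// Digest-pinned reference: "<repository>@sha256:<64 lowercase hex>".
const DIGEST_REF = /^[^\s@]+@sha256:[0-9a-f]{64}$/;

const PROBES = ["/api/healthz", "/api/ready", "/api/version"] as const;

type Env = "test" | "prod";

// Hypothetical cluster client (e.g. wrapping kubectl or a GitOps commit).
interface Cluster {
  setImage(env: Env, imageRef: string): void;
  waitForRollout(env: Env): void;
  probe(env: Env, path: string): boolean;
}

type PromoteOutcome = "promoted" | "failed-test" | "failed-prod";

function promote(cluster: Cluster, imageRef: string): PromoteOutcome {
  if (!DIGEST_REF.test(imageRef)) {
    throw new Error(`refusing to roll a non-digest image reference: ${imageRef}`);
  }
  // Test first, then prod, with the identical reference and identical probes.
  for (const env of ["test", "prod"] as const) {
    cluster.setImage(env, imageRef);
    cluster.waitForRollout(env);
    // every() stops at the first failing probe.
    if (!PROBES.every((path) => cluster.probe(env, path))) {
      return env === "test" ? "failed-test" : "failed-prod";
    }
  }
  return "promoted";
}
```

Rejecting tag references up front enforces the "same digest" rule before anything is rolled.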
Reload Model
For a source-mounted vega container, the foreground process is the staged Next
standalone server at dist/web/standalone/server.js. Rebuild or restart the
container after changing server/web code. In Kubernetes or k3s, rollout
replacement is the reload mechanism. Long term, CLI commands should call the
server RPC surface by default when a healthy server owns the project, while
a locally run sf server remains the bootstrap and recovery path.
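The long-term CLI behavior described above reduces to a small routing decision. This sketch assumes the CLI can learn whether a server is healthy and which project it currently serves. ServerStatus, chooseControlPath, and the baseUrl parameter are hypothetical, since the stable RPC endpoint is not finalized.

```typescript
// Hypothetical status the CLI would obtain from the server's probes.
interface ServerStatus {
  healthy: boolean; // e.g. /api/healthz and /api/ready both succeeded
  projectRoot: string | null; // project the server reports owning
}

type ControlPath = { kind: "rpc"; baseUrl: string } | { kind: "local" };

// Prefer the server when it is healthy and owns this project; otherwise fall
// back to the local bootstrap/recovery path.
function chooseControlPath(
  status: ServerStatus | null, // null: no server answered
  projectRoot: string,
  baseUrl: string,
): ControlPath {
  if (status !== null && status.healthy && status.projectRoot === projectRoot) {
    return { kind: "rpc", baseUrl };
  }
  return { kind: "local" };
}
```

The === comparison assumes both project paths are already normalized to absolute form.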
Open Work
- Wire /api/version into the web footer/admin panel.
- Add an RPC smoke probe once the stable server RPC endpoint is finalized.
- Move the Forgejo workflow's deployment target names into /srv/infra GitOps values when the cluster manifests exist.