feat: introduce repo-vcs skill and add JSDoc annotations across core modules

- Add repository-vcs-context.ts to detect and inject VCS context (Git/Jujutsu)
  into the agent system prompt; wire in repo-vcs bundled skill trigger
- Add src/resources/skills/repo-vcs/ skill for commit, push, and safe-push workflows
- Add JSDoc Purpose/Consumer annotations to app-paths, bundled-extension-paths,
  errors, extension-discovery, extension-registry, headless-types, headless, and traces
- Add justfile and just to flake.nix devShell
- Fill out new-user-onboarding.md spec (Draft) and core-beliefs.md (Status: Accepted)
- Add notification-event-model.md design doc and notification-source-hygiene.md spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mikael Hugo 2026-05-01 21:36:32 +02:00
parent 12e7333f1c
commit a611cd5792
18 changed files with 943 additions and 35 deletions

View file

@ -1,5 +1,9 @@
# Core Beliefs
Status: Accepted
- The repo should explain itself to humans and agents.
- Plans should carry acceptance criteria, falsifiers, and verification commands.
- Architecture should be mechanically checkable where possible.
- User intent should remain distinguishable from automated workflow state.
- Placeholder docs should say what is missing instead of pretending implementation exists.

View file

@ -0,0 +1,42 @@
# Notification Event Model
Status: Draft
## Context
Observed facts:
- The current working CLI/product is Singularity Forge.
- `singularity-foundry` is the old name/path.
- The active runtime and transcript/agent-loop implementation live in `/home/mhugo/code/singularity-forge`.
- The reported issue involves automated workflow/system notices appearing near or instead of user-authored input.
Inferred facts:
- The Forge/Foundry naming mismatch can make triage confusing, but the runtime risk is event-source confusion.
- A future runtime needs event metadata rather than text matching to separate user messages from notices.
The product needs a durable distinction between user-authored messages and automated workflow status. Without that boundary, repeated system notices can look like user input and can interrupt the work the user actually requested.
## Decision
Model inbound transcript events with an explicit source and blocking flag:
- `source`: `user`, `system`, `tool`, or `workflow`.
- `kind`: `message`, `notice`, `warning`, `error`, or `approval_request`.
- `blocking`: whether user action is required before work can continue.
- `dedupe_key`: stable key for grouping repeated non-blocking notices.
- `created_at`: event timestamp from the producing system.
Rendering and scheduling should prioritize `source=user` events over non-blocking notices. Blocking notices may interrupt, but they must still render as notices rather than user messages.
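A minimal TypeScript sketch of the event shape this decision describes, assuming the runtime stays TypeScript; field names mirror the bullets above, while `content` is a hypothetical payload field not specified here.
```ts
// Sketch only: field names come from the decision above; `content` is assumed.
type EventSource = "user" | "system" | "tool" | "workflow";
type EventKind = "message" | "notice" | "warning" | "error" | "approval_request";

interface TranscriptEvent {
  source: EventSource;
  kind: EventKind;
  blocking: boolean;    // user action required before work can continue
  dedupe_key?: string;  // stable key for grouping repeated non-blocking notices
  created_at: string;   // timestamp from the producing system (ISO 8601 assumed)
  content: string;      // hypothetical payload field, not part of the decision above
}
```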
## Consequences
- Transcript rendering can label notices without relying on fragile text matching.
- Duplicate workflow notices can be grouped by `dedupe_key`.
- Tests can assert that user input remains primary when notices arrive nearby.
- Integrations must supply enough metadata to classify events correctly.
## Verification
When implemented, add unit tests for classification, deduplication, and transcript ordering.

View file

@ -1,3 +1,47 @@
# New User Onboarding
Describe the first-run experience, success criteria, and failure states when this product has an onboarding flow.
Status: Draft
## User Problem
New users need to understand that Singularity Forge is the current product, where decisions live, and how to make changes without breaking repo memory.
## Goals
- Orient a new user from the root docs within a few minutes.
- Make the canonical homes for specs, design docs, plans, generated references, and records obvious.
- Explain what verification is expected before closing work.
## Non-Goals
- Do not define a marketing journey.
- Do not choose a runtime or frontend framework in this spec.
- Do not require a production onboarding flow before the product surface exists.
## First-Run Experience
Until a runtime exists, onboarding is documentation-based:
1. Start with root `AGENTS.md`.
2. Read `ARCHITECTURE.md` for current state, boundaries, and invariants.
3. Read `docs/PLANS.md` and `docs/exec-plans/active/` for current work.
4. Read `docs/QUALITY_SCORE.md`, `docs/RELIABILITY.md`, and `docs/SECURITY.md` before behavior changes.
5. Use `docs/RECORDS_KEEPER.md` after meaningful changes.
## Success Criteria
- The user can identify the canonical doc for a product decision, architecture decision, active plan, generated artifact, or records note.
- The user can name the command or eval expected to prove their change.
- The user can tell that the active CLI source is in `/home/mhugo/code/singularity-forge`, while this `singularity-foundry` checkout is legacy context.
## Failure States
- A new user infers production behavior from placeholders.
- A source change happens without a spec, plan, test, or eval.
- A durable decision is left only in chat or a temporary note.
## Verification
```sh
find . -maxdepth 4 -type f -name '*.md' -print
```

View file

@ -0,0 +1,67 @@
# Notification Source Hygiene
Status: Draft
## User Problem
Users can lose trust when automated workflow or system notifications appear to be messages they typed, or when repeated notifications interrupt the latest real user request.
Observed facts:
- The current working CLI/product is Singularity Forge.
- `singularity-foundry` is the old name/path.
- The issue report used "SF" as shorthand for workflow notifications.
- The source implementation for notification and steering behavior lives in `/home/mhugo/code/singularity-forge`.
Inferred facts:
- The Forge/Foundry naming mismatch can make triage confusing, but it is not the runtime bug.
- The bug class is automated event source confusion: notices need to be handled separately from user-authored messages.
## Goals
- Separate user-authored messages from automated workflow, system, and tool notifications.
- Keep the latest real user request visible as the primary interaction.
- Group duplicate automated notices when they repeat without adding new actionable information.
- Preserve enough metadata to debug notification incidents.
## Non-Goals
- Do not suppress security, permission, or destructive-action warnings.
- Do not hide tool failures that block the requested work.
- Do not implement notification routing until the owning runtime or integration exists.
## Behavior
A future implementation should treat each inbound event as one of:
- `user_message`: content intentionally submitted by the user.
- `system_notice`: platform or workflow status that should be labeled and visually separated.
- `tool_notice`: command, tool, or automation status.
- `blocking_notice`: notice that requires user action before work can continue.
Repeated non-blocking notices with the same source and equivalent content should be collapsed into a single visible event with a count and time range. Blocking notices must remain visible, but they should still be labeled as notices rather than user input.
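As a rough illustration of the grouping rule above, a TypeScript sketch that collapses adjacent repeated non-blocking notices; all names are hypothetical and "equivalent content" is simplified to strict equality, so this is not the singularity-forge implementation.
```ts
// Hypothetical shapes; only the four kinds come from the spec above.
interface InboundEvent {
  kind: "user_message" | "system_notice" | "tool_notice" | "blocking_notice";
  source: string;
  content: string;
  timestamp: number;
}

interface VisibleEvent extends InboundEvent {
  count: number;     // how many raw events were collapsed into this one
  firstSeen: number; // time range of the collapsed group
  lastSeen: number;
}

function collapseNotices(events: InboundEvent[]): VisibleEvent[] {
  const out: VisibleEvent[] = [];
  for (const ev of events) {
    const prev = out[out.length - 1];
    const collapsible =
      ev.kind !== "user_message" &&
      ev.kind !== "blocking_notice" && // blocking notices stay individually visible
      prev !== undefined &&
      prev.kind === ev.kind &&
      prev.source === ev.source &&
      prev.content === ev.content;     // "equivalent content" simplified to equality
    if (collapsible) {
      prev.count += 1;
      prev.lastSeen = ev.timestamp;
    } else {
      out.push({ ...ev, count: 1, firstSeen: ev.timestamp, lastSeen: ev.timestamp });
    }
  }
  return out;
}
```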
## Failure States
- Automated notifications are rendered as if the user typed them.
- Duplicate notices push the current user request out of view.
- A non-blocking notice prevents tools or responses from processing the real user message.
- A blocking notice is hidden or collapsed in a way that prevents informed consent.
## Acceptance Criteria
- The UI or transcript clearly distinguishes user messages from automated notices.
- Duplicate non-blocking notices are grouped without losing source and timestamp metadata.
- A regression test or product eval proves that automated notices do not supersede the latest real user message.
- Blocking notices still interrupt when user consent or safety requires it.
## Verification
To be replaced with concrete commands when the owning implementation exists:
```sh
# notification routing unit test
# transcript rendering regression test
# product eval for duplicate automated workflow notices
```

View file

@ -23,6 +23,7 @@
cargo
clippy
git
just
nodejs_24
pkg-config
nodePackages.typescript

justfile Normal file
View file

@ -0,0 +1,53 @@
set shell := ["bash", "-c"]

# List available tasks
default:
    @just --list

# Install workspace dependencies
install:
    npm install

# Full build (core + web)
build:
    npm run build

# Build core runtime only (faster)
build-core:
    npm run build:core

# Build native Rust addon (release)
build-native:
    npm run build:native-pkg

# Run all tests
test:
    npm test

# Run unit tests only
test-unit:
    npm run test:unit

# Run smoke tests
test-smoke:
    npm run test:smoke

# Run TypeScript type checking
typecheck:
    npm run typecheck:extensions

# Lint
lint:
    npm run lint

# Lint and auto-fix
lint-fix:
    npm run lint:fix

# Remove build outputs
clean:
    rm -rf dist dist-test

# Run SF CLI from source (usage: just sf <args>)
sf *args:
    ./bin/sf-from-source {{args}}

View file

@ -1,9 +1,87 @@
import { homedir } from "node:os";
import { join } from "node:path";
/**
* app-paths.ts: central directory and file path constants for the sf runtime.
*
* Purpose: provide a single source of truth for all on-disk locations that sf
* reads and writes (agent bundles, sessions, auth, preferences, PID files).
* Centralising paths prevents drift when the layout changes and ensures tests
* can override SF_HOME to redirect all I/O into a temp directory.
*
* Consumer: cli.ts, loader.ts, onboarding.ts, extension-registry.ts,
* remote-questions-config.ts, update-check.ts, models-resolver.ts,
* project-sessions.ts, and cli-web-branch.ts.
*/
/**
* Returns the root directory for all sf runtime state.
*
* Purpose: define the top-level folder where sessions, agents, preferences,
* and PID files live. Overridable via SF_HOME so tests and CI can redirect
* all disk I/O away from the user's real home directory.
*
* Consumer: cli.ts (session manager, auth storage, resource loader),
* loader.ts (agent directory setup), update-check.ts (cache file),
* remote-questions-config.ts (global preferences), extension-registry.ts
* (registry JSON), and every other derived path in this module.
*/
export const appRoot = process.env.SF_HOME || join(homedir(), ".sf");
/**
* Returns the path to the managed agent directory.
*
* Purpose: isolate bundled extensions, agent binaries, and compiled assets
* from user data (sessions, preferences). This separation lets sf wipe and
* re-sync the agent directory on upgrade without touching session history.
*
* Consumer: cli.ts (resource loader, settings manager, auth storage),
* loader.ts (extension discovery and sync), onboarding.ts (first-run setup),
* models-resolver.ts (model registry), and cli-web-branch.ts (web mode).
*/
export const agentDir = join(appRoot, "agent");
/**
* Returns the path to the sessions directory.
*
* Purpose: keep all conversation history, checkpoints, and session metadata
* in one tree so the SessionManager can enumerate, archive, or migrate them.
*
* Consumer: cli.ts (SessionManager base directory), project-sessions.ts
* (project-scoped session subdirectories), and cli-web-branch.ts (web mode
* session root).
*/
export const sessionsDir = join(appRoot, "sessions");
/**
* Returns the path to the auth credentials file.
*
* Purpose: store API keys and tokens securely on disk so users do not have
* to re-authenticate on every launch. The file is read by AuthStorage and
* written during onboarding or login flows.
*
* Consumer: cli.ts (AuthStorage.create), cli-web-branch.ts (web-mode auth).
*/
export const authFilePath = join(agentDir, "auth.json");
/**
* Returns the path to the web-server PID file.
*
* Purpose: track whether a background web server is already running so that
* concurrent `sf web` invocations can detect the existing process instead of
* spawning a second listener on the same port.
*
* Consumer: cli-web-branch.ts (web mode start/stop lifecycle).
*/
export const webPidFilePath = join(appRoot, "web-server.pid");
/**
* Returns the path to the web-preferences JSON file.
*
* Purpose: persist UI-specific settings (theme, layout, last-opened project)
* separately from CLI preferences so the web frontend and the TUI can evolve
* their configs independently.
*
* Consumer: cli-web-branch.ts (web mode settings read/write).
*/
export const webPreferencesPath = join(appRoot, "web-preferences.json");

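One practical consequence of the SF_HOME override described above: `appRoot` is computed when the module is first evaluated, so a test has to set the variable before importing the module. A hedged sketch; the temp path is made up.
```ts
// Sketch: redirect all sf paths into a temp directory for a test.
// SF_HOME must be set before app-paths.js is evaluated, because appRoot is a module-level const.
process.env.SF_HOME = "/tmp/sf-test-home";
const { appRoot, sessionsDir, authFilePath } = await import("./app-paths.js");
// appRoot      === "/tmp/sf-test-home"
// sessionsDir  === "/tmp/sf-test-home/sessions"
// authFilePath === "/tmp/sf-test-home/agent/auth.json"
```
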
View file

@ -1,5 +1,27 @@
import { delimiter } from "node:path";
/**
* bundled-extension-paths.ts: serialize and deserialize bundled extension
* entry-point paths for the SF_BUNDLED_EXTENSION_PATHS environment variable.
*
* Purpose: encode a list of absolute file paths into a single string that can
* be passed through process.env, then decode it back into an array. This lets
* loader.ts communicate discovered extension entry points to downstream code
* without relying on mutable global state or repeated filesystem scans.
*
* Consumer: loader.ts (sets SF_BUNDLED_EXTENSION_PATHS after discovery),
* tests/bundled-extension-paths.test.ts (behaviour contracts).
*/
/**
* Serialises an array of extension entry-point paths into a delimited string.
*
* Purpose: produce a compact, environment-variable-safe representation of the
* bundled extensions that loader.ts discovered. Using the platform path
* delimiter by default means the encoded string is also a valid PATH-like value.
*
* Consumer: loader.ts (assigns the result to process.env.SF_BUNDLED_EXTENSION_PATHS).
*/
export function serializeBundledExtensionPaths(
paths: readonly string[],
pathDelimiter = delimiter,
@ -7,6 +29,16 @@ export function serializeBundledExtensionPaths(
return paths.filter(Boolean).join(pathDelimiter);
}
/**
* Parses a delimited string of extension entry-point paths back into an array.
*
* Purpose: recover the original path list from the SF_BUNDLED_EXTENSION_PATHS
* environment variable. Trimming and filtering empty segments makes the parser
* tolerant of trailing delimiters or extra whitespace.
*
* Consumer: downstream code that reads process.env.SF_BUNDLED_EXTENSION_PATHS
* to know which bundled extensions are active (e.g. extension loading logic).
*/
export function parseBundledExtensionPaths(
value: string | undefined,
pathDelimiter = delimiter,

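A round-trip usage sketch for the two functions above; the paths and the loader wiring are illustrative, and only the function names and the SF_BUNDLED_EXTENSION_PATHS variable come from this diff.
```ts
import {
  parseBundledExtensionPaths,
  serializeBundledExtensionPaths,
} from "./bundled-extension-paths.js";

// loader.ts side: encode discovered entry points into the environment variable.
const discovered = ["/tmp/ext/a/index.ts", "/tmp/ext/b/index.ts"];
process.env.SF_BUNDLED_EXTENSION_PATHS = serializeBundledExtensionPaths(discovered);

// Downstream side: decode it back (tolerant of trailing delimiters and whitespace).
const active = parseBundledExtensionPaths(process.env.SF_BUNDLED_EXTENSION_PATHS);
// active deep-equals discovered
```
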
View file

@ -18,8 +18,13 @@
/**
* A user-facing or machine-readable error record with rich context.
*
* All fields are optional except `message` so that call-sites can incrementally
* adopt structured errors without rewriting every catch block at once.
* Purpose: provide a single, serializable shape that every error path can
* enrich incrementally (operation, file, guidance, retry hint) so that
* consumers (stderr printers, JSON exporters, and trace emitters) receive
* enough context to produce actionable output without guessing.
*
* Consumer: cli.ts catch blocks, headless.ts event handlers, trace span
* error events, and any extension that surfaces user-facing failures.
*/
export interface StructuredError {
/** Human-readable description of what went wrong. */
@ -80,6 +85,11 @@ export function error(
* File: <file>:<line>
* Guidance: <guidance>
* Retryable: yes|no
*
* Purpose: give users a consistent, scannable error layout in the terminal
* so they can spot the file, guidance, and retry hint without parsing JSON.
*
* Consumer: cli.ts before writing to process.stderr.
*/
export function formatStructuredError(
err: StructuredError,
@ -109,6 +119,8 @@ export function formatStructuredError(
*
* Purpose: headless --output-format json mode can embed structured errors
* in the result payload instead of interleaving free-form text on stderr.
*
* Consumer: headless.ts when emitting batch JSON results.
*/
export function errorToJson(err: StructuredError): Record<string, unknown> {
const out: Record<string, unknown> = { message: err.message };
@ -135,6 +147,9 @@ export function errorToJson(err: StructuredError): Record<string, unknown> {
*
* Purpose: safe type guards at catch boundaries where the thrown value may
* be a plain Error, a StructuredError, or something else entirely.
*
* Consumer: headless.ts before deciding whether to call formatStructuredError
* or fall back to String(err).
*/
export function isStructuredError(val: unknown): val is StructuredError {
return (

View file

@ -1,3 +1,18 @@
/**
* Extension Discovery resolves extension entry-point files from a directory tree.
*
* Supports two discovery modes:
* 1. package.json with a `pi` manifest (authoritative, allows opt-out).
* 2. Fallback to index.ts / index.js when no manifest is present.
*
* Purpose: decouple the physical layout of extensions on disk from the loader so
* that extensions can declare their own entry points and library directories can
* opt out of being loaded.
*
* Consumer: extension-registry.ts (ensureRegistryEntries), the sf-run loader, and
* the test suite that validates symlink and manifest edge cases.
*/
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join, resolve } from "node:path";
@ -16,6 +31,13 @@ function isExtensionFile(name: string): boolean {
* - `pi.extensions` array → resolve each entry relative to the directory.
* - `pi: {}` (no extensions) → return empty (library opt-out, e.g. cmux).
* 2. Only when no `pi` manifest exists does it fall back to `index.ts` / `index.js`.
*
* Purpose: give extension authors explicit control over what gets loaded while
* preserving backwards compatibility for simple extensions that only provide an
* index file.
*
* Consumer: discoverExtensionEntryPaths() and tests that verify manifest vs fallback
* resolution logic.
*/
export function resolveExtensionEntries(dir: string): string[] {
const packageJsonPath = join(dir, "package.json");
@ -61,6 +83,13 @@ export function resolveExtensionEntries(dir: string): string[] {
* - Top-level .ts/.js files are treated as standalone extension entry points.
* - Subdirectories are resolved via `resolveExtensionEntries()` (package.json
* pi.extensions, then index.ts/index.js fallback).
*
* Purpose: produce a flat list of absolute entry-point paths that the loader can
* require() in order, regardless of whether extensions are organised as files or
* directories.
*
* Consumer: the sf-run loader bootstrap and integration tests that verify discovery
* against fixture directories.
*/
export function discoverExtensionEntryPaths(extensionsDir: string): string[] {
if (!existsSync(extensionsDir)) {

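To make the two discovery modes concrete, a hypothetical sketch of the `pi` manifest shapes the comments above describe; the schema here is inferred from this diff's JSDoc, not from a published spec.
```ts
// Hypothetical package.json shapes for the discovery modes described above.
type PiManifest = { pi?: { extensions?: string[] } };

// Mode 1a: explicit entry points, resolved relative to the extension directory.
const explicit: PiManifest = { pi: { extensions: ["./tools/main.ts", "./tools/extra.ts"] } };

// Mode 1b: `pi: {}` with no extensions: library opt-out, nothing is loaded (e.g. cmux).
const optOut: PiManifest = { pi: {} };

// Mode 2: no `pi` key at all: discovery falls back to index.ts / index.js in the directory.
const fallback: PiManifest = {};
```
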
View file

@ -4,6 +4,12 @@
* Extensions without manifests always load (backwards compatible).
* A fresh install has an empty registry; all extensions are enabled by default.
* The only way an extension stops loading is an explicit `sf extensions disable <id>`.
*
* Purpose: provide a single source of truth for which extensions are active so that
* the loader can decide what to load and the CLI can show the user what is installed.
*
* Consumer: extension-discovery.ts (reads manifests), loader.ts (decides what to load),
* and commands-handlers.ts (implements `sf extensions list/enable/disable`).
*/
import {
@ -19,6 +25,14 @@ import { appRoot } from "./app-paths.js";
// ─── Types ──────────────────────────────────────────────────────────────────
/**
* Describes the static metadata shipped with an extension.
*
* Purpose: let the registry and loader validate an extension before loading it
* and present human-readable information in CLI listings.
*
* Consumer: readManifest(), discoverAllManifests(), and the `sf extensions list` command.
*/
export interface ExtensionManifest {
id: string;
name: string;
@ -38,6 +52,14 @@ export interface ExtensionManifest {
};
}
/**
* A single entry in the on-disk registry file.
*
* Purpose: persist whether an extension is enabled, why it was disabled, and
* where it came from so that decisions survive process restarts.
*
* Consumer: loadRegistry(), saveRegistry(), and the enable/disable mutations.
*/
export interface ExtensionRegistryEntry {
id: string;
enabled: boolean;
@ -46,6 +68,13 @@ export interface ExtensionRegistryEntry {
disabledReason?: string;
}
/**
* The top-level shape of the persisted registry file.
*
* Purpose: version the JSON schema so future migrations can detect old formats.
*
* Consumer: loadRegistry() and saveRegistry().
*/
export interface ExtensionRegistry {
version: 1;
entries: Record<string, ExtensionRegistryEntry>;
@ -74,6 +103,15 @@ function isManifest(data: unknown): data is ExtensionManifest {
// ─── Registry Path ──────────────────────────────────────────────────────────
/**
* Returns the absolute path to the persisted registry JSON file.
*
* Purpose: centralise the registry location so every I/O operation targets the
* same file and the path can be overridden in tests.
*
* Consumer: loadRegistry(), saveRegistry(), and test fixtures that need to
* point at a temporary registry.
*/
export function getRegistryPath(): string {
return join(appRoot, "extensions", "registry.json");
}
@ -84,6 +122,16 @@ function defaultRegistry(): ExtensionRegistry {
return { version: 1, entries: {} };
}
/**
* Reads the registry from disk, returning a default empty registry if the file
* is missing, malformed, or unreadable.
*
* Purpose: guarantee that every caller receives a valid ExtensionRegistry object
* without having to handle I/O edge cases themselves.
*
* Consumer: ensureRegistryEntries(), the extension loader, and CLI commands that
* need to inspect or mutate extension state.
*/
export function loadRegistry(): ExtensionRegistry {
const filePath = getRegistryPath();
try {
@ -96,6 +144,15 @@ export function loadRegistry(): ExtensionRegistry {
}
}
/**
* Atomically writes the registry to disk using a temp-file + rename pattern.
*
* Purpose: prevent corrupt or partial registry files if the process crashes
* mid-write, and silently swallow non-fatal persistence errors so the CLI
* remains usable even when the filesystem is read-only.
*
* Consumer: enableExtension(), disableExtension(), and ensureRegistryEntries().
*/
export function saveRegistry(registry: ExtensionRegistry): void {
const filePath = getRegistryPath();
try {
@ -110,7 +167,14 @@ export function saveRegistry(registry: ExtensionRegistry): void {
// ─── Query ──────────────────────────────────────────────────────────────────
/** Returns true if the extension is enabled (missing entries default to enabled). */
/**
* Returns true if the extension is enabled (missing entries default to enabled).
*
* Purpose: let the loader decide whether to activate an extension without
* requiring every caller to know the "missing means enabled" default.
*
* Consumer: the extension loader and `sf extensions list` when rendering status.
*/
export function isExtensionEnabled(
registry: ExtensionRegistry,
id: string,
@ -122,6 +186,14 @@ export function isExtensionEnabled(
// ─── Mutations ──────────────────────────────────────────────────────────────
/**
* Marks an extension as enabled, clearing any previous disable metadata.
*
* Purpose: provide the atomic state transition used by `sf extensions enable`
* and by the auto-discovery flow when a new extension is first seen.
*
* Consumer: `sf extensions enable` command handler and ensureRegistryEntries().
*/
export function enableExtension(registry: ExtensionRegistry, id: string): void {
const entry = registry.entries[id];
if (entry) {
@ -136,6 +208,11 @@ export function enableExtension(registry: ExtensionRegistry, id: string): void {
/**
* Disable an extension. Returns an error string if the extension is core (cannot disable),
* or null on success.
*
* Purpose: protect core extensions from accidental disablement while allowing users
* to turn off bundled or community extensions and recording why.
*
* Consumer: `sf extensions disable` command handler.
*/
export function disableExtension(
registry: ExtensionRegistry,
@ -165,7 +242,15 @@ export function disableExtension(
// ─── Manifest Reading ───────────────────────────────────────────────────────
/** Read extension-manifest.json from a directory. Returns null if missing or invalid. */
/**
* Read extension-manifest.json from a directory. Returns null if missing or invalid.
*
* Purpose: isolate manifest parsing and validation so callers receive either a
* fully typed ExtensionManifest or a clear null signal.
*
* Consumer: discoverAllManifests(), readManifestFromEntryPath(), and tests that
* verify manifest schema evolution.
*/
export function readManifest(extensionDir: string): ExtensionManifest | null {
const manifestPath = join(extensionDir, "extension-manifest.json");
if (!existsSync(manifestPath)) return null;
@ -180,6 +265,12 @@ export function readManifest(extensionDir: string): ExtensionManifest | null {
/**
* Given an entry path (e.g. `.../extensions/browser-tools/index.ts`),
* resolve the parent directory and read its manifest.
*
* Purpose: bridge the gap between a discovered entry-point file and its
* containing extension's metadata, used when the loader needs tier or
* dependency information for a specific file.
*
* Consumer: extension loader when validating an entry point before require()ing it.
*/
export function readManifestFromEntryPath(
entryPath: string,
@ -190,7 +281,14 @@ export function readManifestFromEntryPath(
// ─── Discovery ──────────────────────────────────────────────────────────────
/** Scan all subdirectories of extensionsDir for manifests. Returns a Map<id, manifest>. */
/**
* Scan all subdirectories of extensionsDir for manifests. Returns a Map<id, manifest>.
*
* Purpose: produce a complete, de-duplicated inventory of installed extensions
* so the registry can be reconciled against the filesystem.
*
* Consumer: ensureRegistryEntries() and the extension loader's bootstrap phase.
*/
export function discoverAllManifests(
extensionsDir: string,
): Map<string, ExtensionManifest> {
@ -214,6 +312,11 @@ export function discoverAllManifests(
/**
* Auto-populate registry entries for newly discovered extensions.
* Extensions already in the registry are left untouched.
*
* Purpose: keep the registry in sync with the filesystem after installs or
* updates without overwriting user preferences (e.g. disabled state).
*
* Consumer: sf-run startup sequence and `sf extensions sync` command.
*/
export function ensureRegistryEntries(extensionsDir: string): void {
const manifests = discoverAllManifests(extensionsDir);

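A usage sketch of how the registry API above fits together, limited to functions whose full signatures appear in this diff; the extension id is made up.
```ts
import {
  enableExtension,
  isExtensionEnabled,
  loadRegistry,
  saveRegistry,
} from "./extension-registry.js";

const registry = loadRegistry();                 // never throws; missing or corrupt file yields a default registry

if (!isExtensionEnabled(registry, "community-linter")) {
  enableExtension(registry, "community-linter"); // clears any previous disable metadata
  saveRegistry(registry);                        // temp-file + rename write, persistence errors swallowed
}
```
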
View file

@ -1,16 +1,36 @@
/**
* Headless Types: shared types for the headless orchestrator surface.
* headless-types.ts: shared types for the headless orchestrator surface.
*
* Contains the structured result type emitted in --output-format json mode
* and the output format discriminator.
* Purpose: provide a single source of truth for the structured result type
* emitted in --output-format json mode and the output format discriminator,
* so headless.ts, consumers, and tests agree on shape without circular deps.
*
* Consumer: headless.ts (orchestrator), external CI scripts parsing batch JSON.
*/
// ---------------------------------------------------------------------------
// Output Format
// ---------------------------------------------------------------------------
/**
* Discriminates the three headless output modes.
*
* Purpose: let callers declare how they want to receive session results
* (human-readable text, single JSON blob, or streaming JSONL) so the
* orchestrator can select the right serializer upfront.
*
* Consumer: parseHeadlessArgs in headless.ts when handling --output-format.
*/
export type OutputFormat = "text" | "json" | "stream-json";
/**
* Set of supported output-format string values.
*
* Purpose: guard against typos in CLI arguments and provide a fast
* membership test without repeating the literal list.
*
* Consumer: parseHeadlessArgs validation and unit tests.
*/
export const VALID_OUTPUT_FORMATS: ReadonlySet<string> = new Set([
"text",
"json",
@ -21,6 +41,16 @@ export const VALID_OUTPUT_FORMATS: ReadonlySet<string> = new Set([
// Structured JSON Result
// ---------------------------------------------------------------------------
/**
* Shape of the single JSON object written to stdout when --output-format json
* is used in batch (non-streaming) mode.
*
* Purpose: give non-interactive callers (CI pipelines, test harnesses,
* parent processes) a machine-readable contract for session outcome,
* cost, and metadata without scraping stderr.
*
* Consumer: emitBatchJsonResult in headless.ts; live-regression tests.
*/
export interface HeadlessJsonResult {
schemaVersion: 1;
status: "success" | "error" | "blocked" | "cancelled" | "timeout";

View file

@ -1,16 +1,13 @@
/**
* Headless Orchestrator: `sf headless`
* headless.ts: Headless Orchestrator for `sf headless`.
*
* Runs any /sf subcommand without a TUI by spawning a child process in
* RPC mode, auto-responding to extension UI requests, and streaming
* progress to stderr.
* Purpose: run any /sf subcommand without a TUI by spawning a child process in
* RPC mode, auto-responding to extension UI requests, and streaming progress to
* stderr. This lets CI pipelines, test harnesses, and remote orchestrators drive
* sf-run programmatically.
*
* Exit codes:
* 0 complete (command finished successfully)
* 1 error or timeout
* 10 blocked (command reported a blocker)
* 11 cancelled (SIGINT/SIGTERM received)
* 12 reload (agent requested restart-with-resume, same session)
* Consumer: CLI entry point (commands-handlers.ts) when the user runs
* `sf headless <subcommand>`.
*/
import type { ChildProcess } from "node:child_process";
@ -93,6 +90,15 @@ const HEADLESS_HEARTBEAT_INTERVAL_MS = 60_000;
// Types
// ---------------------------------------------------------------------------
/**
* Parsed CLI options for the headless orchestrator.
*
* Purpose: collect every flag and positional argument that influences how the
* headless session runs (timeouts, output format, model, resume, supervision,
* etc.) into a single typed bag so downstream logic doesn't re-parse argv.
*
* Consumer: parseHeadlessArgs and runHeadless in this module.
*/
export interface HeadlessOptions {
timeout: number;
json: boolean;
@ -113,6 +119,16 @@ export interface HeadlessOptions {
bare?: boolean; // --bare: suppress CLAUDE.md/AGENTS.md, user skills, project preferences
}
/**
* Ensure the local .sf directory exists by creating a symlink to external
* project state when the directory is missing.
*
* Purpose: let headless sessions recover when .sf/ is absent but an external
* project state directory exists (e.g. after cloning or cache eviction),
* avoiding a hard failure on every command that expects local state.
*
* Consumer: runHeadlessOnce during project-state validation.
*/
export function repairMissingSfSymlinkForHeadless(
basePath: string,
): string | null {
@ -139,6 +155,16 @@ interface HeadlessUnitNotification {
verdict?: string;
}
/**
* Parse a unit start/end notification line into a structured object.
*
* Purpose: turn free-form stderr notify lines like `[unit] slice M001/S01 starting`
* into typed data so the trace collector and progress observers can react without
* brittle string matching scattered across the file.
*
* Consumer: handleUnitStart, handleUnitEnd, and observeHeadlessNotification in
* this module.
*/
export function parseHeadlessUnitNotification(
message: string,
): HeadlessUnitNotification | null {
@ -168,6 +194,15 @@ export function parseHeadlessUnitNotification(
// Resume Session Resolution
// ---------------------------------------------------------------------------
/**
* Result of resolving a session prefix to a concrete session.
*
* Purpose: represent the two possible outcomes of prefix lookup (a unique
* matched session or an error string) so callers can branch cleanly without
* throwing.
*
* Consumer: resolveResumeSession and the --resume flow in runHeadlessOnce.
*/
export interface ResumeSessionResult {
session?: SessionInfo;
error?: string;
@ -177,6 +212,11 @@ export interface ResumeSessionResult {
* Resolve a session prefix to a single session.
* Exact id match is preferred over prefix match.
* Returns `{ session }` on unique match or `{ error }` on 0/ambiguous matches.
*
* Purpose: let users resume sessions with short prefixes (e.g. `--resume abc`)
* while preventing accidental ambiguity when two IDs share a prefix.
*
* Consumer: runHeadlessOnce when processing the `--resume <prefix>` CLI flag.
*/
export function resolveResumeSession(
sessions: SessionInfo[],
@ -208,6 +248,14 @@ export function resolveResumeSession(
// CLI Argument Parser
// ---------------------------------------------------------------------------
/**
* Parse the process.argv array into structured HeadlessOptions.
*
* Purpose: centralise all CLI flag parsing for `sf headless` so the rest of
* the orchestrator works with a typed options object instead of raw strings.
*
* Consumer: CLI entry point before invoking runHeadless.
*/
export function parseHeadlessArgs(argv: string[]): HeadlessOptions {
const options: HeadlessOptions = {
timeout: 300_000,
@ -319,6 +367,16 @@ const RELOAD_SENTINEL = join(process.env.TEMP ?? "/tmp", "sf-reload-sentinel");
// Main Orchestrator
// ---------------------------------------------------------------------------
/**
* Run a headless session with automatic restart on crash and reload on
* agent-requested resume.
*
* Purpose: provide a resilient outer loop around a single headless session so
* transient RPC failures or agent-triggered restarts don't break CI pipelines
* or long-running autonomous workflows.
*
* Consumer: CLI entry point after parseHeadlessArgs.
*/
export async function runHeadless(options: HeadlessOptions): Promise<void> {
const stdoutWithHandle = process.stdout as typeof process.stdout & {
_handle?: { setBlocking?: (blocking: boolean) => void };

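A behaviour sketch for resolveResumeSession based only on its JSDoc above; this diff shows just the first parameter, so the prefix argument and the SessionInfo values are assumptions, and imports are omitted.
```ts
// Assumed: the second parameter is the user-supplied --resume prefix.
declare const sessions: SessionInfo[]; // e.g. ids "abc123" and "abc999"

resolveResumeSession(sessions, "abc123"); // exact id match wins -> { session }
resolveResumeSession(sessions, "abc");    // two ids share the prefix -> { error } (ambiguous)
resolveResumeSession(sessions, "zzz");    // no match -> { error }
```
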
View file

@ -45,6 +45,7 @@ import {
import { resolveModelWithFallbacksForUnit } from "../preferences-models.js";
import { resolveSkillReference } from "../preferences-skills.js";
import { getTemplatesDir, loadPrompt } from "../prompt-loader.js";
import { buildRepositoryVcsContextBlock } from "../repository-vcs-context.js";
import {
detectNewSkills,
formatSkillsXml,
@ -79,6 +80,11 @@ const BUNDLED_SKILL_TRIGGERS: Array<{ trigger: string; skill: string }> = [
"Debugging - complex bugs, failing tests, root-cause investigation after standard approaches fail",
skill: "debug-like-expert",
},
{
trigger:
"Repository VCS operations - commit, push, safe-push, Git vs JJ, repo-local version-control rules",
skill: "repo-vcs",
},
];
function buildBundledSkillsTable(): string {
@ -265,6 +271,7 @@ export async function buildBeforeAgentStartResult(
: null;
const worktreeBlock = buildWorktreeContextBlock();
const repositoryVcsBlock = buildRepositoryVcsContextBlock(process.cwd());
const modelIdentityBlock = ctx.model
? `\n\n## Active Model Identity\n\nCurrent executor model: ${formatModelIdentity(ctx.model)}. Treat the model name as the capability identity and the provider/model route as the wire ID. Do not substitute one Kimi version for another.`
: "";
@ -274,7 +281,7 @@ export async function buildBeforeAgentStartResult(
? `\n\n## Subagent Model\n\nWhen spawning subagents via the \`subagent\` tool, always pass \`model: "${subagentModelConfig.primary}"\` in the tool call parameters. Never omit this — always specify it explicitly.`
: "";
const fullSystem = `${event.systemPrompt}\n\n[SYSTEM CONTEXT — SF]\n\n${systemContent}${preferenceBlock}${knowledgeBlock}${codebaseBlock}${codeIntelligenceBlock}${memoryBlock}${newSkillsBlock}${worktreeBlock}${modelIdentityBlock}${subagentModelBlock}`;
const fullSystem = `${event.systemPrompt}\n\n[SYSTEM CONTEXT — SF]\n\n${systemContent}${preferenceBlock}${knowledgeBlock}${codebaseBlock}${codeIntelligenceBlock}${memoryBlock}${newSkillsBlock}${worktreeBlock}${repositoryVcsBlock}${modelIdentityBlock}${subagentModelBlock}`;
stopContextTimer({
systemPromptSize: fullSystem.length,

View file

@ -0,0 +1,111 @@
import { existsSync } from "node:fs";
import { dirname, join, relative } from "node:path";
export type RepositoryVcsKind = "jj" | "git" | "none";
export interface RepositoryVcsContext {
kind: RepositoryVcsKind;
root: string | null;
pushWrapper: string | null;
}
const PUSH_WRAPPER_CANDIDATES = [
join("scripts", "ace_safe_push.sh"),
join("scripts", "safe_push.sh"),
join("scripts", "safe-push.sh"),
"justfile",
] as const;
/**
* Detect the version-control system for the current repository.
*
* Purpose: keep repo-specific VCS policy local to the current checkout so a JJ
* rule from one project is never applied to an unrelated Git repository.
*
* Consumer: system-context.ts when injecting durable SF operating guidance.
*/
export function detectRepositoryVcsContext(
startDir: string,
): RepositoryVcsContext {
let current = startDir;
while (true) {
const jjDir = join(current, ".jj");
const gitMarker = join(current, ".git");
if (existsSync(jjDir)) {
return {
kind: "jj",
root: current,
pushWrapper: findRepoPushWrapper(current),
};
}
if (existsSync(gitMarker)) {
return {
kind: "git",
root: current,
pushWrapper: findRepoPushWrapper(current),
};
}
const parent = dirname(current);
if (parent === current) {
return { kind: "none", root: null, pushWrapper: null };
}
current = parent;
}
}
/**
* Format repo-local VCS guidance for the system prompt.
*
* Purpose: make push/commit behavior conditional on the current repository
* instead of global agent memory or nearby checkout conventions.
*
* Consumer: system-context.ts before agent start.
*/
export function buildRepositoryVcsContextBlock(startDir: string): string {
const context = detectRepositoryVcsContext(startDir);
if (context.kind === "none" || !context.root) return "";
const lines = [
"",
"",
"[REPOSITORY VCS CONTEXT]",
`Repository root: ${context.root}`,
`Detected VCS: ${context.kind === "jj" ? "Jujutsu (JJ)" : "Git"}`,
"",
"Repo-local rules:",
"- Detect the current repository before choosing commit or push commands.",
"- Follow this repository's own instructions, hooks, and wrapper scripts.",
"- Do not apply push rules from sibling or unrelated repositories.",
"- Treat JJ as a repo-specific skill: use it only when this repository is JJ-backed.",
];
if (context.pushWrapper) {
lines.push(
`- Repo-owned push wrapper detected: ${relative(
context.root,
context.pushWrapper,
)}`,
"- Prefer the repo-owned wrapper for pushes unless the user explicitly says otherwise.",
);
} else if (context.kind === "git") {
lines.push(
"- No repo-owned push wrapper was detected; normal Git commands are appropriate for this repository.",
);
} else {
lines.push(
"- No repo-owned JJ push wrapper was detected; inspect repo docs before pushing.",
);
}
return lines.join("\n");
}
function findRepoPushWrapper(root: string): string | null {
for (const candidate of PUSH_WRAPPER_CANDIDATES) {
const path = join(root, candidate);
if (existsSync(path)) return path;
}
return null;
}

View file

@ -0,0 +1,59 @@
import assert from "node:assert/strict";
import { mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import test from "node:test";
import {
buildRepositoryVcsContextBlock,
detectRepositoryVcsContext,
} from "../repository-vcs-context.ts";
function tempRepo(): string {
return mkdtempSync(join(tmpdir(), "sf-vcs-context-"));
}
test("detectRepositoryVcsContext_when_plain_git_repo_uses_git", () => {
const root = tempRepo();
mkdirSync(join(root, ".git"));
const context = detectRepositoryVcsContext(join(root, "src"));
assert.equal(context.kind, "git");
assert.equal(context.root, root);
assert.equal(context.pushWrapper, null);
});
test("detectRepositoryVcsContext_when_jj_marker_exists_prefers_jj", () => {
const root = tempRepo();
mkdirSync(join(root, ".git"));
mkdirSync(join(root, ".jj"));
const context = detectRepositoryVcsContext(root);
assert.equal(context.kind, "jj");
assert.equal(context.root, root);
});
test("buildRepositoryVcsContextBlock_when_git_without_wrapper_allows_git", () => {
const root = tempRepo();
mkdirSync(join(root, ".git"));
const block = buildRepositoryVcsContextBlock(root);
assert.match(block, /Detected VCS: Git/);
assert.match(block, /normal Git commands are appropriate/);
assert.doesNotMatch(block, /ACE/);
});
test("buildRepositoryVcsContextBlock_when_repo_wrapper_exists_keeps_it_local", () => {
const root = tempRepo();
mkdirSync(join(root, ".git"));
mkdirSync(join(root, "scripts"));
writeFileSync(join(root, "scripts", "safe_push.sh"), "#!/bin/sh\n");
const block = buildRepositoryVcsContextBlock(root);
assert.match(block, /Repo-owned push wrapper detected: scripts\/safe_push\.sh/);
assert.match(block, /Do not apply push rules from sibling/);
});

View file

@ -0,0 +1,27 @@
---
name: repo-vcs
description: Use when working with repository version-control operations such as commit, push, safe-push, Git vs JJ, worktree sync, or repo-local VCS rules. Detect the current repo first; use JJ only for repos that are actually JJ-backed.
---
# Repository VCS
Use this skill before committing, pushing, or interpreting repository-specific VCS instructions.
## Rules
- Detect the current repository first. Do not infer VCS behavior from sibling repos.
- If the current repo has `.jj/`, treat it as JJ-backed and follow that repo's JJ instructions.
- If the current repo has `.git/` and no `.jj/`, treat it as plain Git.
- If the current repo has a repo-owned push wrapper, use that wrapper for pushes unless the user explicitly directs otherwise.
- Never apply another repo's wrapper or safe-push rule to the current repo.
## Push Selection
- JJ repo with repo wrapper: use the repo wrapper.
- JJ repo without wrapper: inspect repo docs before pushing.
- Git repo with repo wrapper: use the repo wrapper.
- Git repo without wrapper: normal Git push is appropriate.
## Safety
Before mutating VCS state, check `git status --short --branch` or the repo's JJ equivalent. Preserve unrelated user changes.

View file

@ -1,12 +1,13 @@
/**
* Structured Trace Data Model & Export
* traces.ts: Structured trace data model and export utilities for auto-mode execution.
*
* Provides a hierarchical span model for tracing auto-mode execution.
* Spans form a tree: root session span → unit spans (milestone/slice/task) → tool spans.
* Purpose: provide a lightweight, hierarchical span model that captures the
* full lifecycle of an auto-mode session (session → units → tools) so that
* post-hoc analysis, debugging, and cost attribution can be done from a
* single JSON artifact instead of piecing together scattered logs.
*
* Two export modes:
* exportTrace(path) → write to arbitrary path
* exportTraceToProject(dir) → write to .sf/traces/ in project dir
* Consumer: headless.ts (creates and finalizes traces), trace-collector.ts
* (appends spans and events), and any external tool that reads .sf/traces/.
*/
import { randomUUID } from "node:crypto";
@ -17,7 +18,28 @@ import { join } from "node:path";
// Types
// ---------------------------------------------------------------------------
/**
* Classify the role of a span in the trace hierarchy.
*
* Purpose: distinguish session roots, milestone/slice/task units, and
* individual tool calls so that renderers and aggregators can group or
* filter spans by semantic category.
*
* Consumer: trace-collector.ts when creating spans, and trace visualizers
* that colour-code or collapse spans by kind.
*/
export type SpanKind = "session" | "unit" | "tool";
/**
* Terminal state of a span.
*
* Purpose: capture whether a span finished successfully, failed, was
* cancelled, or is still running so that trace consumers can compute
* success rates and identify hung operations.
*
* Consumer: trace-collector.ts on unit/tool end, and trace analysis scripts
* that aggregate outcomes across sessions.
*/
export type SpanStatus =
| "ok"
| "error"
@ -25,12 +47,32 @@ export type SpanStatus =
| "timeout"
| "in_progress";
/**
* A discrete event attached to a span, such as a checkpoint or decision.
*
* Purpose: record semantically meaningful moments (e.g. "planning meeting
* started", "model switched") inside a span without creating a child span
* for every micro-step.
*
* Consumer: trace-collector.ts when recording model switches, gate results,
* or other non-span lifecycle events.
*/
export interface TraceEvent {
name: string;
timestamp: number;
attributes?: Record<string, string | number | boolean | null>;
}
/**
* Optional metadata attached to a span.
*
* Purpose: carry dimensional data (tokens, cost, model, file paths) that
* lets downstream tools attribute spend and latency to specific units or
* tools without parsing free-form log lines.
*
* Consumer: trace-collector.ts when enriching spans after LLM responses,
* and cost-dashboard scripts that sum inputTokens / outputTokens.
*/
export interface SpanAttributes {
// Session-level
projectRoot?: string;
@ -59,6 +101,15 @@ export interface SpanAttributes {
toolDurationMs?: number;
}
/**
* A single node in the trace tree.
*
* Purpose: represent one scoped operation (session, unit, or tool call) with
* timing, status, attributes, nested children, and a timeline of events so
* that the full execution graph can be reconstructed from the trace file.
*
* Consumer: trace-collector.ts, headless.ts, and any trace reader/visualizer.
*/
export interface Span {
id: string;
name: string;
@ -71,6 +122,15 @@ export interface Span {
events: TraceEvent[];
}
/**
* The top-level trace container.
*
* Purpose: hold the root span and session metadata so that a single file
* contains everything needed to replay or analyse an auto-mode session.
*
* Consumer: headless.ts (creates and finalizes), exportTrace/exportTraceToProject
* (serializes), and external trace consumers.
*/
export interface Trace {
id: string;
version: number;
@ -85,7 +145,14 @@ export interface Trace {
// Span helpers
// ---------------------------------------------------------------------------
/** Create a new span with a random UUID and current timestamp. */
/**
* Create a new span with a random UUID and current timestamp.
*
* Purpose: provide a single, correct construction site for spans so that
* every span has a stable ID and a consistent start-time baseline.
*
* Consumer: trace-collector.ts when starting a session, unit, or tool span.
*/
export function createSpan(
name: string,
kind: SpanKind,
@ -103,14 +170,30 @@ export function createSpan(
};
}
/** Mark a span as complete and record end time. */
/**
* Mark a span as complete and record end time.
*
* Purpose: ensure every finished span carries both a terminal status and an
* end timestamp so that duration calculations and success-rate metrics are
* accurate.
*
* Consumer: trace-collector.ts when a unit or tool finishes.
*/
export function endSpan(span: Span, status: SpanStatus = "ok"): Span {
span.status = status;
span.endTime = Date.now();
return span;
}
/** Append a named event to a span with optional attributes. */
/**
* Append a named event to a span with optional attributes.
*
* Purpose: let collectors record semantically rich checkpoints (model
* switches, gate completions) inside an existing span without mutating the
* span's own fields.
*
* Consumer: trace-collector.ts during auto-mode phase transitions.
*/
export function addEvent(
span: Span,
name: string,
@ -123,7 +206,15 @@ export function addEvent(
});
}
/** Append an error event to a span with message and optional stack. */
/**
* Append an error event to a span with message and optional stack.
*
* Purpose: capture failure details (including stack traces when available)
* inside the trace so that debugging can be done from the trace file alone
* without cross-referencing separate log files.
*
* Consumer: trace-collector.ts when a tool call or unit throws.
*/
export function addError(span: Span, message: string, stack?: string): void {
span.events.push({
name: "error",
@ -141,7 +232,14 @@ export function addError(span: Span, message: string, stack?: string): void {
// Trace helpers
// ---------------------------------------------------------------------------
/** Create a new trace with a root session span. */
/**
* Create a new trace with a root session span.
*
* Purpose: establish the top-level trace container and its root session span
* in one call so that headless.ts never creates a trace without a valid root.
*
* Consumer: headless.ts at the start of an auto-mode session.
*/
export function createTrace(
projectRoot: string,
sessionId?: string,
@ -164,13 +262,28 @@ export function createTrace(
};
}
/** Finalize a trace: set completedAt timestamp. */
/**
* Finalize a trace: set completedAt timestamp.
*
* Purpose: mark the trace as closed so that readers know the tree is
* complete and can safely compute session duration and aggregate costs.
*
* Consumer: headless.ts in the normal exit path and signal handlers.
*/
export function finalizeTrace(trace: Trace): Trace {
trace.completedAt = new Date().toISOString();
return trace;
}
/** Find a span in the tree by ID (linear walk). */
/**
* Find a span in the tree by ID (linear walk).
*
* Purpose: let collectors locate an existing span (e.g. to attach a child
* or end it) without maintaining a separate ID-to-span map.
*
* Consumer: trace-collector.ts when bridging async tool-call results back
* to their original span.
*/
export function findSpan(span: Span, id: string): Span | undefined {
if (span.id === id) return span;
for (const child of span.children) {
@ -180,12 +293,29 @@ export function findSpan(span: Span, id: string): Span | undefined {
return undefined;
}
/** Add a child span to a parent. */
/**
* Add a child span to a parent.
*
* Purpose: build the hierarchical tree (session → unit → tool) so that
* trace readers can collapse, expand, or aggregate by level.
*
* Consumer: trace-collector.ts when starting a unit or tool inside an
* already-running parent span.
*/
export function addChildSpan(parent: Span, child: Span): void {
parent.children.push(child);
}
/** Walk all spans in a trace (root first, depth-first). Yields each span. */
/**
* Walk all spans in a trace (root first, depth-first). Yields each span.
*
* Purpose: provide a simple, reusable traversal for aggregators, exporters,
* and debug printers that need to visit every span without writing recursive
* loops in every consumer.
*
* Consumer: trace analysis scripts, cost aggregators, and test assertions
* that verify span tree shape.
*/
export function* walkSpans(span: Span): Generator<Span, void, unknown> {
yield span;
for (const child of span.children) {
@ -200,6 +330,12 @@ export function* walkSpans(span: Span): Generator<Span, void, unknown> {
/**
* Serialize and write a trace to an arbitrary path.
* Creates parent directories as needed.
*
* Purpose: allow trace consumers (tests, CI scripts, manual debugging) to
* persist a trace anywhere on disk without hard-coding .sf/traces/ logic.
*
* Consumer: test suites that write traces to temp directories, and custom
* integrations that ship traces to external observability platforms.
*/
export function exportTrace(trace: Trace, path: string): void {
const dir = join(path, "..");
@ -212,6 +348,11 @@ export function exportTrace(trace: Trace, path: string): void {
/**
* Serialize and write a trace to .sf/traces/ in the project root.
* Filename: trace-<timestamp>.json
*
* Purpose: provide the standard, project-local trace sink so that every
* auto-mode session leaves a discoverable artifact in a known location.
*
* Consumer: headless.ts in the normal exit path and signal handlers.
*/
export function exportTraceToProject(
trace: Trace,
@ -229,6 +370,13 @@ export function exportTraceToProject(
/**
* Read a trace from disk.
*
* Purpose: round-trip a trace file back into the typed model so that
* analysis tools, test assertions, and replay utilities can work with
* structured data instead of raw JSON.
*
* Consumer: trace analysis scripts, test helpers, and any tool that reads
* .sf/traces/ for post-session inspection.
*/
export function readTrace(path: string): Trace {
return JSON.parse(readFileSync(path, "utf-8")) as Trace;
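
An end-to-end usage sketch of the span helpers documented above, restricted to calls whose signatures are visible in this diff; calling createSpan and addEvent with two arguments assumes their remaining parameters are optional.
```ts
// Build a tiny session -> unit -> tool tree with the helpers above.
const root = createSpan("session", "session");
const unit = createSpan("slice M001/S01", "unit");
addChildSpan(root, unit);

const tool = createSpan("tool: read_file", "tool");
addChildSpan(unit, tool);
addEvent(tool, "model-switch"); // optional attributes omitted
endSpan(tool, "ok");
endSpan(unit, "ok");
endSpan(root, "ok");

// Walk the finished tree, e.g. to count failed spans.
let failed = 0;
for (const span of walkSpans(root)) {
  if (span.status === "error" || span.status === "timeout") failed += 1;
}
```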