# SF Agent Mode System

> **Status:** Draft specification. Promoted from `copilot-thoughts.md` research notes.
> **Scope:** TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target.
> **Decision authority:** Product + architecture review required before implementation.

---

## 1. Problem Statement

SF's old command surface treated mode switching as separate commands rather than persistent states. There was no visible indicator of the current mode, and the `/sf` prefix positioned SF as a plugin rather than the system itself.

Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation.

**Goal:** Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp.

---

## 2. Orthogonal State Axes

SF state is five independent axes, not one overloaded "mode."

```text
workMode:          chat | plan | build | review | repair | research
runControl:        manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode:         fast | smart | deep
surface:           tui | web | headless | rpc
```

### 2.1 Axis Definitions

| Axis | Question It Answers | Values |
|------|---------------------|--------|
| `workMode` | What kind of work is SF doing? | `chat`, `plan`, `build`, `review`, `repair`, `research` |
| `runControl` | Who advances the loop? | `manual` (user), `assisted` (one unit then pause), `autonomous` (continuous) |
| `permissionProfile` | What may proceed without approval? | `restricted`, `normal`, `trusted`, `unrestricted` |
| `modelMode` | Speed/cost/reasoning posture? | `fast` (cheap), `smart` (balanced), `deep` (reasoning) |
| `surface` | How is the user connected? | `tui`, `web`, `headless`, `rpc` |

### 2.2 Example Combinations

```text
plan     | manual     | normal     | deep     → user plans with reasoning model
build    | autonomous | trusted    | smart    → continuous implementation
repair   | assisted   | normal     | smart    → one repair unit at a time
research | autonomous | restricted | deep     → continuous research, read-only
review   | manual     | restricted | deep     → user reviews with reasoning model
```

### 2.3 Rules

- `permissionProfile` never implies `runControl`. Autonomous run with `restricted` permissions is valid.
- `runControl` never implies `permissionProfile`. Manual run with `unrestricted` permissions is valid.
- Denylists and safety gates override `permissionProfile` regardless of value.
- Every risk decision logs all five axis values.
- `sandboxProfile` may become a sixth axis later. It is separate from `permissionProfile`: sandboxing controls process/filesystem/network containment, while permission profile controls what SF may approve.
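
The axes above can be sketched as independent union types. This is a minimal illustration, not SF's actual implementation: the type and function names (`SfState`, `setWorkMode`, `riskLogEntry`) are hypothetical, and only the axis names and values come from this section.

```typescript
// Axis names and values are taken from Section 2; all identifiers are hypothetical.
type WorkMode = "chat" | "plan" | "build" | "review" | "repair" | "research";
type RunControl = "manual" | "assisted" | "autonomous";
type PermissionProfile = "restricted" | "normal" | "trusted" | "unrestricted";
type ModelMode = "fast" | "smart" | "deep";
type SfSurface = "tui" | "web" | "headless" | "rpc";

interface SfState {
  workMode: WorkMode;
  runControl: RunControl;
  permissionProfile: PermissionProfile;
  modelMode: ModelMode;
  surface: SfSurface;
}

// Changing one axis returns a new state and leaves the other four untouched.
function setWorkMode(state: SfState, workMode: WorkMode): SfState {
  return { ...state, workMode };
}

// A risk decision record carries all five axis values plus the decision itself.
function riskLogEntry(state: SfState, decision: string): Record<string, string> {
  return { decision, ...state };
}

const researchState: SfState = {
  workMode: "research",
  runControl: "autonomous",
  permissionProfile: "restricted", // autonomous + restricted is a valid combination
  modelMode: "deep",
  surface: "tui",
};
const buildState = setWorkMode(researchState, "build");
```

Keeping each axis as its own field means no combination has to be enumerated up front; invalid pairings, if any are ever introduced, would be rejected by policy rather than by the type.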

---

## 3. Work Modes

### 3.1 `chat`

Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request.

### 3.2 `plan`

Research, clarify, write or update specs, derive tasks, and produce an explicit acceptance point before implementation. The primary user journey starts here.

**Plan → Build handoff:**
```text
plan | manual | normal | deep
accept plan
build | autonomous | selected-permission-profile | smart
```

Surfaces:
- TUI: plan acceptance prompt includes "run autonomously" button
- Web: plan acceptance button includes "run autonomously"
- Headless: `--autonomous` chains into direct `/autonomous`
- RPC: machine event records transition explicitly

### 3.3 `build`

Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default.

### 3.4 `review`

Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (`deep`).

### 3.5 `repair`

Fix SF health and repo health: runtime drift (including generated/runtime drift and broken installed runtime copies), broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, and failed gates.

**Doctor is the diagnostic engine, not the mode.** `/doctor` inspects. `/repair` switches work mode.
`repair` is a `workMode`, not a separate subsystem.

Commands:
```text
/doctor          → inspect and report
/doctor fix      → deterministic auto-fix
/doctor heal     → LLM-assisted deep healing
/repair          → switch workMode to repair
/repair --autonomous → repair until clean, blocked, or limit-hit
```

**Auto-transition to repair:**
```text
build | autonomous | trusted | smart
→ repair | autonomous | normal | smart
```

Allowed when:
- pre-dispatch health gate fails
- installed runtime drift detected
- SF cannot dispatch safely
- repo workflow state corrupted

Policy: configurable per project. Options: `auto`, `ask`, `log-only`.
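
As a rough sketch of how the per-project policy could gate the auto-transition, the snippet below maps each policy value to an action. The trigger and action names are hypothetical labels for the conditions listed above, and reading `log-only` as "record the trigger and keep the current mode" is an assumption, not something this spec defines.

```typescript
// Policy values come from this section; trigger and action names are hypothetical.
type RepairPolicy = "auto" | "ask" | "log-only";
type RepairTrigger =
  | "health-gate-failed"
  | "runtime-drift"
  | "unsafe-dispatch"
  | "workflow-state-corrupted";
type RepairAction = "switch-to-repair" | "prompt-user" | "log-and-continue";

// Assumption: "log-only" records the trigger without changing workMode.
const ACTION_BY_POLICY: Record<RepairPolicy, RepairAction> = {
  auto: "switch-to-repair",
  ask: "prompt-user",
  "log-only": "log-and-continue",
};

function onRepairTrigger(
  policy: RepairPolicy,
  trigger: RepairTrigger,
): { action: RepairAction; reason: RepairTrigger } {
  return { action: ACTION_BY_POLICY[policy], reason: trigger };
}
```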

### 3.6 `research`

Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents.

---

## 4. Run Control

| Value | Behavior |
|-------|----------|
| `manual` | User drives every step. SF waits for the user before each tool call, independent of `permissionProfile`. |
| `assisted` | SF executes one unit, then pauses for user review. |
| `autonomous` | SF continues until done, blocked, interrupted, budget-hit, or limit-hit. |

### 4.1 Commands

```text
/control manual
/control assisted
/control autonomous
/autonomous    → alias for /control autonomous
/next          → alias for /control assisted (one unit)
/pause         → pause autonomous, preserve state
/stop          → stop autonomous, clear state
```

### 4.2 Transition Scopes

| Scope | Behavior |
|-------|----------|
| `now` | Apply immediately if no tool active. Abort current tool if policy allows. |
| `after-current-tool` | Finish active tool, then switch. |
| `after-current-unit` | Finish current SF unit, then switch. |
| `next-milestone` | Switch after current milestone completes. |

Transitions requested during an autonomous run affect future decisions only; they never mutate an active tool call mid-execution.
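
One way to picture the scopes is as a small queue of pending transitions that loop events release. The sketch below is illustrative only: the event names and helper functions are hypothetical, and it assumes that a `now` request arriving while a tool is active (and not abortable) degrades to `after-current-tool`.

```typescript
// Scope values come from the table above; event names and helpers are hypothetical.
type TransitionScope = "now" | "after-current-tool" | "after-current-unit" | "next-milestone";
type DeferredScope = Exclude<TransitionScope, "now">;
type LoopEvent = "tool-finished" | "unit-finished" | "milestone-finished";

interface PendingTransition {
  target: string; // e.g. a workMode or permissionProfile value
  scope: DeferredScope;
}

// Which loop event releases each deferred scope.
const RELEASE_EVENT: Record<DeferredScope, LoopEvent> = {
  "after-current-tool": "tool-finished",
  "after-current-unit": "unit-finished",
  "next-milestone": "milestone-finished",
};

// Assumption: "now" with an active, non-abortable tool waits for that tool.
function resolveScope(scope: TransitionScope, toolActive: boolean): "apply" | DeferredScope {
  if (scope !== "now") return scope;
  return toolActive ? "after-current-tool" : "apply";
}

// Split the queue into transitions to apply on this event and those to keep.
function releaseOnEvent(
  pending: PendingTransition[],
  event: LoopEvent,
): { apply: PendingTransition[]; keep: PendingTransition[] } {
  return {
    apply: pending.filter((p) => RELEASE_EVENT[p.scope] === event),
    keep: pending.filter((p) => RELEASE_EVENT[p.scope] !== event),
  };
}
```

Because the active tool call is never touched, a transition only ever changes what SF decides next.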

### 4.3 Transition Logging

Every transition persists:

```json
{
  "timestamp": "2026-05-08T10:00:00Z",
  "from": {"workMode": "build", "runControl": "autonomous"},
  "to": {"workMode": "repair", "runControl": "autonomous"},
  "reason": "pre-dispatch health gate failed",
  "scope": "after-current-unit",
  "sessionId": "..."
}
```

---

## 5. Permission Profiles

| Profile | Description |
|---------|-------------|
| `restricted` | Read-only and explicitly allowlisted actions. |
| `normal` | Safe edits, non-destructive local commands. |
| `trusted` | Build/test/install/local commits and bounded repo automation. |
| `unrestricted` | High-risk orchestration only in intentionally trusted environments. |

### 5.1 Enforcement

Permission profile is enforced at three layers:

1. **Tool registry:** Each tool declares its required profile. Tools that require a higher profile than the active one are hidden from the model.
2. **Execution gate:** Each tool call checks the profile at invocation. A violation raises an error.
3. **Safety harness:** Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile.
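
The three layers can be sketched as follows. This is an illustration under stated assumptions: it treats the profiles as a strict order (`restricted` < `normal` < `trusted` < `unrestricted`), and the `ToolSpec` shape, function names, and tool names are all hypothetical.

```typescript
// Assumption: profiles form a total order from least to most permissive.
const PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 } as const;
type Profile = keyof typeof PROFILE_RANK;

// Hypothetical registry entry.
interface ToolSpec {
  name: string;
  requires: Profile;
  destructive: boolean;
}

function profileAllows(active: Profile, required: Profile): boolean {
  return PROFILE_RANK[active] >= PROFILE_RANK[required];
}

// Layer 1: the model only sees tools the active profile permits.
function visibleTools(tools: ToolSpec[], active: Profile): ToolSpec[] {
  return tools.filter((t) => profileAllows(active, t.requires));
}

// Layers 2 and 3: re-check at invocation, then gate destructive operations
// on explicit confirmation regardless of profile.
function checkCall(tool: ToolSpec, active: Profile, confirmed: boolean): "run" | "needs-confirmation" {
  if (!profileAllows(active, tool.requires)) {
    throw new Error(`${tool.name} requires ${tool.requires}, active profile is ${active}`);
  }
  if (tool.destructive && !confirmed) return "needs-confirmation";
  return "run";
}

// Hypothetical tools for illustration.
const readFile: ToolSpec = { name: "read_file", requires: "restricted", destructive: false };
const editFile: ToolSpec = { name: "edit_file", requires: "normal", destructive: false };
const gitPush: ToolSpec = { name: "git_push", requires: "trusted", destructive: true };
```

Checking again at invocation matters even though hidden tools should never be called: the registry filter is a presentation concern, while the execution gate is the actual control.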

### 5.2 Commands

```text
/permission-profile restricted
/permission-profile normal
/permission-profile trusted
/permission-profile unrestricted
```

---

## 6. Model Modes

| Mode | Use Case | Routing Hint |
|------|----------|--------------|
| `fast` | Small bounded tasks | Cheapest available model |
| `smart` | Default balanced work | Default routing table |
| `deep` | Planning, debugging, research, review | Reasoning model (o1, Claude Opus, etc.) |

`modelMode` guides routing. It does not replace explicit `/model` selection.
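
A minimal sketch of that precedence, assuming a hypothetical routing table keyed by mode: an explicit `/model` selection wins, and `modelMode` is consulted only when no explicit model is set. The model identifiers are placeholders, not real model names.

```typescript
type ModelModeHint = "fast" | "smart" | "deep";

// Hypothetical routing table; the model ids are placeholders.
const ROUTING_TABLE: Record<ModelModeHint, string> = {
  fast: "cheap-model",
  smart: "balanced-model",
  deep: "reasoning-model",
};

// An explicit /model selection overrides the modelMode hint.
function resolveModel(mode: ModelModeHint, explicitModel?: string): string {
  return explicitModel ?? ROUTING_TABLE[mode];
}
```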

---

## 7. Mode Switching UX

### 7.1 Direct Commands

```text
/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/permission-profile restricted
/permission-profile normal
/permission-profile trusted
/permission-profile unrestricted
/model-mode fast
/model-mode smart
/model-mode deep
```

### 7.2 Combined Forms

```text
/mode repair --autonomous --permission-profile normal
/mode build --autonomous --permission-profile trusted
/mode research --autonomous --permission-profile restricted --model-mode deep
```
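
For illustration, a combined form could be parsed roughly like this. The parser is a sketch, not SF's actual command handling: `parseModeCommand` and `ModeCommand` are hypothetical names, and only the flags shown in the examples above are recognized.

```typescript
const WORK_MODES: readonly string[] = ["chat", "plan", "build", "review", "repair", "research"];

// Hypothetical parsed form of a combined /mode command.
interface ModeCommand {
  workMode: string;
  runControl?: string;
  permissionProfile?: string;
  modelMode?: string;
}

function parseModeCommand(input: string): ModeCommand {
  const [root, workMode, ...flags] = input.trim().split(/\s+/);
  if (root !== "/mode" || workMode === undefined || WORK_MODES.indexOf(workMode) === -1) {
    throw new Error("expected /mode <workMode>");
  }
  const cmd: ModeCommand = { workMode };
  for (let i = 0; i < flags.length; i++) {
    const flag = flags[i];
    if (flag === "--autonomous") {
      cmd.runControl = "autonomous";
      continue;
    }
    const value = flags[++i]; // value-taking flags consume the next token
    if (value === undefined) throw new Error(`missing value for ${flag}`);
    if (flag === "--permission-profile") cmd.permissionProfile = value;
    else if (flag === "--model-mode") cmd.modelMode = value;
    else throw new Error(`unknown flag ${flag}`);
  }
  return cmd;
}
```

Axes that are not mentioned stay `undefined`, so the caller can leave them at their current values instead of resetting them.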

### 7.3 Autonomous Steering

```text
/steer mode repair
/steer mode review after-current-unit
/steer permission-profile restricted now
/steer model-mode deep for-next-unit
```

### 7.4 Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+Shift+M` | Cycle workMode: chat → plan → build → review → repair → research → chat |
| `Ctrl+Shift+A` | Set runControl to autonomous |
| `Ctrl+Shift+S` | Set runControl to assisted (step) |
| `Ctrl+Shift+I` | Set runControl to manual (interactive) |
| `Ctrl+Shift+R` | Set workMode to repair |
| `Ctrl+Shift+P` | Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted |

---

## 8. Status and Mode Badge

### 8.1 Full Status Line

```text
SF  build | autonomous | trusted | smart
```

### 8.2 Compact Badge Form

For narrow terminals (< 80 cols):

```text
[B][A][T][S]
```

### 8.3 Critical State Labels

When workMode is `repair` or `review`, show full labels regardless of width:

```text
repair | autonomous | normal | smart
review | assisted | normal | deep
```
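
The width and critical-state rules from Sections 8.2 and 8.3 can be sketched as a single rendering function. The names are hypothetical, and deriving the compact badge from each value's first letter is an assumption that merely matches the `[B][A][T][S]` example; several values share a first letter, so a real implementation may need distinct glyphs.

```typescript
interface BadgeState {
  workMode: string;
  runControl: string;
  permissionProfile: string;
  modelMode: string;
}

function renderBadge(s: BadgeState, cols: number): string {
  const parts = [s.workMode, s.runControl, s.permissionProfile, s.modelMode];
  // repair and review always show full labels, regardless of width.
  const critical = s.workMode === "repair" || s.workMode === "review";
  if (cols >= 80 || critical) return parts.join(" | ");
  // Assumption: compact form uses the upper-cased first letter of each value.
  return parts.map((p) => `[${p.charAt(0).toUpperCase()}]`).join("");
}
```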

### 8.4 Badge Placement

| Surface | Placement |
|---------|-----------|
| TUI header | Left side, after "SF" logo |
| TUI status bar | Bottom line when header hidden |
| tmux/terminal title | `SF[build\|A\|trusted\|smart] project-name` |
| Web | Top bar, color-coded chip |

### 8.5 Badge During Auto Mode

Current code hides header/footer during auto mode (`if (isAutoActive()) return []`). This must change so the badge stays visible:

- Show a **minimal header** during auto (badge + project name only), or show the badge in a **dedicated status bar** separate from header/footer.
- Pulse the badge color slowly during autonomous execution (subtle animation).

### 8.6 Badge Colors

| Axis | Value | Color |
|------|-------|-------|
| workMode | `chat` | dim |
| workMode | `plan` | accent |
| workMode | `build` | success |
| workMode | `review` | warning |
| workMode | `repair` | error |
| workMode | `research` | info |
| runControl | `manual` | dim |
| runControl | `assisted` | warning |
| runControl | `autonomous` | success (pulsing) |
| permissionProfile | `restricted` | success |
| permissionProfile | `normal` | dim |
| permissionProfile | `trusted` | warning |
| permissionProfile | `unrestricted` | error |

---

## 9. Background Work Surface (`/tasks`)

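Unified view of all background work. For work inspection, it supersedes piecing together output from `/status`, `/queue`, and `/parallel status`; those commands remain available for their own purposes (see Section 9.3).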

### 9.1 What `/tasks` Shows

- autonomous task lifecycle rows
- parallel workers
- scheduled autonomous dispatches and queued scheduler rows
- background shell sessions
- stuck or resumable sessions
- remote questions waiting for answers
- current cost/budget state
- last checkpoint and next action

### 9.2 Data Model

SQLite tables:

```sql
-- Durable task state
CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  work_mode TEXT NOT NULL,
  run_control TEXT NOT NULL,
  permission_profile TEXT NOT NULL,
  model_mode TEXT NOT NULL,
  status TEXT NOT NULL, -- todo | running | verifying | reviewing | done | blocked | paused | failed | cancelled | retrying
  dependency_blockers TEXT, -- JSON array of task IDs
  retry_count INTEGER DEFAULT 0,
  max_retries INTEGER DEFAULT 3,
  checkpoint_ref TEXT, -- git ref or patch file
  cost_budget REAL,
  cost_spent REAL DEFAULT 0,
  created_at TEXT, -- Temporal.Instant ISO
  started_at TEXT,
  completed_at TEXT,
  next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled
  intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50"
);

-- Scheduler state is separate from task lifecycle state
CREATE TABLE task_scheduler (
  task_id TEXT PRIMARY KEY REFERENCES tasks(id),
  status TEXT NOT NULL, -- queued | due | claimed | dispatched | consumed | expired
  due_at TEXT,
  claimed_by TEXT,
  dispatched_at TEXT,
  consumed_at TEXT,
  expires_at TEXT
);

-- Ephemeral running state
CREATE TABLE task_runtime (
  task_id TEXT PRIMARY KEY REFERENCES tasks(id),
  process_pid INTEGER,
  worktree_path TEXT,
  current_model TEXT,
  context_usage_percent REAL,
  last_heartbeat_at TEXT -- Temporal.Instant
);

-- Transition log
CREATE TABLE task_transitions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  from_status TEXT NOT NULL,
  to_status TEXT NOT NULL,
  reason TEXT,
  scope TEXT,
  timestamp TEXT NOT NULL -- Temporal.Instant
);
```

Parallel workers must stay worktree-isolated and report heartbeat/status into
`.sf` state. Scheduler rows may use `queued`; task lifecycle rows use `todo`.
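
Two small checks that these columns make possible are sketched below: deciding between `retrying` and `failed` from `retry_count`/`max_retries`, and flagging a worker as stuck from `last_heartbeat_at`. Both helpers are hypothetical, the heartbeat timeout is an assumed parameter, and the timestamps are assumed to be stored as ISO 8601 instant strings.

```typescript
// Subset of the tasks row relevant to retries.
interface TaskRetryInfo {
  retry_count: number;
  max_retries: number;
}

// After a failed attempt: retry while attempts remain, otherwise fail.
function statusAfterFailure(t: TaskRetryInfo): "retrying" | "failed" {
  return t.retry_count < t.max_retries ? "retrying" : "failed";
}

// Assumption: a worker is considered stuck once its heartbeat is older
// than timeoutMs. Timestamps are ISO 8601 instants, e.g. "2026-05-08T10:00:00Z".
function isHeartbeatStale(lastHeartbeatAt: string, nowMs: number, timeoutMs: number): boolean {
  return nowMs - Date.parse(lastHeartbeatAt) > timeoutMs;
}
```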

### 9.3 Complementary Commands

`/tasks` does not replace:
- `/status` → project health dashboard
- `/queue` → milestone/slice dispatch order
- `/parallel status` → parallel orchestrator detail
- `/session-report` → cost/token summary
- `/logs` → activity logs
- `/forensics` → execution forensics

---

## 10. Skills System

### 10.1 Directory Structure

```text
.agents/skills/<skill-name>/
  SKILL.md          -- skill definition with YAML frontmatter
  scripts/          -- supporting scripts
  schemas/          -- JSON schemas for inputs/outputs
  checklists/       -- verification checklists
  mcp.json          -- MCP server config if applicable
```

### 10.2 Skill Frontmatter

```yaml
---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---
```

Fields:
- `name`: unique identifier
- `description`: when to use this skill
- `user-invocable`: can user explicitly invoke?
- `model-invocable`: can model auto-invoke when relevant?
- `side-effects`: `none`, `code-edits`, `production-mutation`, etc.
- `permission-profile`: minimum profile required

### 10.3 Skill Categories

| Type | Example | `model-invocable` |
|------|---------|-------------------|
| Background knowledge | `forge-autonomous-runtime` | true |
| User tool | `production-deploy` | false |
| Shared capability | `forge-command-surface` | true |

Dangerous skills (`production-mutation`) are never model-invoked by default.
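
A rough sketch of how the frontmatter could gate model invocation follows. The `SkillMeta` shape mirrors the frontmatter fields in camelCase, while the function name, the `allowDangerous` override, and the assumption that profiles are ordered from `restricted` up to `unrestricted` are all illustrative.

```typescript
// Assumption: profiles form a total order from least to most permissive.
const SKILL_PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 } as const;
type SkillProfile = keyof typeof SKILL_PROFILE_RANK;

// Hypothetical in-memory form of the SKILL.md frontmatter.
interface SkillMeta {
  name: string;
  userInvocable: boolean;
  modelInvocable: boolean;
  sideEffects: string; // "none", "code-edits", "production-mutation", ...
  permissionProfile: SkillProfile; // minimum profile required
}

function canModelInvoke(skill: SkillMeta, active: SkillProfile, allowDangerous = false): boolean {
  if (!skill.modelInvocable) return false;
  // Dangerous skills are never model-invoked unless explicitly allowed.
  if (skill.sideEffects === "production-mutation" && !allowDangerous) return false;
  return SKILL_PROFILE_RANK[active] >= SKILL_PROFILE_RANK[skill.permissionProfile];
}
```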

### 10.4 Auto-Creation Flow

1. Detect repeated repo-specific evidence (same files, commands, failure modes, rules)
2. Propose skill in manual/restricted contexts
3. Generate/update automatically only when policy allows
4. Record repeated source evidence in `.sf` state
5. Keep skills narrow, linted, and evaluated like code
6. Commit with repo when accepted

### 10.5 Skill Eval Cases

Every auto-created skill needs eval cases:

```text
.agents/skills/<skill-name>/evals/
  case-1/
    task.md          -- user-like prompt
    grader.js        -- deterministic checker
    hidden/          -- reference answers (not visible to agent)
    work/            -- agent workspace
```

Graders inspect: files, artifacts, `answer.json`, `trace.jsonl`, result state.
Failed trials preserve workspace for debugging.

---

## 11. Command Surfaces

### 11.1 Human Slash Commands

SF registers direct command roots only:

```text
/status
/autonomous
/doctor
/rate
/session-report
/parallel
/remote
/tasks
```

`/sf` is not a command root. TUI and browser command parity tests reject it so
compatibility shims do not grow back.

`/remote` is a full-session steering surface. Remote answers may change
`workMode`, `runControl`, `permissionProfile`, and `modelMode`; they are not
limited to question delivery.

### 11.2 Shell Surface

Machine surface remains prefixed:

```text
sf headless autonomous
sf headless --autonomous ...
```

The shell prefix is the executable name, not an interactive slash-command
namespace.

---

## 12. Runtime Target: Node 26

### 12.1 Policy

```text
current compatibility floor: Node 26.1+
internal target runtime:     Node 26.1
canonical baseline:          Node 26.1
Node 25:                     skip except quick probes
```

### 12.2 Why Node 26

- `Temporal` enabled by default
- V8 14.6 baseline
- Undici 8 HTTP/fetch baseline
- Removes legacy APIs, hardens against old assumptions

### 12.3 Temporal Adoption

Store semantic type, not just formatted string:

| Concept | Temporal Type | Use Case |
|---------|---------------|----------|
| Exact instant | `Temporal.Instant` | Journal events, checkpoints, lock leases |
| Local time | `Temporal.ZonedDateTime` | Reminders, schedules, audits |
| Calendar date | `Temporal.PlainDate` | Daily reports, milestone reviews |
| Wall-clock time | `Temporal.PlainTime` | Recurring policies |
| Time amount | `Temporal.Duration` | Budgets, leases, cooldowns, retry delays |

### 12.4 Adoption Priority

1. `sf schedule` — highest user-visible impact
2. Lock/lease — highest operational correctness
3. Journals/traces — highest debugging impact
4. Session reports — nice to have
5. Background tasks — future work

### 12.5 Gate

```text
node --version       → must report v26.1 or later
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"
```

---

## 13. Implementation Pull-Through

### 13.1 Already Directionally Right

- UOK lifecycle records carry `runControl`
- UOK lifecycle records carry `permissionProfile`
- Schedule command state uses `autonomous_dispatch`
- DB-backed state, recovery, verification, scheduling, captures, forensics
- Skills and project-specific skill paths exist
- Parallel orchestration and remote-question infrastructure

### 13.2 Still Needed

| Priority | Item | Effort |
|----------|------|--------|
| P2 | Decide whether `sandboxProfile` becomes a sixth persisted axis | Medium |
| P2 | Remove `/sf` from docs/web/tests (Phase 2 deprecation) | Small |

### 13.3 Recently Completed (This Session)

| Priority | Item | Status |
|----------|------|--------|
| P1 | Centralized metrics system (`metrics-central.js`) | ✓ |
| P1 | Cost command (`/cost`) with DB + ledger queries | ✓ |
| P1 | Explicit stage commands (`/research`, `/plan`, `/implement`) | ✓ |
| P2 | Reasoning assist foundation (`reasoning-assist.js`) | ✓ |
| P2 | Self-feedback → workMode auto-transition | ✓ |
| P2 | UOK events carry workMode + modelMode | ✓ |
| P2 | `/sf` prefix deprecation warning (Phase 1) | ✓ |

### 13.4 Completed

| Priority | Item | Status |
|----------|------|--------|
| P0 | Make mode state durable in SQLite | ✓ |
| P0 | Add direct `/mode`, `/control`, `/permission-profile`, `/model-mode` commands | ✓ |
| P0 | Add visible mode badge to TUI header/status bar | ✓ |
| P1 | Make `--autonomous` chain into direct `/autonomous` | ✓ |
| P1 | Expose autonomous continuation limits in settings and status | ✓ |
| P1 | Add `/tasks` backed by DB execution graph state | ✓ |
| P1 | Make `repair` first-class workflow over `doctor` | ✓ |
| P1 | Enhanced `/steer` with mode/permission-profile/model-mode transitions | ✓ |
| P1 | TUI keyboard shortcuts for mode cycling (Ctrl+Shift+M/R/A/S/P) | ✓ |
| P1 | Minimal auto-mode header/footer (badge visible during autonomy) | ✓ |
| P1 | Remove `/sf` namespace registration and parity-test against fallback | ✓ |
| P1 | Parallel worker intent/claim registry backed by UOK SQLite coordination | ✓ |
| P1 | Skill eval harness foundation | ✓ |
| P1 | Terminal title mode indicator | ✓ |
| P2 | Policy-aware project skill suggestion/generation with DB cooldown | ✓ |
| P2 | Schema-backed task frontmatter (risk, mutation, verification, approval) | ✓ |
| P2 | Subagent provider/model/permission inheritance audit and guard | ✓ |
| P2 | Remote steering as full-session surface from remote answers | ✓ |

---

## 14. Open Questions

1. Should `plan` mode show badge `[P]` or `plan` text in full?
2. Should paused autonomous show previous badge dimmed, or `[P]` for paused?
3. Should mode be per-session or per-project? (Current: per-session)
4. Should badge appear in tmux/terminal window titles?
5. Should mode transitions have sound/notification?
6. Should `repair` auto-transition be `ask` by default for new projects?
7. Should skill eval cases run in CI or only on-demand?
8. Should `/tasks` be a TUI overlay or a separate scrollable panel?
9. Should reasoning assist call a fast model automatically, or only prepare prompts for now?

---

## 15. References

- GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously" — <https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot>
- Factory Droid, "Autonomy Level" — <https://docs.factory.ai/cli/user-guides/auto-run>
- Amp manual — <https://ampcode.com/manual>
- Smelt (mode cycling) — <https://github.com/leonardcser/smelt>
- ORCH (task state machine) — <https://github.com/oxgeneral/ORCH>
- AgentPlane (schema-first tasks) — <https://github.com/basilisk-labs/agentplane>
- Relay (channels and tickets) — <https://github.com/jcast90/relay>
- Sage (runtime-neutral orchestration) — <https://github.com/youwangd/SageCLI>
- Wit (symbol-level locks) — <https://github.com/amaar-mc/wit>