singularity-forge/docs/specs/agent-mode-system.md

# SF Agent Mode System

> **Status:** Draft specification. Promoted from `copilot-thoughts.md` research notes.
> **Scope:** TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target.
> **Decision authority:** Product + architecture review required before implementation.

---

## 1. Problem Statement

SF's current command surface (`/sf autonomous`, `/sf next`, `/sf pause`, `/sf stop`) treats mode switching as separate commands rather than persistent states. There is no visible indicator of the current mode, and the `/sf` prefix positions SF as a plugin rather than the system itself.

Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation.

**Goal:** Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp.

---

## 2. Orthogonal State Axes

SF state is five independent axes, not one overloaded "mode."

```text
workMode:          chat | plan | build | review | repair | research
runControl:        manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode:         fast | smart | deep
surface:           tui | web | headless | rpc
```

### 2.1 Axis Definitions

| Axis | Question It Answers | Values |
|------|---------------------|--------|
| `workMode` | What kind of work is SF doing? | `chat`, `plan`, `build`, `review`, `repair`, `research` |
| `runControl` | Who advances the loop? | `manual` (user), `assisted` (one unit then pause), `autonomous` (continuous) |
| `permissionProfile` | What may proceed without approval? | `restricted`, `normal`, `trusted`, `unrestricted` |
| `modelMode` | Speed/cost/reasoning posture? | `fast` (cheap), `smart` (balanced), `deep` (reasoning) |
| `surface` | How is the user connected? | `tui`, `web`, `headless`, `rpc` |

### 2.2 Example Combinations

```text
plan     | manual     | normal     | deep     → user plans with reasoning model
build    | autonomous | trusted    | smart    → continuous implementation
repair   | assisted   | normal     | smart    → one repair unit at a time
research | autonomous | restricted | deep     → continuous research, read-only
review   | manual     | restricted | deep     → user reviews with reasoning model
```

### 2.3 Rules

- `permissionProfile` never implies `runControl`. Autonomous run with `restricted` permissions is valid.
- `runControl` never implies `permissionProfile`. Manual run with `unrestricted` permissions is valid.
- Denylists and safety gates override `permissionProfile` regardless of value.
- Every risk decision logs all five axis values.

---

## 3. Work Modes

### 3.1 `chat`

Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request.

### 3.2 `plan`

Research, clarify, write/update specs, derive tasks, produce explicit acceptance point before implementation. Primary user journey starts here.

**Plan → Build handoff:**
```text
plan | manual | normal | deep
accept plan
build | autonomous | selected-permission-profile | smart
```

Surfaces:
- TUI: plan acceptance prompt includes "run autonomously" button
- Web: plan acceptance button includes "run autonomously"
- Headless: `--autonomous` chains into direct `/autonomous`
- RPC: machine event records transition explicitly

### 3.3 `build`

Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default.

### 3.4 `review`

Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (`deep`).

### 3.5 `repair`

Fix SF health, repo health, runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, broken installed runtime copies.

**Doctor is the diagnostic engine, not the mode.** `/doctor` inspects. `/repair` switches work mode.

Commands:
```text
/doctor          → inspect and report
/doctor fix      → deterministic auto-fix
/doctor heal     → LLM-assisted deep healing
/repair          → switch workMode to repair
/repair --autonomous → repair until clean, blocked, or limit-hit
```

**Auto-transition to repair:**
```text
build | autonomous | trusted | smart
→ repair | autonomous | normal | smart
```

Allowed when:
- pre-dispatch health gate fails
- installed runtime drift detected
- SF cannot dispatch safely
- repo workflow state corrupted

Policy: configurable per project. Options: `auto`, `ask`, `log-only`.

### 3.6 `research`

Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents.

---

## 4. Run Control

| Value | Behavior |
|-------|----------|
| `manual` | User drives every step. Tool calls require approval. |
| `assisted` | SF executes one unit, then pauses for user review. |
| `autonomous` | SF continues until done, blocked, interrupted, budget-hit, or limit-hit. |

### 4.1 Commands

```text
/control manual
/control assisted
/control autonomous
/autonomous    → alias for /control autonomous
/next          → alias for /control assisted (one unit)
/pause         → pause autonomous, preserve state
/stop          → stop autonomous, clear state
```

### 4.2 Transition Scopes

| Scope | Behavior |
|-------|----------|
| `now` | Apply immediately if no tool active. Abort current tool if policy allows. |
| `after-current-tool` | Finish active tool, then switch. |
| `after-current-unit` | Finish current SF unit, then switch. |
| `next-milestone` | Switch after current milestone completes. |

Autonomous changes affect future decisions, never mutate active tool calls mid-execution.

### 4.3 Transition Logging

Every transition persists:

```json
{
  "timestamp": "2026-05-08T10:00:00Z",
  "from": {"workMode": "build", "runControl": "autonomous"},
  "to": {"workMode": "repair", "runControl": "autonomous"},
  "reason": "pre-dispatch health gate failed",
  "scope": "after-current-unit",
  "sessionId": "..."
}
```

---

## 5. Permission Profiles

| Profile | Description |
|---------|-------------|
| `restricted` | Read-only and explicitly allowlisted actions. |
| `normal` | Safe edits, non-destructive local commands. |
| `trusted` | Build/test/install/local commits and bounded repo automation. |
| `unrestricted` | High-risk orchestration only in intentionally trusted environments. |

### 5.1 Enforcement

Permission profile is enforced at three layers:

1. **Tool registry:** Each tool declares required profile. Tools below profile are hidden from model.
2. **Execution gate:** Each tool call checks profile at invocation. Violation = error.
3. **Safety harness:** Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile.

### 5.2 Commands

```text
/trust restricted
/trust normal
/trust trusted
/trust unrestricted
```

---

## 6. Model Modes

| Mode | Use Case | Routing Hint |
|------|----------|--------------|
| `fast` | Small bounded tasks | Cheapest available model |
| `smart` | Default balanced work | Default routing table |
| `deep` | Planning, debugging, research, review | Reasoning model (o1, Claude Opus, etc.) |

`modelMode` guides routing. It does not replace explicit `/model` selection.

---

## 7. Mode Switching UX

### 7.1 Direct Commands

```text
/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/trust restricted
/trust normal
/trust trusted
/trust unrestricted
/model-mode fast
/model-mode smart
/model-mode deep
```

### 7.2 Combined Forms

```text
/mode repair --autonomous --trust normal
/mode build --autonomous --trust trusted
/mode research --autonomous --trust restricted --model-mode deep
```

### 7.3 Autonomous Steering

```text
/steer mode repair
/steer mode review after-current-unit
/steer trust restricted now
/steer model-mode deep for-next-unit
```

### 7.4 Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+Shift+M` | Cycle workMode: chat → plan → build → review → repair → research → chat |
| `Ctrl+Shift+A` | Set runControl to autonomous |
| `Ctrl+Shift+S` | Set runControl to assisted (step) |
| `Ctrl+Shift+I` | Set runControl to manual (interactive) |
| `Ctrl+Shift+R` | Set workMode to repair |
| `Ctrl+Shift+P` | Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted |

---

## 8. Status and Mode Badge

### 8.1 Full Status Line

```text
SF  build | autonomous | trusted | smart
```

### 8.2 Compact Badge Form

For narrow terminals (< 80 cols):

```text
[B][A][T][S]
```

### 8.3 Critical State Labels

When workMode is `repair` or `review`, show full labels regardless of width:

```text
repair | autonomous | normal | smart
review | assisted | normal | deep
```

### 8.4 Badge Placement

| Surface | Placement |
|---------|-----------|
| TUI header | Left side, after "SF" logo |
| TUI status bar | Bottom line when header hidden |
| tmux/terminal title | `SF[build|A|trusted|smart] project-name` |
| Web | Top bar, color-coded chip |

### 8.5 Badge During Auto Mode

Current code hides header/footer during auto mode (`if (isAutoActive()) return []`). This must change:

- Show **minimal header** during auto: badge + project name only
- Or show badge in **dedicated status bar** separate from header/footer
- Badge color pulses slowly during autonomous execution (subtle animation)

### 8.6 Badge Colors

| Axis | Value | Color |
|------|-------|-------|
| workMode | `chat` | dim |
| workMode | `plan` | accent |
| workMode | `build` | success |
| workMode | `review` | warning |
| workMode | `repair` | error |
| workMode | `research` | info |
| runControl | `manual` | dim |
| runControl | `assisted` | warning |
| runControl | `autonomous` | success (pulsing) |
| permissionProfile | `restricted` | success |
| permissionProfile | `normal` | dim |
| permissionProfile | `trusted` | warning |
| permissionProfile | `unrestricted` | error |

---

## 9. Background Work Surface (`/tasks`)

Unified view of all background work. Replaces scattered `/status`, `/queue`, `/parallel status` for work inspection.

### 9.1 What `/tasks` Shows

- autonomous units (current + queued)
- parallel workers
- scheduled autonomous dispatches
- background shell sessions
- stuck or resumable sessions
- remote questions waiting for answers
- current cost/budget state
- last checkpoint and next action

### 9.2 Data Model

SQLite tables:

```sql
-- Durable task state
CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  work_mode TEXT NOT NULL,
  run_control TEXT NOT NULL,
  permission_profile TEXT NOT NULL,
  model_mode TEXT NOT NULL,
  status TEXT NOT NULL, -- pending | running | review | done | retrying | failed | cancelled
  dependency_blockers TEXT, -- JSON array of task IDs
  retry_count INTEGER DEFAULT 0,
  max_retries INTEGER DEFAULT 3,
  checkpoint_ref TEXT, -- git ref or patch file
  cost_budget REAL,
  cost_spent REAL DEFAULT 0,
  created_at TEXT, -- Temporal.Instant ISO
  started_at TEXT,
  completed_at TEXT,
  next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled
  intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50"
);

-- Ephemeral running state
CREATE TABLE task_runtime (
  task_id TEXT PRIMARY KEY REFERENCES tasks(id),
  process_pid INTEGER,
  worktree_path TEXT,
  current_model TEXT,
  context_usage_percent REAL,
  last_heartbeat_at TEXT, -- Temporal.Instant
  FOREIGN KEY (task_id) REFERENCES tasks(id)
);

-- Transition log
CREATE TABLE task_transitions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  from_status TEXT NOT NULL,
  to_status TEXT NOT NULL,
  reason TEXT,
  scope TEXT,
  timestamp TEXT NOT NULL -- Temporal.Instant
);
```

### 9.3 Complementary Commands

`/tasks` does not replace:
- `/status` → project health dashboard
- `/queue` → milestone/slice dispatch order
- `/parallel status` → parallel orchestrator detail
- `/session-report` → cost/token summary
- `/logs` → activity logs
- `/forensics` → execution forensics

---

## 10. Skills System

### 10.1 Directory Structure

```text
.agents/skills/<skill-name>/
  SKILL.md          -- skill definition with YAML frontmatter
  scripts/          -- supporting scripts
  schemas/          -- JSON schemas for inputs/outputs
  checklists/       -- verification checklists
  mcp.json          -- MCP server config if applicable
```

### 10.2 Skill Frontmatter

```yaml
---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---
```

Fields:
- `name`: unique identifier
- `description`: when to use this skill
- `user-invocable`: can user explicitly invoke?
- `model-invocable`: can model auto-invoke when relevant?
- `side-effects`: `none`, `code-edits`, `production-mutation`, etc.
- `permission-profile`: minimum profile required

### 10.3 Skill Categories

| Type | Example | `model-invocable` |
|------|---------|-------------------|
| Background knowledge | `forge-autonomous-runtime` | true |
| User tool | `production-deploy` | false |
| Shared capability | `forge-command-surface` | true |

Dangerous skills (`production-mutation`) are never model-invoked by default.

### 10.4 Auto-Creation Flow

1. Detect repeated repo-specific evidence (same files, commands, failure modes, rules)
2. Propose skill in manual/restricted contexts
3. Generate/update automatically only when policy allows
4. Record source evidence in `.sf` state
5. Keep narrow and testable
6. Commit with repo when accepted

### 10.5 Skill Eval Cases

Every auto-created skill needs eval cases:

```text
.agents/skills/<skill-name>/evals/
  case-1/
    task.md          -- user-like prompt
    grader.js        -- deterministic checker
    hidden/          -- reference answers (not visible to agent)
    work/            -- agent workspace
```

Graders inspect: files, artifacts, `answer.json`, `trace.jsonl`, result state.
Failed trials preserve workspace for debugging.

---

## 11. Migration from `/sf` Commands

### 11.1 Command Mapping

| Old | New | Status |
|-----|-----|--------|
| `/sf` | `/next` | migrate |
| `/sf autonomous` | `/autonomous` | migrate |
| `/sf next` | `/next` | migrate |
| `/sf stop` | `/stop` | migrate |
| `/sf pause` | `/pause` | migrate |
| `/sf status` | `/status` | migrate |
| `/sf doctor` | `/doctor` | migrate |
| `/sf rate` | `/rate` | migrate |
| `/sf session-report` | `/session-report` | migrate |
| `/sf parallel` | `/parallel` | migrate |
| `/sf remote` | `/remote` | migrate |
| `/sf tasks` | `/tasks` | new |

### 11.2 Migration Timeline

| Phase | Action |
|-------|--------|
| Phase 1 (now) | Accept both `/sf X` and `/X`. Log deprecation warning for `/sf`. |
| Phase 2 (2 releases) | `/sf X` shows warning: "Use /X instead. /sf will be removed." |
| Phase 3 (4 releases) | `/sf X` errors: "Unknown command. Did you mean /X?" |
| Phase 4 (6 releases) | Remove `/sf` handler entirely. |

### 11.3 Shell Surface

Machine surface remains prefixed:

```text
sf headless autonomous
sf headless --autonomous ...
```

---

## 12. Runtime Target: Node 26

### 12.1 Policy

```text
current compatibility floor: Node 26.1+
internal target runtime:     Node 26.1
canonical baseline:          Node 26.1
Node 25:                     skip except quick probes
```

### 12.2 Why Node 26

- `Temporal` enabled by default
- V8 14.6 baseline
- Undici 8 HTTP/fetch baseline
- Removes legacy APIs, hardens against old assumptions

### 12.3 Temporal Adoption

Store semantic type, not just formatted string:

| Concept | Temporal Type | Use Case |
|---------|---------------|----------|
| Exact instant | `Temporal.Instant` | Journal events, checkpoints, lock leases |
| Local time | `Temporal.ZonedDateTime` | Reminders, schedules, audits |
| Calendar date | `Temporal.PlainDate` | Daily reports, milestone reviews |
| Wall-clock time | `Temporal.PlainTime` | Recurring policies |
| Time amount | `Temporal.Duration` | Budgets, leases, cooldowns, retry delays |

### 12.4 Adoption Priority

1. `sf schedule` — highest user-visible impact
2. Lock/lease — highest operational correctness
3. Journals/traces — highest debugging impact
4. Session reports — nice to have
5. Background tasks — future work

### 12.5 Gate

```text
node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"
```

---

## 13. Implementation Pull-Through

### 13.1 Already Directionally Right

- UOK lifecycle records carry `runControl`
- UOK lifecycle records carry `permissionProfile`
- Schedule command state uses `autonomous_dispatch`
- DB-backed state, recovery, verification, scheduling, captures, forensics
- Skills and project-specific skill paths exist
- Parallel orchestration and remote-question infrastructure

### 13.2 Still Needed

| Priority | Item | Effort |
|----------|------|--------|
| P0 | Remove `/sf` internal dispatch, docs, tests, help text | Medium |
| P0 | Make `workMode` durable state (SQLite + `.sf/`) | Medium |
| P0 | Add direct `/mode`, `/control`, `/trust`, `/model-mode` commands | Medium |
| P0 | Add visible mode badge to TUI header/status bar | Small |
| P1 | Make `--autonomous` chain into direct `/autonomous` | Small |
| P1 | Expose autonomous continuation limits in settings and status | Small |
| P1 | Add `/tasks` with durable + ephemeral state | Large |
| P1 | Make `repair` first-class workflow over `doctor` | Medium |
| P2 | Policy-aware project skill suggestion/generation | Large |
| P2 | Skill eval cases for generated skills | Large |
| P2 | Schema-backed task frontmatter (risk, mutation, verification) | Medium |
| P2 | Intent/claim records for parallel workers | Medium |
| P2 | Audit subagent provider/model/permission inheritance | Medium |
| P2 | Audit remote steering as full-session surface | Medium |

---

## 14. Open Questions

1. Should `plan` mode show badge `[P]` or `plan` text in full?
2. Should paused autonomous show previous badge dimmed, or `[P]` for paused?
3. Should mode be per-session or per-project? (Current: per-session)
4. Should badge appear in tmux/terminal window titles?
5. Should mode transitions have sound/notification?
6. Should `repair` auto-transition be `ask` by default for new projects?
7. Should skill eval cases run in CI or only on-demand?
8. Should `/tasks` be a TUI overlay or a separate scrollable panel?

---

## 15. References

- GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously" — <https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot>
- Factory Droid, "Autonomy Level" — <https://docs.factory.ai/cli/user-guides/auto-run>
- Amp manual — <https://ampcode.com/manual>
- Smelt (mode cycling) — <https://github.com/leonardcser/smelt>
- ORCH (task state machine) — <https://github.com/oxgeneral/ORCH>
- AgentPlane (schema-first tasks) — <https://github.com/basilisk-labs/agentplane>
- Relay (channels and tickets) — <https://github.com/jcast90/relay>
- Sage (runtime-neutral orchestration) — <https://github.com/youwangd/SageCLI>
- Wit (symbol-level locks) — <https://github.com/amaar-mc/wit>
feat(sf): align node sqlite uok runtime 2026-05-08 02:52:05 +02:00			`# SF Agent Mode System`

			> Status: Draft specification. Promoted from `copilot-thoughts.md` research notes.
			`> Scope: TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target.`
			`> Decision authority: Product + architecture review required before implementation.`

			`---`

			`## 1. Problem Statement`

			SF's current command surface (`/sf autonomous`, `/sf next`, `/sf pause`, `/sf stop`) treats mode switching as separate commands rather than persistent states. There is no visible indicator of the current mode, and the `/sf` prefix positions SF as a plugin rather than the system itself.

			`Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation.`

			`Goal: Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp.`

			`---`

			`## 2. Orthogonal State Axes`

			`SF state is five independent axes, not one overloaded "mode."`

			```text
			`workMode: chat \| plan \| build \| review \| repair \| research`
			`runControl: manual \| assisted \| autonomous`
			`permissionProfile: restricted \| normal \| trusted \| unrestricted`
			`modelMode: fast \| smart \| deep`
			`surface: tui \| web \| headless \| rpc`
			```

			`### 2.1 Axis Definitions`

			`\| Axis \| Question It Answers \| Values \|`
			`\|------\|---------------------\|--------\|`
			\| `workMode` \| What kind of work is SF doing? \| `chat`, `plan`, `build`, `review`, `repair`, `research` \|
			\| `runControl` \| Who advances the loop? \| `manual` (user), `assisted` (one unit then pause), `autonomous` (continuous) \|
			\| `permissionProfile` \| What may proceed without approval? \| `restricted`, `normal`, `trusted`, `unrestricted` \|
			\| `modelMode` \| Speed/cost/reasoning posture? \| `fast` (cheap), `smart` (balanced), `deep` (reasoning) \|
			\| `surface` \| How is the user connected? \| `tui`, `web`, `headless`, `rpc` \|

			`### 2.2 Example Combinations`

			```text
			`plan \| manual \| normal \| deep → user plans with reasoning model`
			`build \| autonomous \| trusted \| smart → continuous implementation`
			`repair \| assisted \| normal \| smart → one repair unit at a time`
			`research \| autonomous \| restricted \| deep → continuous research, read-only`
			`review \| manual \| restricted \| deep → user reviews with reasoning model`
			```

			`### 2.3 Rules`

			- `permissionProfile` never implies `runControl`. Autonomous run with `restricted` permissions is valid.
			- `runControl` never implies `permissionProfile`. Manual run with `unrestricted` permissions is valid.
			- Denylists and safety gates override `permissionProfile` regardless of value.
			`- Every risk decision logs all five axis values.`

			`---`

			`## 3. Work Modes`

			### 3.1 `chat`

			`Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request.`

			### 3.2 `plan`

			`Research, clarify, write/update specs, derive tasks, produce explicit acceptance point before implementation. Primary user journey starts here.`

			`Plan → Build handoff:`
			```text
			`plan \| manual \| normal \| deep`
			`accept plan`
			`build \| autonomous \| selected-permission-profile \| smart`
			```

			`Surfaces:`
			`- TUI: plan acceptance prompt includes "run autonomously" button`
			`- Web: plan acceptance button includes "run autonomously"`
			- Headless: `--autonomous` chains into direct `/autonomous`
			`- RPC: machine event records transition explicitly`

			### 3.3 `build`

			`Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default.`

			### 3.4 `review`

			Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (`deep`).

			### 3.5 `repair`

			`Fix SF health, repo health, runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, broken installed runtime copies.`

			Doctor is the diagnostic engine, not the mode. `/doctor` inspects. `/repair` switches work mode.

			`Commands:`
			```text
			`/doctor → inspect and report`
			`/doctor fix → deterministic auto-fix`
			`/doctor heal → LLM-assisted deep healing`
			`/repair → switch workMode to repair`
			`/repair --autonomous → repair until clean, blocked, or limit-hit`
			```

			`Auto-transition to repair:`
			```text
			`build \| autonomous \| trusted \| smart`
			`→ repair \| autonomous \| normal \| smart`
			```

			`Allowed when:`
			`- pre-dispatch health gate fails`
			`- installed runtime drift detected`
			`- SF cannot dispatch safely`
			`- repo workflow state corrupted`

			Policy: configurable per project. Options: `auto`, `ask`, `log-only`.

			### 3.6 `research`

			`Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents.`

			`---`

			`## 4. Run Control`

			`\| Value \| Behavior \|`
			`\|-------\|----------\|`
			\| `manual` \| User drives every step. Tool calls require approval. \|
			\| `assisted` \| SF executes one unit, then pauses for user review. \|
			\| `autonomous` \| SF continues until done, blocked, interrupted, budget-hit, or limit-hit. \|

			`### 4.1 Commands`

			```text
			`/control manual`
			`/control assisted`
			`/control autonomous`
			`/autonomous → alias for /control autonomous`
			`/next → alias for /control assisted (one unit)`
			`/pause → pause autonomous, preserve state`
			`/stop → stop autonomous, clear state`
			```

			`### 4.2 Transition Scopes`

			`\| Scope \| Behavior \|`
			`\|-------\|----------\|`
			\| `now` \| Apply immediately if no tool active. Abort current tool if policy allows. \|
			\| `after-current-tool` \| Finish active tool, then switch. \|
			\| `after-current-unit` \| Finish current SF unit, then switch. \|
			\| `next-milestone` \| Switch after current milestone completes. \|

			`Autonomous changes affect future decisions, never mutate active tool calls mid-execution.`

			`### 4.3 Transition Logging`

			`Every transition persists:`

			```json
			`{`
			`"timestamp": "2026-05-08T10:00:00Z",`
			`"from": {"workMode": "build", "runControl": "autonomous"},`
			`"to": {"workMode": "repair", "runControl": "autonomous"},`
			`"reason": "pre-dispatch health gate failed",`
			`"scope": "after-current-unit",`
			`"sessionId": "..."`
			`}`
			```

			`---`

			`## 5. Permission Profiles`

			`\| Profile \| Description \|`
			`\|---------\|-------------\|`
			\| `restricted` \| Read-only and explicitly allowlisted actions. \|
			\| `normal` \| Safe edits, non-destructive local commands. \|
			\| `trusted` \| Build/test/install/local commits and bounded repo automation. \|
			\| `unrestricted` \| High-risk orchestration only in intentionally trusted environments. \|

			`### 5.1 Enforcement`

			`Permission profile is enforced at three layers:`

			`1. Tool registry: Each tool declares required profile. Tools below profile are hidden from model.`
			`2. Execution gate: Each tool call checks profile at invocation. Violation = error.`
			`3. Safety harness: Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile.`

			`### 5.2 Commands`

			```text
			`/trust restricted`
			`/trust normal`
			`/trust trusted`
			`/trust unrestricted`
			```

			`---`

			`## 6. Model Modes`

			`\| Mode \| Use Case \| Routing Hint \|`
			`\|------\|----------\|--------------\|`
			\| `fast` \| Small bounded tasks \| Cheapest available model \|
			\| `smart` \| Default balanced work \| Default routing table \|
			\| `deep` \| Planning, debugging, research, review \| Reasoning model (o1, Claude Opus, etc.) \|

			`modelMode` guides routing. It does not replace explicit `/model` selection.

			`---`

			`## 7. Mode Switching UX`

			`### 7.1 Direct Commands`

			```text
			`/mode chat`
			`/mode plan`
			`/mode build`
			`/mode review`
			`/mode repair`
			`/mode research`
			`/control manual`
			`/control assisted`
			`/control autonomous`
			`/trust restricted`
			`/trust normal`
			`/trust trusted`
			`/trust unrestricted`
			`/model-mode fast`
			`/model-mode smart`
			`/model-mode deep`
			```

			`### 7.2 Combined Forms`

			```text
			`/mode repair --autonomous --trust normal`
			`/mode build --autonomous --trust trusted`
			`/mode research --autonomous --trust restricted --model-mode deep`
			```

			`### 7.3 Autonomous Steering`

			```text
			`/steer mode repair`
			`/steer mode review after-current-unit`
			`/steer trust restricted now`
			`/steer model-mode deep for-next-unit`
			```

			`### 7.4 Keyboard Shortcuts`

			`\| Shortcut \| Action \|`
			`\|----------\|--------\|`
			\| `Ctrl+Shift+M` \| Cycle workMode: chat → plan → build → review → repair → research → chat \|
			\| `Ctrl+Shift+A` \| Set runControl to autonomous \|
			\| `Ctrl+Shift+S` \| Set runControl to assisted (step) \|
			\| `Ctrl+Shift+I` \| Set runControl to manual (interactive) \|
			\| `Ctrl+Shift+R` \| Set workMode to repair \|
			\| `Ctrl+Shift+P` \| Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted \|

			`---`

			`## 8. Status and Mode Badge`

			`### 8.1 Full Status Line`

			```text
			`SF build \| autonomous \| trusted \| smart`
			```

			`### 8.2 Compact Badge Form`

			`For narrow terminals (< 80 cols):`

			```text
			`[B][A][T][S]`
			```

			`### 8.3 Critical State Labels`

			When workMode is `repair` or `review`, show full labels regardless of width:

			```text
			`repair \| autonomous \| normal \| smart`
			`review \| assisted \| normal \| deep`
			```

			`### 8.4 Badge Placement`

			`\| Surface \| Placement \|`
			`\|---------\|-----------\|`
			`\| TUI header \| Left side, after "SF" logo \|`
			`\| TUI status bar \| Bottom line when header hidden \|`
			\| tmux/terminal title \| `SF[build\|A\|trusted\|smart] project-name` \|
			`\| Web \| Top bar, color-coded chip \|`

			`### 8.5 Badge During Auto Mode`

			Current code hides header/footer during auto mode (`if (isAutoActive()) return []`). This must change:

			`- Show minimal header during auto: badge + project name only`
			`- Or show badge in dedicated status bar separate from header/footer`
			`- Badge color pulses slowly during autonomous execution (subtle animation)`

			`### 8.6 Badge Colors`

			`\| Axis \| Value \| Color \|`
			`\|------\|-------\|-------\|`
			\| workMode \| `chat` \| dim \|
			\| workMode \| `plan` \| accent \|
			\| workMode \| `build` \| success \|
			\| workMode \| `review` \| warning \|
			\| workMode \| `repair` \| error \|
			\| workMode \| `research` \| info \|
			\| runControl \| `manual` \| dim \|
			\| runControl \| `assisted` \| warning \|
			\| runControl \| `autonomous` \| success (pulsing) \|
			\| permissionProfile \| `restricted` \| success \|
			\| permissionProfile \| `normal` \| dim \|
			\| permissionProfile \| `trusted` \| warning \|
			\| permissionProfile \| `unrestricted` \| error \|

			`---`

			## 9. Background Work Surface (`/tasks`)

			Unified view of all background work. Replaces scattered `/status`, `/queue`, `/parallel status` for work inspection.

			### 9.1 What `/tasks` Shows

			`- autonomous units (current + queued)`
			`- parallel workers`
			`- scheduled autonomous dispatches`
			`- background shell sessions`
			`- stuck or resumable sessions`
			`- remote questions waiting for answers`
			`- current cost/budget state`
			`- last checkpoint and next action`

			`### 9.2 Data Model`

			`SQLite tables:`

			```sql
			`-- Durable task state`
			`CREATE TABLE tasks (`
			`id TEXT PRIMARY KEY,`
			`work_mode TEXT NOT NULL,`
			`run_control TEXT NOT NULL,`
			`permission_profile TEXT NOT NULL,`
			`model_mode TEXT NOT NULL,`
			`status TEXT NOT NULL, -- pending \| running \| review \| done \| retrying \| failed \| cancelled`
			`dependency_blockers TEXT, -- JSON array of task IDs`
			`retry_count INTEGER DEFAULT 0,`
			`max_retries INTEGER DEFAULT 3,`
			`checkpoint_ref TEXT, -- git ref or patch file`
			`cost_budget REAL,`
			`cost_spent REAL DEFAULT 0,`
			`created_at TEXT, -- Temporal.Instant ISO`
			`started_at TEXT,`
			`completed_at TEXT,`
			`next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled`
			`intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50"`
			`);`

			`-- Ephemeral running state`
			`CREATE TABLE task_runtime (`
			`task_id TEXT PRIMARY KEY REFERENCES tasks(id),`
			`process_pid INTEGER,`
			`worktree_path TEXT,`
			`current_model TEXT,`
			`context_usage_percent REAL,`
			`last_heartbeat_at TEXT, -- Temporal.Instant`
			`FOREIGN KEY (task_id) REFERENCES tasks(id)`
			`);`

			`-- Transition log`
			`CREATE TABLE task_transitions (`
			`id INTEGER PRIMARY KEY AUTOINCREMENT,`
			`task_id TEXT NOT NULL,`
			`from_status TEXT NOT NULL,`
			`to_status TEXT NOT NULL,`
			`reason TEXT,`
			`scope TEXT,`
			`timestamp TEXT NOT NULL -- Temporal.Instant`
			`);`
			```

			`### 9.3 Complementary Commands`

			`/tasks` does not replace:
			- `/status` → project health dashboard
			- `/queue` → milestone/slice dispatch order
			- `/parallel status` → parallel orchestrator detail
			- `/session-report` → cost/token summary
			- `/logs` → activity logs
			- `/forensics` → execution forensics

			`---`

			`## 10. Skills System`

			`### 10.1 Directory Structure`

			```text
			`.agents/skills/<skill-name>/`
			`SKILL.md -- skill definition with YAML frontmatter`
			`scripts/ -- supporting scripts`
			`schemas/ -- JSON schemas for inputs/outputs`
			`checklists/ -- verification checklists`
			`mcp.json -- MCP server config if applicable`
			```

			`### 10.2 Skill Frontmatter`

			```yaml
			`---`
			`name: forge-command-surface`
			`description: Use when changing SF slash commands, browser command parity, or headless command dispatch.`
			`user-invocable: true`
			`model-invocable: true`
			`side-effects: code-edits`
			`permission-profile: normal`
			`---`
			```

			`Fields:`
			- `name`: unique identifier
			- `description`: when to use this skill
			- `user-invocable`: can user explicitly invoke?
			- `model-invocable`: can model auto-invoke when relevant?
			- `side-effects`: `none`, `code-edits`, `production-mutation`, etc.
			- `permission-profile`: minimum profile required

			`### 10.3 Skill Categories`

			\| Type \| Example \| `model-invocable` \|
			`\|------\|---------\|-------------------\|`
			\| Background knowledge \| `forge-autonomous-runtime` \| true \|
			\| User tool \| `production-deploy` \| false \|
			\| Shared capability \| `forge-command-surface` \| true \|

			Dangerous skills (`production-mutation`) are never model-invoked by default.

			`### 10.4 Auto-Creation Flow`

			`1. Detect repeated repo-specific evidence (same files, commands, failure modes, rules)`
			`2. Propose skill in manual/restricted contexts`
			`3. Generate/update automatically only when policy allows`
			4. Record source evidence in `.sf` state
			`5. Keep narrow and testable`
			`6. Commit with repo when accepted`

			`### 10.5 Skill Eval Cases`

			`Every auto-created skill needs eval cases:`

			```text
			`.agents/skills/<skill-name>/evals/`
			`case-1/`
			`task.md -- user-like prompt`
			`grader.js -- deterministic checker`
			`hidden/ -- reference answers (not visible to agent)`
			`work/ -- agent workspace`
			```

			Graders inspect: files, artifacts, `answer.json`, `trace.jsonl`, result state.
			`Failed trials preserve workspace for debugging.`

			`---`

			## 11. Migration from `/sf` Commands

			`### 11.1 Command Mapping`

			`\| Old \| New \| Status \|`
			`\|-----\|-----\|--------\|`
			\| `/sf` \| `/next` \| migrate \|
			\| `/sf autonomous` \| `/autonomous` \| migrate \|
			\| `/sf next` \| `/next` \| migrate \|
			\| `/sf stop` \| `/stop` \| migrate \|
			\| `/sf pause` \| `/pause` \| migrate \|
			\| `/sf status` \| `/status` \| migrate \|
			\| `/sf doctor` \| `/doctor` \| migrate \|
			\| `/sf rate` \| `/rate` \| migrate \|
			\| `/sf session-report` \| `/session-report` \| migrate \|
			\| `/sf parallel` \| `/parallel` \| migrate \|
			\| `/sf remote` \| `/remote` \| migrate \|
			\| `/sf tasks` \| `/tasks` \| new \|

			`### 11.2 Migration Timeline`

			`\| Phase \| Action \|`
			`\|-------\|--------\|`
			\| Phase 1 (now) \| Accept both `/sf X` and `/X`. Log deprecation warning for `/sf`. \|
			\| Phase 2 (2 releases) \| `/sf X` shows warning: "Use /X instead. /sf will be removed." \|
			\| Phase 3 (4 releases) \| `/sf X` errors: "Unknown command. Did you mean /X?" \|
			\| Phase 4 (6 releases) \| Remove `/sf` handler entirely. \|

			`### 11.3 Shell Surface`

			`Machine surface remains prefixed:`

			```text
			`sf headless autonomous`
			`sf headless --autonomous ...`
			```

			`---`

			`## 12. Runtime Target: Node 26`

			`### 12.1 Policy`

			```text
			`current compatibility floor: Node 26.1+`
			`internal target runtime: Node 26.1`
			`canonical baseline: Node 26.1`
			`Node 25: skip except quick probes`
			```

			`### 12.2 Why Node 26`

			- `Temporal` enabled by default
			`- V8 14.6 baseline`
			`- Undici 8 HTTP/fetch baseline`
			`- Removes legacy APIs, hardens against old assumptions`

			`### 12.3 Temporal Adoption`

			`Store semantic type, not just formatted string:`

			`\| Concept \| Temporal Type \| Use Case \|`
			`\|---------\|---------------\|----------\|`
			\| Exact instant \| `Temporal.Instant` \| Journal events, checkpoints, lock leases \|
			\| Local time \| `Temporal.ZonedDateTime` \| Reminders, schedules, audits \|
			\| Calendar date \| `Temporal.PlainDate` \| Daily reports, milestone reviews \|
			\| Wall-clock time \| `Temporal.PlainTime` \| Recurring policies \|
			\| Time amount \| `Temporal.Duration` \| Budgets, leases, cooldowns, retry delays \|

			`### 12.4 Adoption Priority`

			1. `sf schedule` — highest user-visible impact
			`2. Lock/lease — highest operational correctness`
			`3. Journals/traces — highest debugging impact`
			`4. Session reports — nice to have`
			`5. Background tasks — future work`

			`### 12.5 Gate`

			```text
			`node@26 --version`
			`npm run lint`
			`npm run typecheck:extensions`
			`npm run build`
			`npm test`
			`sf --version`
			`sf --help`
			`sf --print "ping"`
			```

			`---`

			`## 13. Implementation Pull-Through`

			`### 13.1 Already Directionally Right`

			- UOK lifecycle records carry `runControl`
			- UOK lifecycle records carry `permissionProfile`
			- Schedule command state uses `autonomous_dispatch`
			`- DB-backed state, recovery, verification, scheduling, captures, forensics`
			`- Skills and project-specific skill paths exist`
			`- Parallel orchestration and remote-question infrastructure`

			`### 13.2 Still Needed`

			`\| Priority \| Item \| Effort \|`
			`\|----------\|------\|--------\|`
			\| P0 \| Remove `/sf` internal dispatch, docs, tests, help text \| Medium \|
			\| P0 \| Make `workMode` durable state (SQLite + `.sf/`) \| Medium \|
			\| P0 \| Add direct `/mode`, `/control`, `/trust`, `/model-mode` commands \| Medium \|
			`\| P0 \| Add visible mode badge to TUI header/status bar \| Small \|`
			\| P1 \| Make `--autonomous` chain into direct `/autonomous` \| Small \|
			`\| P1 \| Expose autonomous continuation limits in settings and status \| Small \|`
			\| P1 \| Add `/tasks` with durable + ephemeral state \| Large \|
			\| P1 \| Make `repair` first-class workflow over `doctor` \| Medium \|
			`\| P2 \| Policy-aware project skill suggestion/generation \| Large \|`
			`\| P2 \| Skill eval cases for generated skills \| Large \|`
			`\| P2 \| Schema-backed task frontmatter (risk, mutation, verification) \| Medium \|`
			`\| P2 \| Intent/claim records for parallel workers \| Medium \|`
			`\| P2 \| Audit subagent provider/model/permission inheritance \| Medium \|`
			`\| P2 \| Audit remote steering as full-session surface \| Medium \|`

			`---`

			`## 14. Open Questions`

			1. Should `plan` mode show badge `[P]` or `plan` text in full?
			2. Should paused autonomous show previous badge dimmed, or `[P]` for paused?
			`3. Should mode be per-session or per-project? (Current: per-session)`
			`4. Should badge appear in tmux/terminal window titles?`
			`5. Should mode transitions have sound/notification?`
			6. Should `repair` auto-transition be `ask` by default for new projects?
			`7. Should skill eval cases run in CI or only on-demand?`
			8. Should `/tasks` be a TUI overlay or a separate scrollable panel?

			`---`

			`## 15. References`

			`- GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously" — <https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot>`
			`- Factory Droid, "Autonomy Level" — <https://docs.factory.ai/cli/user-guides/auto-run>`
			`- Amp manual — <https://ampcode.com/manual>`
			`- Smelt (mode cycling) — <https://github.com/leonardcser/smelt>`
			`- ORCH (task state machine) — <https://github.com/oxgeneral/ORCH>`
			`- AgentPlane (schema-first tasks) — <https://github.com/basilisk-labs/agentplane>`
			`- Relay (channels and tickets) — <https://github.com/jcast90/relay>`
			`- Sage (runtime-neutral orchestration) — <https://github.com/youwangd/SageCLI>`
			`- Wit (symbol-level locks) — <https://github.com/amaar-mc/wit>`