# SF Agent Mode System > **Status:** Draft specification. Promoted from `copilot-thoughts.md` research notes. > **Scope:** TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target. > **Decision authority:** Product + architecture review required before implementation. --- ## 1. Problem Statement SF's old command surface treated mode switching as separate commands rather than persistent states. There was no visible indicator of the current mode, and the `/sf` prefix positioned SF as a plugin rather than the system itself. Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation. **Goal:** Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp. --- ## 2. Orthogonal State Axes SF state is five independent axes, not one overloaded "mode." ```text workMode: chat | plan | build | review | repair | research runControl: manual | assisted | autonomous permissionProfile: restricted | normal | trusted | unrestricted modelMode: fast | smart | deep surface: tui | web | headless | rpc ``` ### 2.1 Axis Definitions | Axis | Question It Answers | Values | |------|---------------------|--------| | `workMode` | What kind of work is SF doing? | `chat`, `plan`, `build`, `review`, `repair`, `research` | | `runControl` | Who advances the loop? | `manual` (user), `assisted` (one unit then pause), `autonomous` (continuous) | | `permissionProfile` | What may proceed without approval? | `restricted`, `normal`, `trusted`, `unrestricted` | | `modelMode` | Speed/cost/reasoning posture? | `fast` (cheap), `smart` (balanced), `deep` (reasoning) | | `surface` | How is the user connected? | `tui`, `web`, `headless`, `rpc` | ### 2.2 Example Combinations ```text plan | manual | normal | deep → user plans with reasoning model build | autonomous | trusted | smart → continuous implementation repair | assisted | normal | smart → one repair unit at a time research | autonomous | restricted | deep → continuous research, read-only review | manual | restricted | deep → user reviews with reasoning model ``` ### 2.3 Rules - `permissionProfile` never implies `runControl`. Autonomous run with `restricted` permissions is valid. - `runControl` never implies `permissionProfile`. Manual run with `unrestricted` permissions is valid. - Denylists and safety gates override `permissionProfile` regardless of value. - Every risk decision logs all five axis values. - `sandboxProfile` may become a sixth axis later. It is separate from `permissionProfile`: sandboxing controls process/filesystem/network containment, while permission profile controls what SF may approve. --- ## 3. Work Modes ### 3.1 `chat` Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request. ### 3.2 `plan` Research, clarify, write/update specs, derive tasks, produce explicit acceptance point before implementation. Primary user journey starts here. **Plan → Build handoff:** ```text plan | manual | normal | deep accept plan build | autonomous | selected-permission-profile | smart ``` Surfaces: - TUI: plan acceptance prompt includes "run autonomously" button - Web: plan acceptance button includes "run autonomously" - Headless: `--autonomous` chains into direct `/autonomous` - RPC: machine event records transition explicitly ### 3.3 `build` Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default. ### 3.4 `review` Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (`deep`). ### 3.5 `repair` Fix SF health, repo health, runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, broken installed runtime copies, failed gates, generated/runtime drift, and broken state. **Doctor is the diagnostic engine, not the mode.** `/doctor` inspects. `/repair` switches work mode. `repair` is a `workMode`, not a separate subsystem. Commands: ```text /doctor → inspect and report /doctor fix → deterministic auto-fix /doctor heal → LLM-assisted deep healing /repair → switch workMode to repair /repair --autonomous → repair until clean, blocked, or limit-hit ``` **Auto-transition to repair:** ```text build | autonomous | trusted | smart → repair | autonomous | normal | smart ``` Allowed when: - pre-dispatch health gate fails - installed runtime drift detected - SF cannot dispatch safely - repo workflow state corrupted Policy: configurable per project. Options: `auto`, `ask`, `log-only`. ### 3.6 `research` Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents. --- ## 4. Run Control | Value | Behavior | |-------|----------| | `manual` | User drives every step. Tool calls require approval. | | `assisted` | SF executes one unit, then pauses for user review. | | `autonomous` | SF continues until done, blocked, interrupted, budget-hit, or limit-hit. | ### 4.1 Commands ```text /control manual /control assisted /control autonomous /autonomous → alias for /control autonomous /next → alias for /control assisted (one unit) /pause → pause autonomous, preserve state /stop → stop autonomous, clear state ``` ### 4.2 Transition Scopes | Scope | Behavior | |-------|----------| | `now` | Apply immediately if no tool active. Abort current tool if policy allows. | | `after-current-tool` | Finish active tool, then switch. | | `after-current-unit` | Finish current SF unit, then switch. | | `next-milestone` | Switch after current milestone completes. | Autonomous changes affect future decisions, never mutate active tool calls mid-execution. ### 4.3 Transition Logging Every transition persists: ```json { "timestamp": "2026-05-08T10:00:00Z", "from": {"workMode": "build", "runControl": "autonomous"}, "to": {"workMode": "repair", "runControl": "autonomous"}, "reason": "pre-dispatch health gate failed", "scope": "after-current-unit", "sessionId": "..." } ``` --- ## 5. Permission Profiles | Profile | Description | |---------|-------------| | `restricted` | Read-only and explicitly allowlisted actions. | | `normal` | Safe edits, non-destructive local commands. | | `trusted` | Build/test/install/local commits and bounded repo automation. | | `unrestricted` | High-risk orchestration only in intentionally trusted environments. | ### 5.1 Enforcement Permission profile is enforced at three layers: 1. **Tool registry:** Each tool declares required profile. Tools below profile are hidden from model. 2. **Execution gate:** Each tool call checks profile at invocation. Violation = error. 3. **Safety harness:** Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile. ### 5.2 Commands ```text /permission-profile restricted /permission-profile normal /permission-profile trusted /permission-profile unrestricted ``` --- ## 6. Model Modes | Mode | Use Case | Routing Hint | |------|----------|--------------| | `fast` | Small bounded tasks | Cheapest available model | | `smart` | Default balanced work | Default routing table | | `deep` | Planning, debugging, research, review | Reasoning model (o1, Claude Opus, etc.) | `modelMode` guides routing. It does not replace explicit `/model` selection. --- ## 7. Mode Switching UX ### 7.1 Direct Commands ```text /mode chat /mode plan /mode build /mode review /mode repair /mode research /control manual /control assisted /control autonomous /permission-profile restricted /permission-profile normal /permission-profile trusted /permission-profile unrestricted /model-mode fast /model-mode smart /model-mode deep ``` ### 7.2 Combined Forms ```text /mode repair --autonomous --permission-profile normal /mode build --autonomous --permission-profile trusted /mode research --autonomous --permission-profile restricted --model-mode deep ``` ### 7.3 Autonomous Steering ```text /steer mode repair /steer mode review after-current-unit /steer permission-profile restricted now /steer model-mode deep for-next-unit ``` ### 7.4 Keyboard Shortcuts | Shortcut | Action | |----------|--------| | `Ctrl+Shift+M` | Cycle workMode: chat → plan → build → review → repair → research → chat | | `Ctrl+Shift+A` | Set runControl to autonomous | | `Ctrl+Shift+S` | Set runControl to assisted (step) | | `Ctrl+Shift+I` | Set runControl to manual (interactive) | | `Ctrl+Shift+R` | Set workMode to repair | | `Ctrl+Shift+P` | Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted | --- ## 8. Status and Mode Badge ### 8.1 Full Status Line ```text SF build | autonomous | trusted | smart ``` ### 8.2 Compact Badge Form For narrow terminals (< 80 cols): ```text [B][A][T][S] ``` ### 8.3 Critical State Labels When workMode is `repair` or `review`, show full labels regardless of width: ```text repair | autonomous | normal | smart review | assisted | normal | deep ``` ### 8.4 Badge Placement | Surface | Placement | |---------|-----------| | TUI header | Left side, after "SF" logo | | TUI status bar | Bottom line when header hidden | | tmux/terminal title | `SF[build|A|trusted|smart] project-name` | | Web | Top bar, color-coded chip | ### 8.5 Badge During Auto Mode Current code hides header/footer during auto mode (`if (isAutoActive()) return []`). This must change: - Show **minimal header** during auto: badge + project name only - Or show badge in **dedicated status bar** separate from header/footer - Badge color pulses slowly during autonomous execution (subtle animation) ### 8.6 Badge Colors | Axis | Value | Color | |------|-------|-------| | workMode | `chat` | dim | | workMode | `plan` | accent | | workMode | `build` | success | | workMode | `review` | warning | | workMode | `repair` | error | | workMode | `research` | info | | runControl | `manual` | dim | | runControl | `assisted` | warning | | runControl | `autonomous` | success (pulsing) | | permissionProfile | `restricted` | success | | permissionProfile | `normal` | dim | | permissionProfile | `trusted` | warning | | permissionProfile | `unrestricted` | error | --- ## 9. Background Work Surface (`/tasks`) Unified view of all background work. Replaces scattered `/status`, `/queue`, `/parallel status` for work inspection. ### 9.1 What `/tasks` Shows - autonomous task lifecycle rows - parallel workers - scheduled autonomous dispatches and queued scheduler rows - background shell sessions - stuck or resumable sessions - remote questions waiting for answers - current cost/budget state - last checkpoint and next action ### 9.2 Data Model SQLite tables: ```sql -- Durable task state CREATE TABLE tasks ( id TEXT PRIMARY KEY, work_mode TEXT NOT NULL, run_control TEXT NOT NULL, permission_profile TEXT NOT NULL, model_mode TEXT NOT NULL, status TEXT NOT NULL, -- todo | running | verifying | reviewing | done | blocked | paused | failed | cancelled | retrying dependency_blockers TEXT, -- JSON array of task IDs retry_count INTEGER DEFAULT 0, max_retries INTEGER DEFAULT 3, checkpoint_ref TEXT, -- git ref or patch file cost_budget REAL, cost_spent REAL DEFAULT 0, created_at TEXT, -- Temporal.Instant ISO started_at TEXT, completed_at TEXT, next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50" ); -- Scheduler state is separate from task lifecycle state CREATE TABLE task_scheduler ( task_id TEXT PRIMARY KEY REFERENCES tasks(id), status TEXT NOT NULL, -- queued | due | claimed | dispatched | consumed | expired due_at TEXT, claimed_by TEXT, dispatched_at TEXT, consumed_at TEXT, expires_at TEXT ); -- Ephemeral running state CREATE TABLE task_runtime ( task_id TEXT PRIMARY KEY REFERENCES tasks(id), process_pid INTEGER, worktree_path TEXT, current_model TEXT, context_usage_percent REAL, last_heartbeat_at TEXT, -- Temporal.Instant FOREIGN KEY (task_id) REFERENCES tasks(id) ); -- Transition log CREATE TABLE task_transitions ( id INTEGER PRIMARY KEY AUTOINCREMENT, task_id TEXT NOT NULL, from_status TEXT NOT NULL, to_status TEXT NOT NULL, reason TEXT, scope TEXT, timestamp TEXT NOT NULL -- Temporal.Instant ); ``` Parallel workers must stay worktree-isolated and report heartbeat/status into `.sf` state. Scheduler rows may use `queued`; task lifecycle rows use `todo`. ### 9.3 Complementary Commands `/tasks` does not replace: - `/status` → project health dashboard - `/queue` → milestone/slice dispatch order - `/parallel status` → parallel orchestrator detail - `/session-report` → cost/token summary - `/logs` → activity logs - `/forensics` → execution forensics --- ## 10. Skills System ### 10.1 Directory Structure ```text .agents/skills// SKILL.md -- skill definition with YAML frontmatter scripts/ -- supporting scripts schemas/ -- JSON schemas for inputs/outputs checklists/ -- verification checklists mcp.json -- MCP server config if applicable ``` ### 10.2 Skill Frontmatter ```yaml --- name: forge-command-surface description: Use when changing SF slash commands, browser command parity, or headless command dispatch. user-invocable: true model-invocable: true side-effects: code-edits permission-profile: normal --- ``` Fields: - `name`: unique identifier - `description`: when to use this skill - `user-invocable`: can user explicitly invoke? - `model-invocable`: can model auto-invoke when relevant? - `side-effects`: `none`, `code-edits`, `production-mutation`, etc. - `permission-profile`: minimum profile required ### 10.3 Skill Categories | Type | Example | `model-invocable` | |------|---------|-------------------| | Background knowledge | `forge-autonomous-runtime` | true | | User tool | `production-deploy` | false | | Shared capability | `forge-command-surface` | true | Dangerous skills (`production-mutation`) are never model-invoked by default. ### 10.4 Auto-Creation Flow 1. Detect repeated repo-specific evidence (same files, commands, failure modes, rules) 2. Propose skill in manual/restricted contexts 3. Generate/update automatically only when policy allows 4. Record repeated source evidence in `.sf` state 5. Keep narrow, linted, and evaled like code 6. Commit with repo when accepted ### 10.5 Skill Eval Cases Every auto-created skill needs eval cases: ```text .agents/skills//evals/ case-1/ task.md -- user-like prompt grader.js -- deterministic checker hidden/ -- reference answers (not visible to agent) work/ -- agent workspace ``` Graders inspect: files, artifacts, `answer.json`, `trace.jsonl`, result state. Failed trials preserve workspace for debugging. --- ## 11. Command Surfaces ### 11.1 Human Slash Commands SF registers direct command roots only: ```text /status /autonomous /doctor /rate /session-report /parallel /remote /tasks ``` `/sf` is not a command root. TUI and browser command parity tests reject it so compatibility shims do not grow back. `/remote` is a full-session steering surface. Remote answers may change `workMode`, `runControl`, `permissionProfile`, and `modelMode`; they are not limited to question delivery. ### 11.2 Shell Surface Machine surface remains prefixed: ```text sf headless autonomous sf headless --autonomous ... ``` The shell prefix is the executable name, not an interactive slash-command namespace. --- ## 12. Runtime Target: Node 26 ### 12.1 Policy ```text current compatibility floor: Node 26.1+ internal target runtime: Node 26.1 canonical baseline: Node 26.1 Node 25: skip except quick probes ``` ### 12.2 Why Node 26 - `Temporal` enabled by default - V8 14.6 baseline - Undici 8 HTTP/fetch baseline - Removes legacy APIs, hardens against old assumptions ### 12.3 Temporal Adoption Store semantic type, not just formatted string: | Concept | Temporal Type | Use Case | |---------|---------------|----------| | Exact instant | `Temporal.Instant` | Journal events, checkpoints, lock leases | | Local time | `Temporal.ZonedDateTime` | Reminders, schedules, audits | | Calendar date | `Temporal.PlainDate` | Daily reports, milestone reviews | | Wall-clock time | `Temporal.PlainTime` | Recurring policies | | Time amount | `Temporal.Duration` | Budgets, leases, cooldowns, retry delays | ### 12.4 Adoption Priority 1. `sf schedule` — highest user-visible impact 2. Lock/lease — highest operational correctness 3. Journals/traces — highest debugging impact 4. Session reports — nice to have 5. Background tasks — future work ### 12.5 Gate ```text node@26 --version npm run lint npm run typecheck:extensions npm run build npm test sf --version sf --help sf --print "ping" ``` --- ## 13. Implementation Pull-Through ### 13.1 Already Directionally Right - UOK lifecycle records carry `runControl` - UOK lifecycle records carry `permissionProfile` - Schedule command state uses `autonomous_dispatch` - DB-backed state, recovery, verification, scheduling, captures, forensics - Skills and project-specific skill paths exist - Parallel orchestration and remote-question infrastructure ### 13.2 Still Needed | Priority | Item | Effort | |----------|------|--------| | P2 | Decide whether `sandboxProfile` becomes a sixth persisted axis | Medium | | P2 | Remove `/sf` from docs/web/tests (Phase 2 deprecation) | Small | ### 13.4 Recently Completed (This Session) | Priority | Item | Status | |----------|------|--------| | P1 | Centralized metrics system (`metrics-central.js`) | ✓ | | P1 | Cost command (`/cost`) with DB + ledger queries | ✓ | | P1 | Explicit stage commands (`/research`, `/plan`, `/implement`) | ✓ | | P2 | Reasoning assist foundation (`reasoning-assist.js`) | ✓ | | P2 | Self-feedback → workMode auto-transition | ✓ | | P2 | UOK events carry workMode + modelMode | ✓ | | P2 | `/sf` prefix deprecation warning (Phase 1) | ✓ | ### 13.3 Completed | Priority | Item | Status | |----------|------|--------| | P0 | Make mode state durable in SQLite | ✓ | | P0 | Add direct `/mode`, `/control`, `/permission-profile`, `/model-mode` commands | ✓ | | P0 | Add visible mode badge to TUI header/status bar | ✓ | | P1 | Make `--autonomous` chain into direct `/autonomous` | ✓ | | P1 | Expose autonomous continuation limits in settings and status | ✓ | | P1 | Add `/tasks` backed by DB execution graph state | ✓ | | P1 | Make `repair` first-class workflow over `doctor` | ✓ | | P1 | Enhanced `/steer` with mode/permission-profile/model-mode transitions | ✓ | | P1 | TUI keyboard shortcuts for mode cycling (Ctrl+Shift+M/R/A/S/P) | ✓ | | P1 | Minimal auto-mode header/footer (badge visible during autonomy) | ✓ | | P1 | Remove `/sf` namespace registration and parity-test against fallback | ✓ | | P1 | Parallel worker intent/claim registry backed by UOK SQLite coordination | ✓ | | P1 | Skill eval harness foundation | ✓ | | P1 | Terminal title mode indicator | ✓ | | P2 | Policy-aware project skill suggestion/generation with DB cooldown | ✓ | | P2 | Schema-backed task frontmatter (risk, mutation, verification, approval) | ✓ | | P2 | Subagent provider/model/permission inheritance audit and guard | ✓ | | P2 | Remote steering as full-session surface from remote answers | ✓ | --- ## 14. Open Questions 1. Should `plan` mode show badge `[P]` or `plan` text in full? 2. Should paused autonomous show previous badge dimmed, or `[P]` for paused? 3. Should mode be per-session or per-project? (Current: per-session) 4. Should badge appear in tmux/terminal window titles? 5. Should mode transitions have sound/notification? 6. Should `repair` auto-transition be `ask` by default for new projects? 7. Should skill eval cases run in CI or only on-demand? 8. Should `/tasks` be a TUI overlay or a separate scrollable panel? 9. Should reasoning assist call a fast model automatically, or only prepare prompts for now? --- ## 15. References - GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously" — - Factory Droid, "Autonomy Level" — - Amp manual — - Smelt (mode cycling) — - ORCH (task state machine) — - AgentPlane (schema-first tasks) — - Relay (channels and tickets) — - Sage (runtime-neutral orchestration) — - Wit (symbol-level locks) —