# SF Agent Mode System
> **Status:** Draft specification. Promoted from `copilot-thoughts.md` research notes.
> **Scope:** TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target.
> **Decision authority:** Product + architecture review required before implementation.
---
## 1. Problem Statement
SF's old command surface treated mode switching as separate commands rather than persistent states. There was no visible indicator of the current mode, and the `/sf` prefix positioned SF as a plugin rather than the system itself.
Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation.
**Goal:** Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp.
---
## 2. Orthogonal State Axes
SF state is five independent axes, not one overloaded "mode."
```text
workMode: chat | plan | build | review | repair | research
runControl: manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode: fast | smart | deep
surface: tui | web | headless | rpc
```
### 2.1 Axis Definitions
| Axis | Question It Answers | Values |
|------|---------------------|--------|
| `workMode` | What kind of work is SF doing? | `chat`, `plan`, `build`, `review`, `repair`, `research` |
| `runControl` | Who advances the loop? | `manual` (user), `assisted` (one unit then pause), `autonomous` (continuous) |
| `permissionProfile` | What may proceed without approval? | `restricted`, `normal`, `trusted`, `unrestricted` |
| `modelMode` | Speed/cost/reasoning posture? | `fast` (cheap), `smart` (balanced), `deep` (reasoning) |
| `surface` | How is the user connected? | `tui`, `web`, `headless`, `rpc` |
### 2.2 Example Combinations
```text
plan | manual | normal | deep → user plans with reasoning model
build | autonomous | trusted | smart → continuous implementation
repair | assisted | normal | smart → one repair unit at a time
research | autonomous | restricted | deep → continuous research, read-only
review | manual | restricted | deep → user reviews with reasoning model
```
### 2.3 Rules
- `permissionProfile` never implies `runControl`. Autonomous run with `restricted` permissions is valid.
- `runControl` never implies `permissionProfile`. Manual run with `unrestricted` permissions is valid.
- Denylists and safety gates override `permissionProfile` regardless of value.
- Every risk decision logs all five axis values.
- `sandboxProfile` may become a sixth axis later. It is separate from `permissionProfile`: sandboxing controls process/filesystem/network containment, while permission profile controls what SF may approve.
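
The axes and rules above can be sketched as a small validator. This is a minimal illustration, not SF's implementation: `AXES`, `validateModeState`, and `riskLogEntry` are hypothetical names, and only the axis names and values come from this section.

```javascript
// Hypothetical sketch: axis names and values are from section 2; all helper names are assumed.
const AXES = Object.freeze({
  workMode: ["chat", "plan", "build", "review", "repair", "research"],
  runControl: ["manual", "assisted", "autonomous"],
  permissionProfile: ["restricted", "normal", "trusted", "unrestricted"],
  modelMode: ["fast", "smart", "deep"],
  surface: ["tui", "web", "headless", "rpc"],
});

// Returns a list of problems; an empty list means the state is valid.
// Each axis is checked on its own: no axis value constrains another.
function validateModeState(state) {
  const problems = [];
  for (const [axis, allowed] of Object.entries(AXES)) {
    if (!allowed.includes(state[axis])) {
      problems.push(`${axis}: expected one of ${allowed.join(", ")}, got ${String(state[axis])}`);
    }
  }
  return problems;
}

// A risk decision log entry carries all five axis values.
function riskLogEntry(state, decision) {
  const problems = validateModeState(state);
  if (problems.length > 0) throw new Error(problems.join("; "));
  const { workMode, runControl, permissionProfile, modelMode, surface } = state;
  return { decision, workMode, runControl, permissionProfile, modelMode, surface };
}
```

Because the validator never cross-checks axes, a state such as `autonomous` with `restricted` passes, which matches the first two rules.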
---
## 3. Work Modes
### 3.1 `chat`
Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request.
### 3.2 `plan`
Research, clarify, write/update specs, derive tasks, produce explicit acceptance point before implementation. Primary user journey starts here.
**Plan → Build handoff:**
```text
plan | manual | normal | deep
→ accept plan
→ build | autonomous | selected-permission-profile | smart
```
Surfaces:
- TUI: plan acceptance prompt includes "run autonomously" button
- Web: plan acceptance button includes "run autonomously"
- Headless: `--autonomous` chains into direct `/autonomous`
- RPC: machine event records transition explicitly
### 3.3 `build`
Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default.
### 3.4 `review`
Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (`deep`).
### 3.5 `repair`
Fix SF health, repo health, runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, broken installed runtime copies, and failed gates.
**Doctor is the diagnostic engine, not the mode.** `/doctor` inspects. `/repair` switches work mode.
`repair` is a `workMode`, not a separate subsystem.
Commands:
```text
/doctor → inspect and report
/doctor fix → deterministic auto-fix
/doctor heal → LLM-assisted deep healing
/repair → switch workMode to repair
/repair --autonomous → repair until clean, blocked, or limit-hit
```
**Auto-transition to repair:**
```text
build | autonomous | trusted | smart
→ repair | autonomous | normal | smart
```
Allowed when:
- pre-dispatch health gate fails
- installed runtime drift detected
- SF cannot dispatch safely
- repo workflow state corrupted
Policy: configurable per project. Options: `auto`, `ask`, `log-only`.
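
One way to read the policy options is sketched below. The trigger identifiers, the return shape, and the exact behavior of each option are assumptions inferred from the option names; the spec only fixes the three values `auto`, `ask`, and `log-only`.

```javascript
// Hypothetical sketch: trigger names and the returned action objects are assumptions.
const REPAIR_TRIGGERS = new Set([
  "health-gate-failed", // pre-dispatch health gate fails
  "runtime-drift", // installed runtime drift detected
  "unsafe-dispatch", // SF cannot dispatch safely
  "workflow-state-corrupted", // repo workflow state corrupted
]);

// policy is one of "auto" | "ask" | "log-only".
function decideRepairTransition(policy, trigger) {
  if (!REPAIR_TRIGGERS.has(trigger)) return { action: "none" };
  switch (policy) {
    case "auto":
      return { action: "switch", to: { workMode: "repair" }, reason: trigger };
    case "ask":
      return { action: "prompt-user", reason: trigger };
    case "log-only":
      return { action: "log", reason: trigger };
    default:
      throw new Error(`unknown repair policy: ${policy}`);
  }
}
```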
### 3.6 `research`
Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents.
---
## 4. Run Control
| Value | Behavior |
|-------|----------|
| `manual` | User drives every step. Tool calls require approval. |
| `assisted` | SF executes one unit, then pauses for user review. |
| `autonomous` | SF continues until done, blocked, interrupted, budget-hit, or limit-hit. |
### 4.1 Commands
```text
/control manual
/control assisted
/control autonomous
/autonomous → alias for /control autonomous
/next → alias for /control assisted (one unit)
/pause → pause autonomous, preserve state
/stop → stop autonomous, clear state
```
### 4.2 Transition Scopes
| Scope | Behavior |
|-------|----------|
| `now` | Apply immediately if no tool active. Abort current tool if policy allows. |
| `after-current-tool` | Finish active tool, then switch. |
| `after-current-unit` | Finish current SF unit, then switch. |
| `next-milestone` | Switch after current milestone completes. |
Changes requested during an autonomous run affect future decisions only; they never mutate a tool call that is already executing.
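
The scope table can be read as a small decision function. The sketch below is illustrative only: the `runtime` flags, the `allowAbort` policy field, and the returned strings are hypothetical, and falling back to waiting when an abort is not allowed is an assumption the spec does not state.

```javascript
// Hypothetical sketch: decides when a requested transition takes effect.
function resolveTransition(scope, runtime, policy = { allowAbort: false }) {
  switch (scope) {
    case "now":
      if (!runtime.toolActive) return "apply";
      // Assumption: without abort permission, wait for the tool to finish.
      return policy.allowAbort ? "abort-tool-then-apply" : "defer-until-tool-ends";
    case "after-current-tool":
      return runtime.toolActive ? "defer-until-tool-ends" : "apply";
    case "after-current-unit":
      return runtime.unitActive ? "defer-until-unit-ends" : "apply";
    case "next-milestone":
      return runtime.milestoneActive ? "defer-until-milestone-ends" : "apply";
    default:
      throw new Error(`unknown scope: ${scope}`);
  }
}
```

No branch touches the active tool call itself; the function only chooses between applying, aborting, and deferring.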
### 4.3 Transition Logging
Every transition persists:
```json
{
  "timestamp": "2026-05-08T10:00:00Z",
  "from": {"workMode": "build", "runControl": "autonomous"},
  "to": {"workMode": "repair", "runControl": "autonomous"},
  "reason": "pre-dispatch health gate failed",
  "scope": "after-current-unit",
  "sessionId": "..."
}
```
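
A record of this shape could be assembled as follows. `buildTransitionRecord` is a hypothetical helper, and how the record is persisted is not shown; the field names are taken from the JSON example above.

```javascript
// Hypothetical sketch: builds a transition record with the fields shown in section 4.3.
function buildTransitionRecord({ from, to, reason, scope, sessionId, now = new Date() }) {
  const required = { from, to, reason, scope, sessionId };
  for (const [key, value] of Object.entries(required)) {
    if (value === undefined || value === null) throw new Error(`missing field: ${key}`);
  }
  // toISOString() includes milliseconds; drop them to match the example format.
  const timestamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  return { timestamp, from, to, reason, scope, sessionId };
}
```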
---
## 5. Permission Profiles
| Profile | Description |
|---------|-------------|
| `restricted` | Read-only and explicitly allowlisted actions. |
| `normal` | Safe edits, non-destructive local commands. |
| `trusted` | Build/test/install/local commits and bounded repo automation. |
| `unrestricted` | High-risk orchestration only in intentionally trusted environments. |
### 5.1 Enforcement
Permission profile is enforced at three layers:
1. **Tool registry:** Each tool declares the minimum profile it requires. Tools that require a higher profile than the active one are hidden from the model.
2. **Execution gate:** Each tool call checks profile at invocation. Violation = error.
3. **Safety harness:** Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile.
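
The three layers can be sketched as two functions. This is an illustration under stated assumptions: the profiles are treated as totally ordered from `restricted` to `unrestricted`, and the tool fields (`name`, `requiredProfile`, `destructive`) and function names are hypothetical.

```javascript
// Assumption: profiles form a strict order, lowest privilege first.
const PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 };

// Layer 1 (tool registry): hide tools that need more than the active profile.
function visibleTools(tools, activeProfile) {
  return tools.filter((t) => PROFILE_RANK[t.requiredProfile] <= PROFILE_RANK[activeProfile]);
}

// Layer 2 (execution gate) and layer 3 (safety harness), checked at invocation.
function checkInvocation(tool, activeProfile, { confirmed = false } = {}) {
  if (PROFILE_RANK[tool.requiredProfile] > PROFILE_RANK[activeProfile]) {
    throw new Error(`${tool.name} requires ${tool.requiredProfile}, active profile is ${activeProfile}`);
  }
  // Destructive operations need explicit confirmation regardless of profile.
  if (tool.destructive && !confirmed) {
    return { allowed: false, needsConfirmation: true };
  }
  return { allowed: true, needsConfirmation: false };
}
```

Checking the profile again at invocation means a tool that slipped past the registry filter still fails at the gate.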
### 5.2 Commands
```text
/permission-profile restricted
/permission-profile normal
/permission-profile trusted
/permission-profile unrestricted
```
---
## 6. Model Modes
| Mode | Use Case | Routing Hint |
|------|----------|--------------|
| `fast` | Small bounded tasks | Cheapest available model |
| `smart` | Default balanced work | Default routing table |
| `deep` | Planning, debugging, research, review | Reasoning model (o1, Claude Opus, etc.) |
`modelMode` guides routing. It does not replace explicit `/model` selection.
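
The precedence between an explicit `/model` selection and the `modelMode` hint might look like this. `resolveModel` and the routing table contents are hypothetical; the model identifiers are placeholders, not real model names.

```javascript
// Hypothetical sketch: an explicit model selection wins over the modelMode hint.
function resolveModel({ explicitModel, modelMode }, routingTable) {
  if (explicitModel) return explicitModel;
  const model = routingTable[modelMode];
  if (!model) throw new Error(`no route for modelMode: ${modelMode}`);
  return model;
}

// Placeholder routing table for illustration only.
const exampleRoutes = {
  fast: "example-cheap-model",
  smart: "example-default-model",
  deep: "example-reasoning-model",
};
```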
---
## 7. Mode Switching UX
### 7.1 Direct Commands
```text
/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/permission-profile restricted
/permission-profile normal
/permission-profile trusted
/permission-profile unrestricted
/model-mode fast
/model-mode smart
/model-mode deep
```
### 7.2 Combined Forms
```text
/mode repair --autonomous --permission-profile normal
/mode build --autonomous --permission-profile trusted
/mode research --autonomous --permission-profile restricted --model-mode deep
```
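
A combined form could be parsed into axis values along these lines. `parseModeCommand` is a hypothetical helper; it handles only the flags shown above and does not validate the values against the axis definitions.

```javascript
// Hypothetical sketch: parses "/mode <workMode> [--autonomous] [--permission-profile X] [--model-mode Y]".
function parseModeCommand(input) {
  const tokens = input.trim().split(/\s+/);
  if (tokens[0] !== "/mode" || tokens.length < 2) {
    throw new Error("expected: /mode <workMode> [flags]");
  }
  const result = { workMode: tokens[1] };
  for (let i = 2; i < tokens.length; i++) {
    const flag = tokens[i];
    if (flag === "--autonomous") {
      result.runControl = "autonomous";
    } else if (flag === "--permission-profile" || flag === "--model-mode") {
      const value = tokens[++i]; // consume the flag's argument
      if (value === undefined) throw new Error(`${flag} needs a value`);
      result[flag === "--permission-profile" ? "permissionProfile" : "modelMode"] = value;
    } else {
      throw new Error(`unknown flag: ${flag}`);
    }
  }
  return result;
}
```

Axes that the command does not mention are left out of the result, so the caller can keep their current values.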
### 7.3 Autonomous Steering
```text
/steer mode repair
/steer mode review after-current-unit
/steer permission-profile restricted now
/steer model-mode deep for-next-unit
```
### 7.4 Keyboard Shortcuts
| Shortcut | Action |
|----------|--------|
| `Ctrl+Shift+M` | Cycle workMode: chat → plan → build → review → repair → research → chat |
| `Ctrl+Shift+A` | Set runControl to autonomous |
| `Ctrl+Shift+S` | Set runControl to assisted (step) |
| `Ctrl+Shift+I` | Set runControl to manual (interactive) |
| `Ctrl+Shift+R` | Set workMode to repair |
| `Ctrl+Shift+P` | Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted |
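
The two cycling shortcuts reduce to a wrap-around step over an ordered list. A minimal sketch, with `cycle` as a hypothetical helper and the orders taken from the table above:

```javascript
// Orders are from the shortcut table; the helper name is assumed.
const WORK_MODE_ORDER = ["chat", "plan", "build", "review", "repair", "research"];
const PERMISSION_ORDER = ["restricted", "normal", "trusted", "unrestricted"];

// Returns the value after `current`, wrapping from the last entry to the first.
function cycle(values, current) {
  const index = values.indexOf(current);
  if (index === -1) throw new Error(`unknown value: ${current}`);
  return values[(index + 1) % values.length];
}
```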
---
## 8. Status and Mode Badge
### 8.1 Full Status Line
```text
SF build | autonomous | trusted | smart
```
### 8.2 Compact Badge Form
For narrow terminals (< 80 cols):
```text
[B][A][T][S]
```
### 8.3 Critical State Labels
When workMode is `repair` or `review`, show full labels regardless of width:
```text
repair | autonomous | normal | smart
review | assisted | normal | deep
```
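
The width and critical-state rules in 8.2 and 8.3 can be combined in one rendering function. This is a sketch only: `renderBadge` is hypothetical, the "SF" prefix from 8.1 is left to the caller, and the compact form simply uses each value's first letter, which is an assumption generalized from the `[B][A][T][S]` example.

```javascript
// Work modes that always show full labels (section 8.3).
const CRITICAL_WORK_MODES = new Set(["repair", "review"]);

// Hypothetical sketch: picks the compact or full badge for a given terminal width.
function renderBadge(state, columns) {
  const parts = [state.workMode, state.runControl, state.permissionProfile, state.modelMode];
  const narrow = columns < 80;
  if (narrow && !CRITICAL_WORK_MODES.has(state.workMode)) {
    return parts.map((part) => `[${part[0].toUpperCase()}]`).join("");
  }
  return parts.join(" | ");
}
```

First-letter badges are ambiguous for some values (`review`, `repair`, and `research` all start with R), so a real implementation would need distinct abbreviations.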
### 8.4 Badge Placement
| Surface | Placement |
|---------|-----------|
| TUI header | Left side, after "SF" logo |
| TUI status bar | Bottom line when header hidden |
| tmux/terminal title | `SF[build\|A\|trusted\|smart] project-name` |
| Web | Top bar, color-coded chip |
### 8.5 Badge During Auto Mode
Current code hides header/footer during auto mode (`if (isAutoActive()) return []`). This must change:
- Show **minimal header** during auto: badge + project name only
- Or show badge in **dedicated status bar** separate from header/footer
- Badge color pulses slowly during autonomous execution (subtle animation)
### 8.6 Badge Colors
| Axis | Value | Color |
|------|-------|-------|
| workMode | `chat` | dim |
| workMode | `plan` | accent |
| workMode | `build` | success |
| workMode | `review` | warning |
| workMode | `repair` | error |
| workMode | `research` | info |
| runControl | `manual` | dim |
| runControl | `assisted` | warning |
| runControl | `autonomous` | success (pulsing) |
| permissionProfile | `restricted` | success |
| permissionProfile | `normal` | dim |
| permissionProfile | `trusted` | warning |
| permissionProfile | `unrestricted` | error |
---
## 9. Background Work Surface (`/tasks`)
Unified view of all background work. It consolidates the work-inspection views previously scattered across `/status`, `/queue`, and `/parallel status`; those commands remain for their own purposes (see 9.3).
### 9.1 What `/tasks` Shows
- autonomous task lifecycle rows
- parallel workers
- scheduled autonomous dispatches and queued scheduler rows
- background shell sessions
- stuck or resumable sessions
- remote questions waiting for answers
- current cost/budget state
- last checkpoint and next action
### 9.2 Data Model
SQLite tables:
```sql
-- Durable task state
CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  work_mode TEXT NOT NULL,
  run_control TEXT NOT NULL,
  permission_profile TEXT NOT NULL,
  model_mode TEXT NOT NULL,
  status TEXT NOT NULL, -- todo | running | verifying | reviewing | done | blocked | paused | failed | cancelled | retrying
  dependency_blockers TEXT, -- JSON array of task IDs
  retry_count INTEGER DEFAULT 0,
  max_retries INTEGER DEFAULT 3,
  checkpoint_ref TEXT, -- git ref or patch file
  cost_budget REAL,
  cost_spent REAL DEFAULT 0,
  created_at TEXT, -- Temporal.Instant ISO
  started_at TEXT,
  completed_at TEXT,
  next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled
  intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50"
);
-- Scheduler state is separate from task lifecycle state
CREATE TABLE task_scheduler (
  task_id TEXT PRIMARY KEY REFERENCES tasks(id),
  status TEXT NOT NULL, -- queued | due | claimed | dispatched | consumed | expired
  due_at TEXT,
  claimed_by TEXT,
  dispatched_at TEXT,
  consumed_at TEXT,
  expires_at TEXT
);
-- Ephemeral running state
-- (the column-level REFERENCES already declares the foreign key)
CREATE TABLE task_runtime (
  task_id TEXT PRIMARY KEY REFERENCES tasks(id),
  process_pid INTEGER,
  worktree_path TEXT,
  current_model TEXT,
  context_usage_percent REAL,
  last_heartbeat_at TEXT -- Temporal.Instant
);
-- Transition log
CREATE TABLE task_transitions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  from_status TEXT NOT NULL,
  to_status TEXT NOT NULL,
  reason TEXT,
  scope TEXT,
  timestamp TEXT NOT NULL -- Temporal.Instant
);
```
Parallel workers must stay worktree-isolated and report heartbeat/status into
`.sf` state. Scheduler rows may use `queued`; task lifecycle rows use `todo`.
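
Keeping the two status vocabularies apart is easy to enforce at write time. The sketch below uses the status values listed in the schema comments; the `assertStatus` helper and the idea of validating in application code rather than with SQL constraints are assumptions.

```javascript
// Status values are from the schema comments in section 9.2; the helper is hypothetical.
const STATUSES = Object.freeze({
  tasks: ["todo", "running", "verifying", "reviewing", "done", "blocked", "paused", "failed", "cancelled", "retrying"],
  task_scheduler: ["queued", "due", "claimed", "dispatched", "consumed", "expired"],
});

// Throws if `status` is not part of the given table's vocabulary.
function assertStatus(table, status) {
  const allowed = STATUSES[table];
  if (!allowed) throw new Error(`unknown table: ${table}`);
  if (!allowed.includes(status)) {
    throw new Error(`${table}.status cannot be "${status}"`);
  }
  return status;
}
```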
### 9.3 Complementary Commands
`/tasks` does not replace:
- `/status`: project health dashboard
- `/queue`: milestone/slice dispatch order
- `/parallel status`: parallel orchestrator detail
- `/session-report`: cost/token summary
- `/logs`: activity logs
- `/forensics`: execution forensics
---
## 10. Skills System
### 10.1 Directory Structure
```text
.agents/skills/<skill-name>/
  SKILL.md      -- skill definition with YAML frontmatter
  scripts/      -- supporting scripts
  schemas/      -- JSON schemas for inputs/outputs
  checklists/   -- verification checklists
  mcp.json      -- MCP server config if applicable
```
### 10.2 Skill Frontmatter
```yaml
---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---
```
Fields:
- `name`: unique identifier
- `description`: when to use this skill
- `user-invocable`: can user explicitly invoke?
- `model-invocable`: can model auto-invoke when relevant?
- `side-effects`: `none`, `code-edits`, `production-mutation`, etc.
- `permission-profile`: minimum profile required
### 10.3 Skill Categories
| Type | Example | `model-invocable` |
|------|---------|-------------------|
| Background knowledge | `forge-autonomous-runtime` | true |
| User tool | `production-deploy` | false |
| Shared capability | `forge-command-surface` | true |
Dangerous skills (`production-mutation`) are never model-invoked by default.
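
The frontmatter fields and the default rule for dangerous skills could be combined into a single check. This sketch assumes the frontmatter has already been parsed into a plain object and that profiles are ordered from `restricted` to `unrestricted`; `canModelInvoke` is a hypothetical name, and any override of the default is not modeled.

```javascript
// Assumption: profiles form a strict order, lowest privilege first.
const SKILL_PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 };

// Hypothetical sketch: may the model auto-invoke this skill under the active profile?
function canModelInvoke(skill, activeProfile) {
  if (skill["model-invocable"] !== true) return false;
  // Default from section 10.3: production-mutation skills are never model-invoked.
  if (skill["side-effects"] === "production-mutation") return false;
  return SKILL_PROFILE_RANK[activeProfile] >= SKILL_PROFILE_RANK[skill["permission-profile"]];
}
```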
### 10.4 Auto-Creation Flow
1. Detect repeated repo-specific evidence (same files, commands, failure modes, rules)
2. Propose skill in manual/restricted contexts
3. Generate/update automatically only when policy allows
4. Record repeated source evidence in `.sf` state
5. Keep narrow, linted, and evaled like code
6. Commit with repo when accepted
### 10.5 Skill Eval Cases
Every auto-created skill needs eval cases:
```text
.agents/skills/<skill-name>/evals/
  case-1/
    task.md     -- user-like prompt
    grader.js   -- deterministic checker
    hidden/     -- reference answers (not visible to agent)
    work/       -- agent workspace
```
Graders inspect: files, artifacts, `answer.json`, `trace.jsonl`, result state.
Failed trials preserve workspace for debugging.
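
A deterministic grader can be as small as a pure comparison over `answer.json`. The contract below (`gradeAnswer(text, expected)` returning `{ pass, reasons }`) is an assumption; the spec does not define the grader interface, and reading files from `work/` and `hidden/` is left to the harness.

```javascript
// Hypothetical grader core: compares parsed answer.json against expected key/value pairs.
function gradeAnswer(answerText, expected) {
  let answer;
  try {
    answer = JSON.parse(answerText);
  } catch {
    return { pass: false, reasons: ["answer.json is not valid JSON"] };
  }
  if (answer === null || typeof answer !== "object") {
    return { pass: false, reasons: ["answer.json must contain an object"] };
  }
  const reasons = [];
  for (const [key, value] of Object.entries(expected)) {
    if (answer[key] !== value) {
      reasons.push(`${key}: expected ${JSON.stringify(value)}, got ${JSON.stringify(answer[key])}`);
    }
  }
  return { pass: reasons.length === 0, reasons };
}
```

Because the function has no I/O and no randomness, the same workspace always produces the same verdict, which is what makes a preserved failed trial reproducible.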
---
## 11. Command Surfaces
### 11.1 Human Slash Commands
SF registers direct command roots only:
```text
/status
/autonomous
/doctor
/rate
/session-report
/parallel
/remote
/tasks
```
`/sf` is not a command root. TUI and browser command parity tests reject it so
compatibility shims do not grow back.
`/remote` is a full-session steering surface. Remote answers may change
`workMode`, `runControl`, `permissionProfile`, and `modelMode`; they are not
limited to question delivery.
### 11.2 Shell Surface
Machine surface remains prefixed:
```text
sf headless autonomous
sf headless --autonomous ...
```
The shell prefix is the executable name, not an interactive slash-command
namespace.
---
## 12. Runtime Target: Node 26
### 12.1 Policy
```text
current compatibility floor: Node 26.1+
internal target runtime: Node 26.1
canonical baseline: Node 26.1
Node 25: skip except quick probes
```
### 12.2 Why Node 26
- `Temporal` enabled by default
- V8 14.6 baseline
- Undici 8 HTTP/fetch baseline
- Removes legacy APIs, hardens against old assumptions
### 12.3 Temporal Adoption
Store semantic type, not just formatted string:
| Concept | Temporal Type | Use Case |
|---------|---------------|----------|
| Exact instant | `Temporal.Instant` | Journal events, checkpoints, lock leases |
| Local time | `Temporal.ZonedDateTime` | Reminders, schedules, audits |
| Calendar date | `Temporal.PlainDate` | Daily reports, milestone reviews |
| Wall-clock time | `Temporal.PlainTime` | Recurring policies |
| Time amount | `Temporal.Duration` | Budgets, leases, cooldowns, retry delays |
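
Storing the semantic type alongside the string might look like the sketch below. The record shape (`temporalType`, `value`) and both helper names are hypothetical. The fallback to the raw string exists only so the sketch also runs where `Temporal` is not a global; under the Node 26 policy above it would not be needed.

```javascript
// Temporal types from the table in section 12.3.
const TEMPORAL_TYPES = new Set(["Instant", "ZonedDateTime", "PlainDate", "PlainTime", "Duration"]);

// Tag a string with the Temporal type it represents before persisting it.
function tagTemporal(type, value) {
  if (!TEMPORAL_TYPES.has(type)) throw new Error(`unsupported Temporal type: ${type}`);
  return { temporalType: type, value: String(value) };
}

// Rebuild the typed value on read. Each listed type has a static from() method.
function reviveTemporal(record) {
  if (!TEMPORAL_TYPES.has(record.temporalType)) {
    throw new Error(`unsupported Temporal type: ${record.temporalType}`);
  }
  const temporal = globalThis.Temporal;
  if (!temporal) return record.value; // assumption: degrade to the raw string
  return temporal[record.temporalType].from(record.value);
}
```

For example, a retry delay stored as `tagTemporal("Duration", "PT5M")` comes back as a `Temporal.Duration` rather than a string that callers must remember how to parse.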
### 12.4 Adoption Priority
1. `sf schedule`: highest user-visible impact
2. Lock/lease: highest operational correctness
3. Journals/traces: highest debugging impact
4. Session reports: nice to have
5. Background tasks: future work
### 12.5 Gate
```text
node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"
```
---
## 13. Implementation Pull-Through
### 13.1 Already Directionally Right
- UOK lifecycle records carry `runControl`
- UOK lifecycle records carry `permissionProfile`
- Schedule command state uses `autonomous_dispatch`
- DB-backed state, recovery, verification, scheduling, captures, forensics
- Skills and project-specific skill paths exist
- Parallel orchestration and remote-question infrastructure
### 13.2 Still Needed
| Priority | Item | Effort |
|----------|------|--------|
| P2 | Decide whether `sandboxProfile` becomes a sixth persisted axis | Medium |
| P2 | Remove `/sf` from docs/web/tests (Phase 2 deprecation) | Small |
### 13.3 Recently Completed (This Session)
| Priority | Item | Status |
|----------|------|--------|
| P1 | Centralized metrics system (`metrics-central.js`) | |
| P1 | Cost command (`/cost`) with DB + ledger queries | |
| P1 | Explicit stage commands (`/research`, `/plan`, `/implement`) | |
| P2 | Reasoning assist foundation (`reasoning-assist.js`) | |
| P2 | Self-feedback workMode auto-transition | |
| P2 | UOK events carry workMode + modelMode | |
| P2 | `/sf` prefix deprecation warning (Phase 1) | |
### 13.4 Completed
| Priority | Item | Status |
|----------|------|--------|
| P0 | Make mode state durable in SQLite | |
| P0 | Add direct `/mode`, `/control`, `/permission-profile`, `/model-mode` commands | |
| P0 | Add visible mode badge to TUI header/status bar | |
| P1 | Make `--autonomous` chain into direct `/autonomous` | |
| P1 | Expose autonomous continuation limits in settings and status | |
| P1 | Add `/tasks` backed by DB execution graph state | |
| P1 | Make `repair` first-class workflow over `doctor` | |
| P1 | Enhanced `/steer` with mode/permission-profile/model-mode transitions | |
| P1 | TUI keyboard shortcuts for mode cycling (Ctrl+Shift+M/R/A/S/P) | |
| P1 | Minimal auto-mode header/footer (badge visible during autonomy) | |
| P1 | Remove `/sf` namespace registration and parity-test against fallback | |
| P1 | Parallel worker intent/claim registry backed by UOK SQLite coordination | |
| P1 | Skill eval harness foundation | |
| P1 | Terminal title mode indicator | |
| P2 | Policy-aware project skill suggestion/generation with DB cooldown | |
| P2 | Schema-backed task frontmatter (risk, mutation, verification, approval) | |
| P2 | Subagent provider/model/permission inheritance audit and guard | |
| P2 | Remote steering as full-session surface from remote answers | |
---
## 14. Open Questions
1. Should `plan` mode show badge `[P]` or `plan` text in full?
2. Should paused autonomous show previous badge dimmed, or `[P]` for paused?
3. Should mode be per-session or per-project? (Current: per-session)
4. Should badge appear in tmux/terminal window titles?
5. Should mode transitions have sound/notification?
6. Should `repair` auto-transition be `ask` by default for new projects?
7. Should skill eval cases run in CI or only on-demand?
8. Should `/tasks` be a TUI overlay or a separate scrollable panel?
9. Should reasoning assist call a fast model automatically, or only prepare prompts for now?
---
## 15. References
- GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously": <https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot>
- Factory Droid, "Autonomy Level": <https://docs.factory.ai/cli/user-guides/auto-run>
- Amp manual: <https://ampcode.com/manual>
- Smelt (mode cycling): <https://github.com/leonardcser/smelt>
- ORCH (task state machine): <https://github.com/oxgeneral/ORCH>
- AgentPlane (schema-first tasks): <https://github.com/basilisk-labs/agentplane>
- Relay (channels and tickets): <https://github.com/jcast90/relay>
- Sage (runtime-neutral orchestration): <https://github.com/youwangd/SageCLI>
- Wit (symbol-level locks): <https://github.com/amaar-mc/wit>