# System Prompt & LLM vs Deterministic Split

### The Core Separation Principle

> If you could write an if-else statement that handles it correctly every time, **it should not be in the LLM's context**. Every token the model spends reasoning about something deterministic is wasted and introduces hallucination risk.

### What the LLM Owns

| Capability | Why LLM |
|-----------|---------|
| Understanding intent | Interpretation, judgment |
| Architectural reasoning | Weighing tradeoffs |
| Code generation | Creative, context-dependent |
| Debugging & diagnosis | Abductive reasoning, hypothesis formation |
| Self-critique & quality assessment | Judgment calls |

### What TypeScript/Deterministic Code Owns

| Capability | Why Deterministic |
|-----------|-------------------|
| State machine transitions | Typed state object, no ambiguity |
| Context assembly | Predict and pre-load what the agent needs |
| File operations | Validate paths, handle encoding, manage permissions |
| Test execution & result parsing | Structured results, not raw terminal output |
| Build & environment management | Install deps, start servers, manage ports |
| Code formatting | Run Prettier automatically; never spend LLM tokens on formatting |
| Task scheduling & dependency resolution | Graph traversal is near-instant, versus roughly 5 seconds for an LLM call |
| Summarization triggers | The trigger is mechanical; the LLM only supplies the summary content |
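
As an illustration of the "State machine transitions" row, here is a minimal sketch of a typed transition table. The phase and event names are hypothetical, chosen only to echo the planning, execution, and debugging modes used elsewhere in this document; a real orchestrator would likely carry more state.

```typescript
// Hypothetical phases and events; not taken from any real orchestrator.
type Phase = "planning" | "executing" | "debugging" | "done" | "escalated";

type AgentEvent =
  | { kind: "planApproved" }
  | { kind: "testsPassed" }
  | { kind: "testsFailed" }
  | { kind: "fixApplied" }
  | { kind: "attemptsExhausted" };

// Every legal transition is listed here. Anything absent is rejected,
// so the LLM never has to reason about which phase comes next.
const transitions: Record<Phase, Partial<Record<AgentEvent["kind"], Phase>>> = {
  planning: { planApproved: "executing" },
  executing: { testsPassed: "done", testsFailed: "debugging" },
  debugging: { fixApplied: "executing", attemptsExhausted: "escalated" },
  done: {},
  escalated: {},
};

function nextPhase(current: Phase, event: AgentEvent): Phase {
  const target = transitions[current][event.kind];
  if (target === undefined) {
    throw new Error(`Illegal transition: ${current} on ${event.kind}`);
  }
  return target;
}
```

Because the table is plain data, an illegal move fails loudly in the orchestrator instead of being silently "reasoned around" by the model.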

### Modular System Prompt Architecture

```
Base Layer (always present, ~500 tokens)
  → Identity, core behavioral rules, general approach
  
Phase-Specific Layer (swapped based on state)
  → Planning mode: decomposition, interfaces, risks
  → Execution mode: implementation, testing, iteration
  → Debugging mode: diagnosis, hypothesis testing, isolation

Task-Specific Layer (assembled fresh per task)
  → Current spec, acceptance criteria, relevant contracts, prior attempts

Tools Layer
  → Available tool definitions and parameters
```
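
The layering above could be assembled roughly as follows. This is a sketch under stated assumptions: `buildSystemPrompt`, `TaskContext`, and the layer strings are illustrative names and placeholder text, not an existing API or the project's actual prompts.

```typescript
type Mode = "planning" | "execution" | "debugging";

// Hypothetical shape for the task-specific layer.
interface TaskContext {
  spec: string;
  acceptanceCriteria: string[];
  priorAttempts: string[];
}

// Placeholder text; a real base layer would hold identity and core rules.
const BASE_LAYER = "You are a coding agent. Follow the project's conventions.";

// Exactly one of these is included, selected by the orchestrator's state.
const MODE_LAYERS: Record<Mode, string> = {
  planning: "Focus on decomposition, interfaces, and risks.",
  execution: "Focus on implementation, testing, and iteration.",
  debugging: "Focus on diagnosis, hypothesis testing, and isolation.",
};

function buildSystemPrompt(mode: Mode, task: TaskContext, toolDocs: string[]): string {
  // Task layer is rebuilt from scratch for every task.
  const taskLayer = [
    `Spec: ${task.spec}`,
    "Acceptance criteria:",
    ...task.acceptanceCriteria.map((c) => `- ${c}`),
    ...(task.priorAttempts.length > 0
      ? ["Prior attempts:", ...task.priorAttempts.map((a) => `- ${a}`)]
      : []),
  ].join("\n");

  const toolsLayer = ["Tools:", ...toolDocs.map((t) => `- ${t}`)].join("\n");

  // Layers are separated by blank lines, in the order shown in the diagram.
  return [BASE_LAYER, MODE_LAYERS[mode], taskLayer, toolsLayer].join("\n\n");
}
```

Swapping the phase layer is then a single argument change, and the model never sees instructions for a mode it is not in.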

### Tool Design Philosophy

> Each tool should do one thing, do it completely, and return structured results the LLM can immediately act on.

**Bad:** LLM calls `readFile` → `parseJSON` → `runCommand` (3 calls, 3 failure points)  
**Good:** LLM calls `runTests(filter)` → gets structured pass/fail with locations (1 call, clean result)
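
To make the "Good" case concrete, here is a minimal sketch of what a structured `runTests` result might look like. The `TestReport` shape, the `RawTestRecord` input, and the injected `Runner` are all assumptions made for illustration; a real tool would wrap whatever test runner the project uses.

```typescript
interface TestFailure {
  name: string;
  file: string;
  line: number;
  message: string;
}

// What the LLM receives: counts plus actionable locations, no terminal noise.
interface TestReport {
  passed: number;
  failed: number;
  failures: TestFailure[];
}

// Hypothetical raw record, standing in for a real runner's output.
interface RawTestRecord {
  name: string;
  file: string;
  line: number;
  ok: boolean;
  message?: string;
}

type Runner = (filter: string) => RawTestRecord[];

function runTests(filter: string, runner: Runner): TestReport {
  const records = runner(filter);
  const failures = records
    .filter((r) => !r.ok)
    .map((r) => ({
      name: r.name,
      file: r.file,
      line: r.line,
      message: r.message ?? "no message",
    }));
  return { passed: records.length - failures.length, failed: failures.length, failures };
}

// Fake runner with made-up data, used only to exercise the sketch.
const fakeRunner: Runner = (filter) => {
  const records: RawTestRecord[] = [
    { name: "logs in", file: "src/auth.test.ts", line: 12, ok: true },
    { name: "rejects bad token", file: "src/auth.test.ts", line: 40, ok: false, message: "expected 401, got 200" },
    { name: "adds item", file: "src/cart.test.ts", line: 8, ok: true },
  ];
  return records.filter((r) => r.file.indexOf(filter) !== -1);
};

const report = runTests("auth", fakeRunner);
```

The model gets one object it can act on directly: which tests failed, where, and why.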

### Essential Tools

| Tool | Returns / Behavior |
|------|---------|
| `runTests` | Structured results: pass count, fail count, per-failure details |
| `readFiles` | Batched file contents (array of paths, not one at a time) |
| `writeFile` | Auto-formats before writing |
| `searchCodebase` | Grep-like results with file paths and line numbers |
| `getProjectState` | Manifest + current task spec + related task statuses |
| `updateTaskStatus` | Handles downstream state updates automatically |
| `buildProject` | Structured errors with file paths and line numbers |
| `browserCheck` | Screenshot or structured description of rendered output |
| `commitChanges` | Enforces conventions, runs pre-commit hooks |
| `revertToCheckpoint` | Rolls back to last known good state |
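
One way `updateTaskStatus` might handle downstream updates is sketched below. The `Task` shape, the three status values, and the unblocking rule are assumptions for illustration only; the document does not specify them.

```typescript
type Status = "blocked" | "ready" | "done";

// Hypothetical task record keyed by id in a plain object.
interface Task {
  id: string;
  dependsOn: string[];
  status: Status;
}

type TaskGraph = Record<string, Task>;

// Sets the status, then promotes any blocked task whose dependencies
// are all done. Returns the ids that were unblocked by this call.
function updateTaskStatus(graph: TaskGraph, id: string, status: Status): string[] {
  const task = graph[id];
  if (task === undefined) {
    throw new Error(`Unknown task: ${id}`);
  }
  task.status = status;

  const unblocked: string[] = [];
  if (status !== "done") {
    return unblocked;
  }

  for (const otherId of Object.keys(graph)) {
    const other = graph[otherId];
    if (other === undefined || other.status !== "blocked") continue;
    const depsDone = other.dependsOn.every((dep) => {
      const d = graph[dep];
      return d !== undefined && d.status === "done";
    });
    if (depsDone) {
      other.status = "ready";
      unblocked.push(other.id);
    }
  }
  return unblocked;
}
```

The LLM only reports that a task is finished; deciding what becomes runnable next is a graph lookup, not a reasoning step.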

### Prompt Patterns That Maximize Agency

1. **Tell it what it CAN do, not what it can't.** "Full authority as long as acceptance criteria and tests pass."
2. **Explicit permission to iterate.** "First attempt doesn't need to be perfect. Write, run, observe, improve."
3. **Clear exit conditions.** Concrete, measurable, unambiguous definition of "done."
4. **Built-in scratchpad.** "Write reasoning in thinking blocks. Track attempts and outcomes."
5. **Recovery protocol.** "After 3 failed approaches, produce structured escalation."
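
The recovery protocol in item 5 is itself deterministic, so the counting can live in the orchestrator. Below is a sketch assuming a hypothetical `createAttemptTracker` helper and `Escalation` shape; the threshold of 3 comes from the pattern above, everything else is illustrative.

```typescript
interface Attempt {
  approach: string;
  outcome: string;
}

// Hypothetical structured escalation handed to a human or supervisor.
interface Escalation {
  kind: "escalation";
  taskId: string;
  attempts: Attempt[];
  summary: string;
}

const MAX_FAILED_APPROACHES = 3;

function createAttemptTracker(taskId: string) {
  const attempts: Attempt[] = [];
  return {
    // Records a failed approach. Returns null while under the limit,
    // and a structured escalation once the limit is reached.
    recordFailure(approach: string, outcome: string): Escalation | null {
      attempts.push({ approach, outcome });
      if (attempts.length < MAX_FAILED_APPROACHES) {
        return null;
      }
      return {
        kind: "escalation",
        taskId,
        attempts: attempts.slice(),
        summary: `${attempts.length} approaches failed; input needed before continuing.`,
      };
    },
  };
}
```

The LLM supplies the description of each approach and its outcome; the orchestrator decides when to stop iterating.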

### The Meta-Principle

> Your TypeScript orchestrator is the deterministic skeleton — workflow, state, context, tools, coordination. The LLM is the reasoning muscle — understanding, creativity, judgment, problem-solving. **Neither should do the other's job.** When you get this right, the LLM becomes dramatically more capable because it's only doing what it's good at, with exactly the context it needs.

---