# System Prompt & LLM vs Deterministic Split
### The Core Separation Principle
> If you could write an if-else statement that handles it correctly every time, **it should not be in the LLM's context**. Every token the model spends reasoning about something deterministic is wasted and introduces hallucination risk.
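
As a minimal sketch of the principle, consider the question "is the context getting too full?". It reduces to a single comparison, so it can live in code, and the model is only asked to write the summary once the rule fires. The names `ContextBudget` and `shouldSummarize` and the 0.8 threshold are illustrative assumptions, not part of any existing API.

```typescript
// Hypothetical types and threshold, chosen only for illustration.
interface ContextBudget {
  usedTokens: number;
  maxTokens: number;
}

// Deterministic rule: trigger summarization once usage crosses a fixed
// fraction of the budget. The LLM never reasons about this decision.
function shouldSummarize(budget: ContextBudget, threshold = 0.8): boolean {
  return budget.usedTokens / budget.maxTokens >= threshold;
}

const nearlyFull = shouldSummarize({ usedTokens: 90000, maxTokens: 100000 });
const plentyLeft = shouldSummarize({ usedTokens: 10000, maxTokens: 100000 });
```

Here `nearlyFull` is `true` and `plentyLeft` is `false`; the orchestrator would request a summary from the model only in the first case.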
### What the LLM Owns
| Capability | Why LLM |
|-----------|---------|
| Understanding intent | Interpretation, judgment |
| Architectural reasoning | Weighing tradeoffs |
| Code generation | Creative, context-dependent |
| Debugging & diagnosis | Abductive reasoning, hypothesis formation |
| Self-critique & quality assessment | Judgment calls |
### What TypeScript/Deterministic Code Owns
| Capability | Why Deterministic |
|-----------|-------------------|
| State machine transitions | Typed state object, no ambiguity |
| Context assembly | Predict and pre-load what the agent needs |
| File operations | Validate paths, handle encoding, manage permissions |
| Test execution & result parsing | Structured results, not raw terminal output |
| Build & environment management | Install deps, start servers, manage ports |
| Code formatting | Run Prettier automatically; never spend LLM tokens on formatting |
| Task scheduling & dependency resolution | Graph traversal finishes instantly, versus a 5-second LLM call |
| Summarization triggers | Mechanical workflow, LLM provides content |
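
The "typed state object" row can be sketched as a small transition function. The phase names, event names, and the `failedAttempts` counter below are assumptions made for the example; the point is only that illegal transitions fail loudly in code instead of being left to the model's judgment.

```typescript
// Hypothetical phases and events; a real orchestrator would define its own.
type Phase = "planning" | "executing" | "debugging" | "done";

type AgentEvent =
  | { type: "PLAN_APPROVED" }
  | { type: "TESTS_PASSED" }
  | { type: "TESTS_FAILED" }
  | { type: "FIX_APPLIED" };

interface AgentState {
  phase: Phase;
  failedAttempts: number;
}

// Pure transition function: same state and event always give the same result.
function transition(state: AgentState, event: AgentEvent): AgentState {
  switch (state.phase) {
    case "planning":
      if (event.type === "PLAN_APPROVED") return { ...state, phase: "executing" };
      break;
    case "executing":
      if (event.type === "TESTS_PASSED") return { ...state, phase: "done" };
      if (event.type === "TESTS_FAILED") {
        return { phase: "debugging", failedAttempts: state.failedAttempts + 1 };
      }
      break;
    case "debugging":
      if (event.type === "FIX_APPLIED") return { ...state, phase: "executing" };
      break;
    case "done":
      break;
  }
  // Anything not listed above is rejected rather than silently ignored.
  throw new Error(`Illegal transition: ${event.type} in phase ${state.phase}`);
}

const initial: AgentState = { phase: "planning", failedAttempts: 0 };
const afterPlan = transition(initial, { type: "PLAN_APPROVED" });
const afterFailure = transition(afterPlan, { type: "TESTS_FAILED" });
```

After these two events, `afterFailure` is in the `debugging` phase with one failed attempt recorded. Keeping the function pure makes every transition easy to unit test and to log.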
### Modular System Prompt Architecture
```
Base Layer (always present, ~500 tokens)
→ Identity, core behavioral rules, general approach
Phase-Specific Layer (swapped based on state)
→ Planning mode: decomposition, interfaces, risks
→ Execution mode: implementation, testing, iteration
→ Debugging mode: diagnosis, hypothesis testing, isolation
Task-Specific Layer (assembled fresh per task)
→ Current spec, acceptance criteria, relevant contracts, prior attempts
Tools Layer
→ Available tool definitions and parameters
```
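
One way the layers above might be assembled in the orchestrator is sketched below. The layer texts, the `TaskContext` and `ToolDefinition` shapes, and the function name `buildSystemPrompt` are placeholders invented for the example.

```typescript
// Hypothetical layer contents and types, for illustration only.
type Phase = "planning" | "execution" | "debugging";

interface TaskContext {
  spec: string;
  acceptanceCriteria: string[];
  priorAttempts: string[];
}

interface ToolDefinition {
  name: string;
  description: string;
}

// Base layer: always present.
const BASE_LAYER = "You are a coding agent. Follow the project's conventions.";

// Phase-specific layer: exactly one entry is selected based on state.
const PHASE_LAYERS: Record<Phase, string> = {
  planning: "Focus on decomposition, interfaces, and risks.",
  execution: "Focus on implementation, testing, and iteration.",
  debugging: "Focus on diagnosis, hypothesis testing, and isolation.",
};

function buildSystemPrompt(
  phase: Phase,
  task: TaskContext,
  tools: ToolDefinition[],
): string {
  // Task-specific layer: assembled fresh for every task.
  const attempts = task.priorAttempts.length > 0 ? task.priorAttempts : ["none"];
  const taskLayer = [
    `Spec: ${task.spec}`,
    "Acceptance criteria:",
    ...task.acceptanceCriteria.map((c) => `- ${c}`),
    "Prior attempts:",
    ...attempts.map((a) => `- ${a}`),
  ].join("\n");

  // Tools layer: one line per available tool.
  const toolsLayer = tools.map((t) => `${t.name}: ${t.description}`).join("\n");

  return [BASE_LAYER, PHASE_LAYERS[phase], taskLayer, toolsLayer].join("\n\n");
}

const debugPrompt = buildSystemPrompt(
  "debugging",
  { spec: "Fix login redirect", acceptanceCriteria: ["auth tests pass"], priorAttempts: [] },
  [{ name: "runTests", description: "Run tests and return structured results" }],
);
```

Because only the selected phase layer is included, the model in debugging mode never sees planning instructions it has no use for.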
### Tool Design Philosophy
> Each tool should do one thing, do it completely, and return structured results the LLM can immediately act on.
**Bad:** LLM calls `readFile` → `parseJSON` → `runCommand` (3 calls, 3 failure points)
**Good:** LLM calls `runTests(filter)` → gets structured pass/fail with locations (1 call, clean result)
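
A rough sketch of the "good" shape follows. The raw report format (`RawTestCase`) and the injected `TestRunner` are assumptions; a real implementation would spawn the project's test runner and parse its actual output, which is omitted here.

```typescript
// Hypothetical result shapes; not the format of any specific test runner.
interface TestFailure {
  name: string;
  file: string;
  line: number;
  message: string;
}

interface TestRunResult {
  passed: number;
  failed: number;
  failures: TestFailure[];
}

interface RawTestCase {
  name: string;
  file: string;
  line: number;
  status: "passed" | "failed";
  message?: string;
}

// Stand-in for "run the test process and parse its report".
type TestRunner = (filter: string) => RawTestCase[];

// Single tool call: the model gets counts and failure locations,
// never raw terminal output.
function runTests(filter: string, runner: TestRunner): TestRunResult {
  const cases = runner(filter);
  const failures = cases
    .filter((c) => c.status === "failed")
    .map((c) => ({
      name: c.name,
      file: c.file,
      line: c.line,
      message: c.message ?? "no message",
    }));
  return { passed: cases.length - failures.length, failed: failures.length, failures };
}

// Fake runner used only to demonstrate the result shape.
const fakeRunner: TestRunner = (filter) => [
  { name: `${filter}: logs in`, file: "auth.test.ts", line: 10, status: "passed" },
  { name: `${filter}: redirects`, file: "auth.test.ts", line: 42, status: "failed", message: "expected 302, got 200" },
];

const testResult = runTests("auth", fakeRunner);
```

With the fake runner, `testResult` reports one pass and one failure located at `auth.test.ts` line 42, which the model can act on without any further parsing calls.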
### Essential Tools
| Tool | Returns |
|------|---------|
| `runTests` | Structured results: pass count, fail count, per-failure details |
| `readFiles` | Batched file contents (array of paths, not one at a time) |
| `writeFile` | Auto-formats before writing |
| `searchCodebase` | Grep-like results with file paths and line numbers |
| `getProjectState` | Manifest + current task spec + related task statuses |
| `updateTaskStatus` | Handles downstream state updates automatically |
| `buildProject` | Structured errors with file paths and line numbers |
| `browserCheck` | Screenshot or structured description of rendered output |
| `commitChanges` | Enforces conventions, runs pre-commit hooks |
| `revertToCheckpoint` | Rolls back to last known good state |
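
To illustrate what "handles downstream state updates automatically" could mean for `updateTaskStatus`, here is a sketch in which completing a task unblocks any dependents whose dependencies are all done. The `Task` shape and the status names are assumptions made for the example.

```typescript
// Hypothetical task model; field and status names are illustrative.
type TaskStatus = "blocked" | "ready" | "in_progress" | "done";

interface Task {
  id: string;
  status: TaskStatus;
  dependsOn: string[];
}

// Sets the task's status in place and returns the ids of tasks
// that became ready as a consequence.
function updateTaskStatus(
  tasks: Record<string, Task>,
  id: string,
  status: TaskStatus,
): string[] {
  const task = tasks[id];
  if (!task) throw new Error(`Unknown task: ${id}`);
  task.status = status;

  const unblocked: string[] = [];
  if (status !== "done") return unblocked;

  for (const key of Object.keys(tasks)) {
    const other = tasks[key];
    if (!other || other.status !== "blocked") continue;
    // A blocked task becomes ready only when every dependency is done.
    const allDone = other.dependsOn.every((dep) => tasks[dep]?.status === "done");
    if (allDone) {
      other.status = "ready";
      unblocked.push(other.id);
    }
  }
  return unblocked;
}

const tasks: Record<string, Task> = {
  a: { id: "a", status: "in_progress", dependsOn: [] },
  b: { id: "b", status: "blocked", dependsOn: ["a"] },
  c: { id: "c", status: "blocked", dependsOn: ["a", "b"] },
};

const afterA = updateTaskStatus(tasks, "a", "done");
const afterB = updateTaskStatus(tasks, "b", "done");
```

Completing `a` unblocks only `b`, since `c` still waits on `b`; completing `b` then unblocks `c`. The model calls one tool and never has to reason about the dependency graph itself.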
### Prompt Patterns That Maximize Agency
1. **Tell it what it CAN do, not what it can't.** "Full authority as long as acceptance criteria and tests pass."
2. **Explicit permission to iterate.** "First attempt doesn't need to be perfect. Write, run, observe, improve."
3. **Clear exit conditions.** Concrete, measurable, unambiguous definition of "done."
4. **Built-in scratchpad.** "Write reasoning in thinking blocks. Track attempts and outcomes."
5. **Recovery protocol.** "After 3 failed approaches, produce structured escalation."
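
The recovery protocol in item 5 can be backed by code, in keeping with the separation principle: the orchestrator counts failed approaches and decides when to escalate, while the model supplies the descriptions. This is a sketch under assumed names (`Attempt`, `Escalation`, `recordFailure`), with the limit of 3 taken from the pattern above.

```typescript
// Hypothetical shapes for tracking attempts and escalating.
interface Attempt {
  approach: string;
  outcome: string;
}

interface Escalation {
  kind: "escalation";
  taskId: string;
  attempts: Attempt[];
  summary: string;
}

interface FailureOutcome {
  attempts: Attempt[];
  escalation: Escalation | null;
}

const MAX_FAILED_APPROACHES = 3;

// Appends the latest failure and escalates once the limit is reached.
function recordFailure(
  taskId: string,
  priorAttempts: Attempt[],
  latest: Attempt,
): FailureOutcome {
  const attempts = [...priorAttempts, latest];
  if (attempts.length < MAX_FAILED_APPROACHES) {
    return { attempts, escalation: null };
  }
  return {
    attempts,
    escalation: {
      kind: "escalation",
      taskId,
      attempts,
      summary: `${attempts.length} approaches failed for ${taskId}`,
    },
  };
}

let outcome = recordFailure("task-12", [], {
  approach: "patch the parser",
  outcome: "2 tests still failing",
});
outcome = recordFailure("task-12", outcome.attempts, {
  approach: "rewrite the tokenizer",
  outcome: "build error",
});
outcome = recordFailure("task-12", outcome.attempts, {
  approach: "swap the library",
  outcome: "API mismatch",
});
```

After the third recorded failure, `outcome.escalation` is populated with all three attempts, giving a human reviewer (or a supervising agent) the full history in one structured object.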
### The Meta-Principle
> Your TypeScript orchestrator is the deterministic skeleton — workflow, state, context, tools, coordination. The LLM is the reasoning muscle — understanding, creativity, judgment, problem-solving. **Neither should do the other's job.** When you get this right, the LLM becomes dramatically more capable because it's only doing what it's good at, with exactly the context it needs.
---