Split flat docs/ into user-docs/ (guides, config, troubleshooting) and dev/ (ADRs, architecture, extension guides, proposals). Updated docs/README.md index to reflect new paths.
3.9 KiB
System Prompt & LLM vs Deterministic Split
The Core Separation Principle
If you could write an if-else statement that handles it correctly every time, it should not be in the LLM's context. Every token the model spends reasoning about something deterministic is wasted and introduces hallucination risk.
What the LLM Owns
| Capability | Why LLM |
|---|---|
| Understanding intent | Interpretation, judgment |
| Architectural reasoning | Weighing tradeoffs |
| Code generation | Creative, context-dependent |
| Debugging & diagnosis | Abductive reasoning, hypothesis formation |
| Self-critique & quality assessment | Judgment calls |
What TypeScript/Deterministic Code Owns
| Capability | Why Deterministic |
|---|---|
| State machine transitions | Typed state object, no ambiguity |
| Context assembly | Predict + pre-load what agent needs |
| File operations | Validate paths, handle encoding, manage permissions |
| Test execution & result parsing | Structured results, not raw terminal output |
| Build & environment management | Install deps, start servers, manage ports |
| Code formatting | Run prettier automatically, never waste LLM tokens |
| Task scheduling & dependency resolution | Graph traversal, instant vs 5-second LLM call |
| Summarization triggers | Mechanical workflow, LLM provides content |
Modular System Prompt Architecture
Base Layer (always present, ~500 tokens)
→ Identity, core behavioral rules, general approach
Phase-Specific Layer (swapped based on state)
→ Planning mode: decomposition, interfaces, risks
→ Execution mode: implementation, testing, iteration
→ Debugging mode: diagnosis, hypothesis testing, isolation
Task-Specific Layer (assembled fresh per task)
→ Current spec, acceptance criteria, relevant contracts, prior attempts
Tools Layer
→ Available tool definitions and parameters
Tool Design Philosophy
Each tool should do one thing, do it completely, and return structured results the LLM can immediately act on.
Bad: LLM calls readFile → parseJSON → runCommand (3 calls, 3 failure points)
Good: LLM calls runTests(filter) → gets structured pass/fail with locations (1 call, clean result)
Essential Tools
| Tool | Returns |
|---|---|
runTests |
Structured results: pass count, fail count, per-failure details |
readFiles |
Batched file contents (array of paths, not one at a time) |
writeFile |
Auto-formats before writing |
searchCodebase |
Grep-like results with file paths and line numbers |
getProjectState |
Manifest + current task spec + related task statuses |
updateTaskStatus |
Handles downstream state updates automatically |
buildProject |
Structured errors with file paths and line numbers |
browserCheck |
Screenshot or structured description of rendered output |
commitChanges |
Enforces conventions, runs pre-commit hooks |
revertToCheckpoint |
Rolls back to last known good state |
Prompt Patterns That Maximize Agency
- Tell it what it CAN do, not what it can't. "Full authority as long as acceptance criteria and tests pass."
- Explicit permission to iterate. "First attempt doesn't need to be perfect. Write, run, observe, improve."
- Clear exit conditions. Concrete, measurable, unambiguous definition of "done."
- Built-in scratchpad. "Write reasoning in thinking blocks. Track attempts and outcomes."
- Recovery protocol. "After 3 failed approaches, produce structured escalation."
The Meta-Principle
Your TypeScript orchestrator is the deterministic skeleton — workflow, state, context, tools, coordination. The LLM is the reasoning muscle — understanding, creativity, judgment, problem-solving. Neither should do the other's job. When you get this right, the LLM becomes dramatically more capable because it's only doing what it's good at, with exactly the context it needs.