Cross-Cutting Themes (Where All 4 Models Converge)
Original Themes (Reinforced)
These ideas appeared independently in all four conversations across both rounds, making them the highest-confidence principles:
- The LLM should only do what requires judgment. Everything deterministic belongs in code.
- Vertical slices are non-negotiable. End-to-end working increments at every stage.
- Context leanness = quality. Less (but more relevant) context produces better outputs than more context.
- Execution-based verification beats self-assessment. Run the code. Trust test results over the model's opinion.
- The orchestrator is the product. The model is a commodity; the system around it is the differentiator.
- State must be structured and deterministic. Never let the LLM manage its own lifecycle or memory.
- Speed comes from removing unnecessary work, not from doing the same work faster.
- Failure recovery matters more than happy-path perfection. Design the error paths first.
- Human involvement should be high-leverage. Specific questions with context, not open-ended reviews.
- The system improves over time. Track patterns, cache solutions, learn from failures.
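Execution-based verification can be made concrete with a minimal sketch: run the candidate code under the real test runner and trust the process exit code, never the model's own claim that the change works. The function and commands below are illustrative, not part of any specific agent's API.

```python
import subprocess
import sys

def verify_by_execution(cmd: list[str]) -> bool:
    """Run the code and trust the exit status over any self-assessment."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0

# Illustrative stand-ins for a real test suite: one passing check, one failing.
passed = verify_by_execution([sys.executable, "-c", "assert 1 + 1 == 2"])
failed = verify_by_execution([sys.executable, "-c", "assert 1 + 1 == 3"])
```

In a real orchestrator the command would be the project's actual test entry point (e.g. a pytest invocation); the point is that the verdict is a deterministic exit code produced outside the model.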
New Themes (From Grey Area Deep-Dives)
- Document assumptions, don't ask about every one. Proceed with sensible defaults + transparent logging. Review at milestones, not in real-time.
- The codebase is the lossless source of truth. Summaries are lossy caches that must be periodically reconciled against actual code. Never summarize summaries.
- Semantic conflicts are harder than syntactic ones. Interface contracts must be behavioral specs, not just type signatures. Integration testing is a first-class concern, not an afterthought.
- Observe before modifying. Especially in legacy codebases — the agent must understand existing patterns before changing them. Preserve local consistency over global ideals.
- Taste can be ~80-85% automated. Convert subjective preferences to concrete, verifiable specs. Reserve human judgment for the remaining gestalt. The gap is closing fast with vision-capable models.
- Irreversible operations are categorically different. The agent prepares; the human executes. No exceptions.
- "Boring" code is good code. For handoff, enforce standard patterns, limit complexity, and write "why" comments. Automated readability testing catches problems before humans encounter them.
- Make rewrites cheap, not rare. Clean interfaces + good tests + branch-based experimentation = rewriting is a safe, routine operation rather than a crisis.
- Route errors by type, not by severity. Different error classes need different context, different handlers, and different escalation thresholds. Flaky tests should be quarantined, not fixed.
- The magic is the translation layer. For non-technical users, the entire value proposition is the invisible bridge between human intent and technical execution. Every moment the user has to think like a developer is a failure.
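Routing errors by type rather than severity can be sketched as a dispatch table: each error class gets its own handler and escalation threshold, so a flaky test is quarantined instead of endlessly "fixed". The error taxonomy and handler names here are hypothetical examples, not a prescribed schema.

```python
# Hypothetical error taxonomy: each class carries its own handler and
# retry budget, so different failures take different paths.
ROUTES = {
    "syntax":       {"handler": "auto_retry",        "max_attempts": 3},
    "test_flake":   {"handler": "quarantine",        "max_attempts": 1},
    "logic":        {"handler": "replan",            "max_attempts": 2},
    "irreversible": {"handler": "escalate_to_human", "max_attempts": 0},
}

def route(error_type: str) -> dict:
    # Unknown error classes escalate by default rather than being retried blindly.
    return ROUTES.get(error_type, {"handler": "escalate_to_human", "max_attempts": 0})
```

The design choice worth noting: the default path for anything unclassified is escalation, which keeps the agent from burning retries on failures it does not understand.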
Generated March 2026. Updated with grey-area deep-dive synthesis. Source material: two rounds of parallel deep-dive conversations with Claude (Anthropic), Gemini (Google), GPT (OpenAI), and Grok (xAI) on optimal autonomous AI coding agent architecture — including the 13 hardest unsolved problems and designing for non-technical users.