# Product Sense

## The Core Thesis

Autonomous execution is the end gate. SF exists to take a multi-phase software project (a milestone with slices and tasks) and run it to completion without human intervention, producing a clean git history, passing tests, and a deployable artifact.

Every design decision should be evaluated against this question: **does it make autonomous execution more reliable, more observable, or more recoverable?**

## User Goals

- Hand off a milestone and have it complete without babysitting
- Know the agent won't make irreversible mistakes (write gates, protected files, budget ceilings)
- Resume after a crash without losing work (state-on-disk, crash recovery)
- See what the agent did and why (trace files, decision register, records keeper)
- Steer mid-run without breaking the loop (message queue, steering gate)

## Non-Goals

- Being a chat interface: use the Pi interactive mode for exploratory conversation
- Replacing CI: SF triggers verification but does not replace your existing CI pipeline
- Working without context: SF needs a spec, a roadmap, and a task plan; it does not invent work from nothing

## What Good Product Judgment Looks Like

**Fresh context per unit, not accumulated context.** Each task gets a new session with exactly the context it needs pre-injected (task plan, slice plan, prior summaries, relevant skills). This prevents quality degradation from context accumulation, one of the primary failure modes of naive LLM agents on long projects.

**State machine, not LLM guessing.** The loop is deterministic: read STATE.md → validate → dispatch → post-unit → verify → advance. The LLM executes work inside a unit; it does not decide what the next unit is. Separating orchestration from execution keeps the system predictable.

**Spec-first.** No behavior change without a failing test first. No completion without a real consumer. This is the iron law, not a suggestion.
An agent that completes tasks without specs is just making things up.

**Crash recovery must be invisible.** A crashed session should resume within seconds with no visible data loss. If recovery requires human intervention, it is a product failure.

**User stays in the loop via gates, not via interrupts.** Discussion gates, write gates, budget ceilings, and approval prompts are the designed points of human interaction. The agent should not need to ask for help in the middle of a task.

## Tradeoffs

| Choice | What we gave up | Why |
|--------|-----------------|-----|
| Fresh session per unit | Conversational continuity across units | Quality and predictability over convenience |
| State on disk (not in memory) | Speed of in-memory state | Crash recovery and multi-process visibility |
| Write gate during queue | Faster iteration in planning | Safety: prevents accidental file mutations during discussion |
| Protected files (ADRs, SPEC.md) | Agent autonomy over architecture docs | Human oversight over durable decisions |
| Serial execution default | Throughput | Correctness before parallelism; parallel locking is deferred debt |
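
The deterministic loop and state-on-disk recovery described under "What Good Product Judgment Looks Like" can be illustrated with a small sketch. This is not SF's implementation: it assumes a JSON state file in place of STATE.md, and the names `run_loop`, `execute`, `verify`, and the unit IDs are hypothetical. It shows the orchestrator, not the executor, choosing the next unit, and a rerun after a crash resuming from the last advanced unit.

```python
import json
import os
from pathlib import Path
from typing import Callable


def load_state(path: Path) -> dict:
    """Read state from disk; a missing file means a fresh run."""
    if not path.exists():
        return {"next_unit": 0, "completed": []}
    return json.loads(path.read_text())


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a partial state file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def validate(state: dict, units: list[str]) -> None:
    """Reject state that does not match the plan (assumed invariant for this sketch)."""
    if not 0 <= state["next_unit"] <= len(units):
        raise ValueError("state points outside the plan")
    if state["completed"] != units[: state["next_unit"]]:
        raise ValueError("completed units do not match the plan")


def run_loop(
    path: Path,
    units: list[str],
    execute: Callable[[str], str],
    verify: Callable[[str, str], bool],
) -> dict:
    """read state -> validate -> dispatch -> post-unit -> verify -> advance."""
    while True:
        state = load_state(path)            # read: disk is the source of truth
        validate(state, units)              # validate
        if state["next_unit"] == len(units):
            return state                    # plan finished
        unit = units[state["next_unit"]]    # the loop picks the unit, not the executor
        result = execute(unit)              # dispatch: one fresh call per unit
        summary = result.strip()            # post-unit: hypothetical summary step
        if not verify(unit, summary):       # verify
            raise RuntimeError(f"verification failed for {unit}")
        state["completed"].append(unit)     # advance, persisted only after verification
        state["next_unit"] += 1
        save_state(path, state)
```

Because state is persisted only after a unit verifies, calling `run_loop` again with the same path after a crash re-runs the interrupted unit and skips everything already completed.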