# Product Sense

## The Core Thesis

SF is a purpose-to-software compiler. It exists to take bounded intent, turn it into a falsifiable PDD contract, research missing context, decide whether autonomy is allowed, and then run the resulting milestone to completion with clean git history, passing tests, and recorded evidence.

Every design decision should be evaluated against this question: **does it make purpose-to-software compilation more reliable, more observable, more recoverable, or more falsifiable?**

## User Goals

- Hand off a milestone and have it complete without babysitting
- Know the agent won't make irreversible mistakes (write gates, protected files, budget ceilings)
- Resume after a crash without losing work (state-on-disk, crash recovery)
- See what the agent did and why (trace files, decision register, records keeper)
- Steer mid-run without breaking the loop (message queue, steering gate)

## Non-Goals

- Being a chat interface: use the Pi interactive mode for exploratory conversation
- Replacing CI: SF triggers verification but does not replace your existing CI pipeline
- Working without context: SF needs a spec, a roadmap, and a task plan; it does not invent work from nothing

## What Good Product Judgment Looks Like

**Fresh context per unit, not accumulated context.** Each task gets a new session with exactly the context it needs pre-injected (task plan, slice plan, prior summaries, relevant skills). This prevents quality degradation from context accumulation, one of the primary failure modes of naive LLM agents on long projects.

**State machine, not LLM guessing.** The loop is deterministic: read STATE.md → validate → dispatch → post-unit → verify → advance. The LLM executes work inside a unit; it does not decide what the next unit is. Separating orchestration from execution keeps the system predictable.

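
The deterministic loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SF's implementation: `State`, `validate`, `next_unit`, `run_loop`, and the `execute`, `post_unit`, and `verify` callbacks are hypothetical names, and an in-memory object stands in for whatever STATE.md actually records. What it tries to show is that the next unit is a pure function of state, while the executor (where an LLM would run) only does the work it is handed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class State:
    """Hypothetical stand-in for the information STATE.md records."""
    plan: List[str]                                      # ordered unit IDs
    completed: List[str] = field(default_factory=list)   # units already verified


def validate(state: State) -> None:
    # Completed units must be a prefix of the plan; anything else is corrupt state.
    if state.completed != state.plan[: len(state.completed)]:
        raise ValueError("completed units are not a prefix of the plan")


def next_unit(state: State) -> Optional[str]:
    # Pure function of state: the orchestrator, not the executor, picks the unit.
    remaining = state.plan[len(state.completed):]
    return remaining[0] if remaining else None


def run_loop(
    state: State,
    execute: Callable[[str], str],
    verify: Callable[[str, str], bool],
    post_unit: Callable[[str, str], None] = lambda unit, result: None,
) -> State:
    while True:
        validate(state)                    # validate the state that was read
        unit = next_unit(state)
        if unit is None:
            return state                   # plan exhausted
        result = execute(unit)             # dispatch: the only step an LLM would own
        post_unit(unit, result)            # post-unit bookkeeping (summaries, traces)
        if not verify(unit, result):       # verify before advancing
            raise RuntimeError(f"verification failed for unit {unit}")
        state.completed.append(unit)       # advance
```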

**Spec-first.** No behavior change without a failing test first. No completion without a real consumer. This is the iron law, not a suggestion. A system that completes tasks without PDD fields and executable evidence is just making things up.

**Crash recovery must be invisible.** A crashed session should resume within seconds with no visible data loss. If recovery requires human intervention, it is a product failure.

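
One common way to get this kind of recovery is to persist progress after every unit with an atomic write, so a crash leaves either the old state or the new state on disk and never a partial file. The sketch below assumes a JSON file and the hypothetical helpers `save_state`, `load_state`, and `resume`; SF's real state format (STATE.md) and recovery logic are not shown in this document and may differ.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def save_state(path: Path, state: Dict) -> None:
    # Write to a temp file in the same directory, then atomically swap it in.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())      # make sure bytes reach the disk
        os.replace(tmp_name, path)         # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_state(path: Path) -> Dict:
    # A missing file means a fresh run, not an error.
    if not path.exists():
        return {"completed": []}
    return json.loads(path.read_text(encoding="utf-8"))


def resume(path: Path, plan: List[str], run_unit: Callable[[str], None]) -> Dict:
    # Skip units already recorded on disk; persist after each new one.
    state = load_state(path)
    for unit in plan:
        if unit in state["completed"]:
            continue
        run_unit(unit)
        state["completed"].append(unit)
        save_state(path, state)
    return state
```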

**User stays in the loop via gates, not via interrupts.** Discussion gates, write gates, budget ceilings, and approval prompts are the designed points of human interaction. The agent should not need to ask for help in the middle of a task.
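
As an illustration of what a gate can look like, the sketch below checks a proposed file write against a policy before it happens. Everything here is an assumption made for the example: `GatePolicy`, `Decision`, `check_write`, the glob patterns, and the budget figure are hypothetical, and the rule ordering is one plausible choice rather than SF's documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatch
from typing import Tuple


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class GatePolicy:
    # Hypothetical defaults; real paths and limits would come from configuration.
    protected_globs: Tuple[str, ...] = ("SPEC.md", "docs/adr/*")
    budget_ceiling_usd: float = 10.0


def check_write(
    policy: GatePolicy, path: str, spent_usd: float, in_discussion: bool
) -> Decision:
    # Budget ceiling: once reached, nothing further is written.
    if spent_usd >= policy.budget_ceiling_usd:
        return Decision.DENY
    # Write gate: no file mutations while the run is still in discussion.
    if in_discussion:
        return Decision.DENY
    # Protected files: durable decisions need a human approval prompt.
    if any(fnmatch(path, pattern) for pattern in policy.protected_globs):
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```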
## Tradeoffs

| Choice | What we gave up | Why |
|--------|----------------|-----|
| Fresh session per unit | Conversational continuity across units | Quality and predictability over convenience |
| State on disk (not in memory) | Speed of in-memory state | Crash recovery and multi-process visibility |
| Write gate during queue | Faster iteration in planning | Safety: prevents accidental file mutations during discussion |
| Protected files (ADRs, SPEC.md) | Agent autonomy over architecture docs | Human oversight over durable decisions |
| Serial execution default | Throughput | Correctness before parallelism; parallel locking is deferred debt |