feat: simplify auto pipeline — merge research into planning, mechanical completion (ADR-003) (#1235)
Reduces auto-mode session count from ~30 to ~16 per milestone by:

1. Merging research into planning: plan-milestone and plan-slice prompts now include exploration instructions instead of depending on separate research sessions. All profiles default to skip_research/skip_slice_research.
2. Mechanical completion: new mechanical-completion.ts deterministically aggregates task summaries into slice/milestone artifacts (SUMMARY, UAT, VALIDATION, roadmap checkboxes) post-verification, eliminating LLM sessions for complete-slice and validate-milestone.
3. Reassess opt-in: reassess-roadmap now requires explicit reassess_after_slice: true instead of firing by default.
4. Reduced context inlining: plan prompts reference PROJECT/REQUIREMENTS/DECISIONS by path instead of inlining full content.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 97d589c200
commit 8ac58d88f0
14 changed files with 1676 additions and 37 deletions
docs/ADR-003-pipeline-simplification.md (new file, 738 lines)
# ADR-003: Auto-Mode Pipeline Simplification

**Status:** Proposed
**Date:** 2026-03-18
**Deciders:** Lex Christopherson
**Related:** ADR-001 (branchless worktree architecture), ADR-002 (external state directory)
**Audited by:** Claude Opus 4.6, OpenAI Codex — findings incorporated below.
## Context

GSD auto-mode orchestrates a multi-session pipeline where each "unit" of work runs in a fresh LLM session. The pipeline for a single milestone with N slices and M tasks per slice runs through:

```
research-milestone → plan-milestone →
(research-slice → plan-slice → execute-task × M → complete-slice → reassess-roadmap) × N →
validate-milestone → complete-milestone
```

The exact session count depends on profile. The "quality" profile runs all phases. The "balanced" profile skips slice research by default. The "budget" profile skips milestone research, slice research, reassessment, and milestone validation. This ADR uses the quality profile as the baseline for analysis — it represents the full pipeline and the worst-case ceremony overhead.

For a typical 4-slice, 3-task milestone under the quality profile:
- 1 research-milestone + 1 plan-milestone
- Per slice: research-slice (skipped for S01) + plan-slice + 3 execute-task + complete-slice + reassess-roadmap (skipped for the last slice, since all slices are done)
- Per-slice total for S01: 0 + 1 + 3 + 1 + 1 = 6
- Per-slice total for S02–S03: 1 + 1 + 3 + 1 + 1 = 7 each
- Per-slice total for S04: 6 (it skips reassess, since it's the last slice)
- Slices total: 6 + 7 + 7 + 6 = 26
- Plus: 1 validate-milestone + 1 complete-milestone

**Total: 30 sessions.** Only 12 are task execution. The remaining 18 are pipeline ceremony.

(The "balanced" profile drops slice research for S02–S04: 30 − 3 = 27 sessions. The "budget" profile drops milestone research, all slice research, reassessment, and validation: 30 − 1 − 3 − 3 − 1 = 22 sessions.)
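The per-profile arithmetic above generalizes to a small function. A minimal sketch for illustration (`sessionCount` and the skip flags are not names from the GSD codebase; the skips mirror the profile descriptions above):

```typescript
type Profile = "quality" | "balanced" | "budget";

// Session count for one milestone under the current pipeline (illustrative helper).
function sessionCount(profile: Profile, slices = 4, tasksPerSlice = 3): number {
  const skipMilestoneResearch = profile === "budget";
  const skipSliceResearch = profile !== "quality"; // balanced and budget skip it
  const skipReassess = profile === "budget";
  const skipValidation = profile === "budget";

  let total = (skipMilestoneResearch ? 0 : 1) + 1; // research-milestone + plan-milestone
  for (let s = 1; s <= slices; s++) {
    const research = s === 1 || skipSliceResearch ? 0 : 1; // S01 never gets slice research
    const reassess = s === slices || skipReassess ? 0 : 1; // last slice skips reassess
    total += research + 1 + tasksPerSlice + 1 + reassess;  // plan-slice + tasks + complete-slice
  }
  return total + (skipValidation ? 0 : 1) + 1; // validate-milestone + complete-milestone
}
// sessionCount("quality") → 30, sessionCount("balanced") → 27, sessionCount("budget") → 22
```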
### The Token Tax

Every fresh session re-ingests static context via prompt inlining. The `auto-prompts.ts` builders (1,099 lines) inline the following files into nearly every unit type:

| File | Inlined Into | Changes After |
|------|-------------|---------------|
| ROADMAP | research-slice, plan-slice, execute-task (excerpt), complete-slice, reassess, validate, complete-milestone | plan-milestone (rare reassess rewrites) |
| DECISIONS.md | research-milestone, plan-milestone, research-slice, plan-slice, complete-milestone, validate | Appended occasionally during execution |
| REQUIREMENTS.md | research-milestone, plan-milestone, research-slice, plan-slice, complete-slice, complete-milestone, validate | Updated during complete-slice |
| KNOWLEDGE.md | research-milestone, plan-milestone, research-slice, plan-slice, execute-task, complete-slice, complete-milestone, validate | Appended occasionally during execution |
| PROJECT.md | research-milestone, plan-milestone, complete-milestone, validate | Rarely updated |

The ROADMAP alone is inlined into 7 unit types. It never changes during normal execution. This is a static document being re-tokenized per session at a cost of 5–20K tokens each time.

For the 30-session milestone above (quality profile), context re-ingestion costs approximately:
- ROADMAP: 7 re-inlines × ~10K tokens = 70K tokens
- DECISIONS: 6 re-inlines × ~5K tokens = 30K tokens
- REQUIREMENTS: 8 re-inlines × ~5K tokens = 40K tokens
- KNOWLEDGE: 8 re-inlines × ~3K tokens = 24K tokens
- Templates (research, plan, task-plan, etc.): ~2K per inline × ~10 units = 20K tokens
- Dependency summaries: ~8K per slice plan × 3 non-S01 slices = 24K tokens

**Total context re-ingestion overhead: ~208K tokens per milestone.** This is pure waste — the LLM re-reads documents it already processed in prior sessions, gaining no new information.
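As a sanity check, the ~208K figure is simply the sum of the midpoint estimates listed above (a back-of-envelope sketch, not measured data):

```typescript
// Midpoint token estimates from the list above (per milestone, quality profile).
const reIngestion = {
  roadmap: 7 * 10_000,
  decisions: 6 * 5_000,
  requirements: 8 * 5_000,
  knowledge: 8 * 3_000,
  templates: 10 * 2_000,
  depSummaries: 3 * 8_000,
};
// Sum the components to get the per-milestone overhead.
const totalOverhead = Object.values(reIngestion).reduce((a, b) => a + b, 0);
// totalOverhead === 208_000
```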
### The Lossy Handoff Problem

Each session boundary is a lossy compression step. The research-milestone agent reads the codebase and writes a RESEARCH.md. The plan-milestone agent reads that research and produces a ROADMAP. The research-slice agent reads the ROADMAP and explores the codebase again for its slice scope. The plan-slice agent reads that slice research and produces a PLAN.

This is a game of telephone:

```
Codebase → [researcher reads code] → RESEARCH.md → [planner reads research] → ROADMAP
    ↑ often re-reads the same code
```

The research prompt explicitly says: *"Write for the roadmap planner."* The plan prompt says: *"Trust the research. Don't re-read code."* But planners routinely re-read code because research is a lossy compression — a summary of what one LLM session saw, not the thing itself. The fidelity loss compounds at each handoff.
### The Machinery Tax

The multi-session pipeline requires extensive orchestration machinery to handle edge cases, failures, and recovery:

| File | Lines | Purpose |
|------|-------|---------|
| `auto-recovery.ts` | 591 | Artifact resolution, loop remediation, skip/rerun logic |
| `auto-stuck-detection.ts` | 220 | Dispatch loop detection, lifetime caps, stub recovery |
| `auto-idempotency.ts` | 150 | Skip completed units, phantom loop detection, stale key recovery |
| `session-forensics.ts` | 536 | Post-mortem analysis, crash briefings, deep diagnostics |
| `auto-timeout-recovery.ts` | 262 | Resume after timeout, recovery briefing synthesis |
| `crash-recovery.ts` | 108 | Lock file management, crash detection |
| `auto-post-unit.ts` | 591 | Post-agent processing, verification, commits, state sync |
| `auto-verification.ts` | 229 | Post-task verification enforcement |
| `verification-gate.ts` | 643 | Test/lint/audit gate runner |
| `doctor-proactive.ts` | 292 | Health checks, proactive healing, escalation detection |
| **Total** | **3,622** | **Recovery, verification, and post-processing** |

This is 3,622 lines of code managing the complexity of a 15-rule dispatch table across 13 unit types. Much of this machinery exists because the pipeline has so many sessions that failures, timeouts, and stuck states are statistically likely.
### The Ceremony Sessions

Six of the 13 unit types produce no code. They exist purely to manage the pipeline:

| Unit Type | What It Does | Sessions per Milestone (quality, 4-slice) |
|-----------|-------------|----------------------|
| research-milestone | Reads codebase, writes RESEARCH.md | 1 |
| research-slice | Reads codebase for slice scope, writes slice RESEARCH.md | 3 (skipped for S01) |
| complete-slice | Re-reads ROADMAP + plan + all task summaries, writes slice SUMMARY.md + UAT.md | 4 |
| reassess-roadmap | Re-reads ROADMAP + slice summary, almost always says "roadmap is fine" | 3 (skipped after last slice) |
| validate-milestone | Re-reads ROADMAP + all slice summaries, writes VALIDATION.md | 1 |
| complete-milestone | Re-reads ROADMAP + all slice summaries, writes SUMMARY.md | 1 |

Total: 1 + 3 + 4 + 3 + 1 + 1 = **13 ceremony sessions** (under the quality profile), each consuming 12–37K tokens of prompt context. Under the balanced profile this drops to 10 (no slice research). These sessions burn tokens re-reading documents that other sessions already produced, producing intermediate artifacts that downstream sessions then re-read.
### Root Cause

The pipeline was designed around a paradigm where:
1. LLM context windows are small (32K–100K tokens)
2. Sessions are expensive, so specialize each one
3. Handoffs between specialized agents produce better results than generalist sessions
4. Research → plan → execute is the "correct" decomposition of intellectual work

With 200K+ token context windows and prompt caching, assumptions 1 and 2 are obsolete. Assumption 3 is demonstrably false — handoffs lose fidelity. Assumption 4 confuses human workflow patterns with LLM-optimal patterns. An LLM with tool access is already researching while it plans. Forcing it to serialize research into a document, then read that document in a new session, is an artificial bottleneck.
## Decision

**Collapse the pipeline from 13 unit types to 5. Merge research into planning. Fold completion into post-unit mechanical processing. Replace LLM-driven validation with mechanical verification aggregation.**

### The Simplified Pipeline

```
plan-milestone → (plan-slice → execute-task × M) × N → done
```

Note: `discuss` is an interactive human-facing session, not an auto-mode unit — it's not counted in session math. It continues to work as-is.

For the same 4-slice, 3-task milestone:
- 1 plan-milestone (which also produces the S01 slice plan and task plans inline, via the single-slice fast path where applicable)
- S01: plan-slice skipped (the milestone planner already planned it) + 3 execute-task = 3
- S02–S04: plan-slice + 3 execute-task = 4 each × 3 slices = 12

**Total: 1 + 3 + 12 = 16 sessions** (down from 30). The 14 eliminated sessions were the highest-waste ones — each re-ingested context for minimal value.
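The new session math reduces to one line. A sketch (illustrative helper, not a codebase API), where the milestone planner absorbs S01's planning so only N−1 plan-slice sessions remain:

```typescript
// Sessions per milestone under the simplified pipeline (N slices, M tasks per slice).
// S01 is planned inline by the milestone planner, so it needs no plan-slice session.
function simplifiedSessionCount(slices = 4, tasksPerSlice = 3): number {
  return 1 /* plan-milestone */ + (slices - 1) /* plan-slice */ + slices * tasksPerSlice;
}
// simplifiedSessionCount() → 16 for the 4-slice, 3-task milestone above
```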
### Unit Type Changes

#### 1. Merge research-milestone INTO plan-milestone

**Current:** Two sessions. Researcher explores codebase, writes RESEARCH.md. Planner reads RESEARCH.md, writes ROADMAP.

**New:** One session. The plan-milestone agent explores the codebase directly and produces the ROADMAP. It has full tool access — it can read files, run commands, search code. The "research" happens naturally as part of planning, not as a serialized intermediary.

**What changes:**
- The plan-milestone prompt gains the research-milestone's exploration instructions: "Explore relevant code, check technologies, identify constraints."
- The plan-milestone prompt drops "Trust the research" — there is no research document to trust.
- The RESEARCH.md artifact becomes optional. If the planner wants to capture notes for downstream reference, it can write one. But it's not required, and downstream units don't depend on it.
- Skill discovery instructions move into the plan-milestone prompt.
- The research-milestone template (`prompts/research-milestone.md`) is retained but only used when explicitly dispatched via `/gsd dispatch research`.

**Token savings:** ~1 full session (12–37K tokens of prompt context) + the RESEARCH.md document no longer re-inlined into plan-milestone (~5–15K tokens).

**Quality impact:** Positive. The planner has direct access to the codebase instead of reading a lossy summary. It can verify assumptions in real time instead of trusting a prior session's interpretation.
#### 2. Merge research-slice INTO plan-slice

**Current:** Two sessions per non-S01 slice. Slice researcher explores codebase for slice scope, writes slice RESEARCH.md. Slice planner reads that research, writes PLAN.md + task plans.

**New:** One session. The plan-slice agent explores the relevant code directly and produces the slice plan with task plans.

**What changes:**
- The plan-slice prompt gains exploration instructions: "Read the relevant code for this slice's scope before decomposing."
- The plan-slice prompt drops "Trust the research" — there is no slice research document.
- Slice RESEARCH.md becomes optional (same as milestone research above).
- The research-slice template is retained for explicit dispatch.
- The `skip_slice_research` preference becomes the default behavior rather than an opt-in.
- The dispatch rule "planning (no research, not S01) → research-slice" is removed.

**Token savings:** ~1 session per non-S01 slice. For a 4-slice milestone: 3 sessions × 12–37K tokens = 36–111K tokens.

**Quality impact:** Positive. The planner can read actual code files instead of a summary. It verifies file paths, function signatures, and patterns directly rather than trusting a researcher's notes.
#### 3. Fold complete-slice INTO mechanical post-unit processing

**Current:** After all tasks in a slice complete, `deriveState()` emits the `summarizing` phase, dispatching a separate complete-slice LLM session that re-reads the ROADMAP, slice plan, and ALL task summaries to write a slice SUMMARY.md and UAT.md.

**New:** Slice completion moves to a **post-gate mechanical closeout** in `auto-post-unit.ts`, not into the final executor's prompt. After the last execute-task's verification gate passes:

1. The post-unit processing detects that all tasks in the slice are done (the same check `deriveState()` uses to emit `summarizing`).
2. It runs mechanical slice completion: aggregate task summaries into a SUMMARY.md using structured frontmatter, generate a UAT.md from the slice plan's verification section, mark the slice done in the ROADMAP.
3. If the mechanical summary is insufficient (complex slices where structured aggregation loses important narrative), the system detects the low quality (e.g., the summary falls below a character threshold) and dispatches a standalone complete-slice LLM session as recovery.

**Why post-gate, not in the executor prompt:**
- The Codex audit identified that folding completion into execute-task creates a verification-retry ordering problem: if the executor writes SUMMARY.md and marks the slice done in the ROADMAP before the verification gate runs, a gate failure would retry against incorrect derived state (the slice appears complete when it isn't).
- Post-gate processing runs after verification succeeds, so state transitions are always consistent.
- The executor's context budget is fully available for its actual work.

**What changes in `deriveState()`:**
- The `summarizing` phase still exists in state derivation (all tasks done, slice not marked complete).
- The dispatch table no longer maps `summarizing → complete-slice`. Instead, post-unit processing handles the transition synchronously.
- If post-unit mechanical completion fails or produces low-quality output, the `summarizing` phase still exists as a dispatch target and the system falls back to dispatching a complete-slice LLM session.

**What changes:**
- `auto-post-unit.ts` gains a `mechanicalSliceCompletion()` function.
- The complete-slice dispatch rule is removed from the default path but retained as a fallback.
- The complete-slice template is retained for recovery and explicit dispatch.
- The `summarizing` phase in `state.ts` is unchanged — it serves as the fallback trigger if mechanical completion doesn't run.

**Full completion contract preserved:** The mechanical completion writes all three required artifacts (SUMMARY.md, UAT.md, ROADMAP checkbox) — matching the current complete-slice contract. It also handles the REQUIREMENTS.md updates and the KNOWLEDGE.md/DECISIONS.md appends that the current complete-slice prompt performs (see Risk 5 below for details).

**Token savings:** ~1 session per slice × N slices. For a 4-slice milestone: 4 sessions × 12–37K tokens = 48–148K tokens.

**Quality impact:** For most slices, the mechanical summary is sufficient — it aggregates structured frontmatter fields (provides, requires, affects, key_files, key_decisions, patterns_established) from task summaries. For complex slices with important narrative context, the LLM fallback preserves quality.
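Step 3's fallback trigger could look roughly like this. A sketch under stated assumptions: the threshold value, the field checks, and the name `isSummarySufficient` are illustrative, not the shipped implementation:

```typescript
// Sketch of the fallback trigger for mechanical slice completion.
interface SliceSummaryDraft {
  body: string;        // rendered SUMMARY.md body
  provides: string[];  // aggregated from task summary frontmatter
  keyFiles: string[];
}

const MIN_SUMMARY_CHARS = 400; // assumed threshold, not the shipped value

function isSummarySufficient(draft: SliceSummaryDraft): boolean {
  if (draft.body.length < MIN_SUMMARY_CHARS) return false; // too thin: narrative likely lost
  if (draft.provides.length === 0) return false;           // no aggregated contract fields
  if (draft.keyFiles.length === 0) return false;
  return true;
}
// When this returns false, dispatch falls back to a complete-slice LLM session.
```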
#### 4. Eliminate reassess-roadmap (make opt-in)

**Current:** After every slice completion, a reassess-roadmap session re-reads the ROADMAP and slice summary, then almost always writes "roadmap is fine."

**New:** Reassessment is eliminated by default. The plan-slice agent for the next slice serves as the natural reassessment point — it reads the ROADMAP and prior slice summaries, and can adjust its plan if the ground has shifted.

**What changes:**
- The reassess-roadmap dispatch rule fires only when the `reassess_after_slice` preference is enabled (default: off; it was effectively always-on).
- The plan-slice prompt gains a reassessment preamble: "Before planning this slice, verify that the roadmap's assumptions still hold given prior slice summaries. If the remaining roadmap needs adjustment, modify it before proceeding."
- The `checkNeedsReassessment()` function in auto-prompts.ts becomes a preference gate, not a mandatory check.

**Token savings:** ~1 session per completed non-final slice. For a 4-slice milestone under the quality profile: 3 sessions × 12–37K tokens = 36–111K tokens.

**Quality impact:** Neutral. The reassess prompt says *"Bias strongly toward 'roadmap is fine.'"* — an acknowledgment that most reassessments produce no change. JIT reassessment during the next plan-slice is more informed (it has the next slice's context) and costs zero additional tokens.
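The resulting gate is tiny. A sketch, where the preference key matches this ADR but the function name is illustrative:

```typescript
// Reassessment becomes a preference gate instead of a default dispatch rule.
interface AutoPreferences {
  reassess_after_slice?: boolean; // default: off
}

function shouldDispatchReassess(prefs: AutoPreferences, isLastSlice: boolean): boolean {
  if (isLastSlice) return false;              // unchanged: the last slice never reassesses
  return prefs.reassess_after_slice === true; // previously: fired by default
}
```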
#### 5. Replace validate-milestone with mechanical verification

**Current:** An LLM session re-reads the ROADMAP and all slice summaries, checks success criteria against delivery evidence, and writes a VALIDATION.md with a verdict. It also inlines UAT-RESULT artifacts from slices with `uat_dispatch` enabled.

**New:** The system mechanically aggregates verification results from all tasks and slices. The canonical verification data sources are:

1. **`T##-VERIFY.json`** files (written by `writeVerificationJSON()` in `verification-evidence.ts`) — machine-readable per-task verification results with command, exit code, verdict, duration, and blocking status.
2. **`S##-UAT-RESULT.md`** files (when `uat_dispatch` is enabled) — human or artifact-driven UAT outcomes.
3. **Task summary frontmatter** `verification_result` field — a human-readable pass/fail string (not structured, used as a secondary signal).

The aggregator reads `T##-VERIFY.json` as the primary source of truth, supplements with UAT-RESULT artifacts, and produces a deterministic VALIDATION.md.
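The aggregation assumes roughly the following artifact shape. The exact schema lives in `verification-evidence.ts`, so treat the field names here as illustrative:

```typescript
// Assumed shape of a T##-VERIFY.json artifact (illustrative; see verification-evidence.ts).
interface EvidenceCheckJSON {
  command: string; // e.g. "npm test"
  exitCode: number;
  verdict: "pass" | "fail";
  durationMs: number;
  blocking: boolean;
}

interface EvidenceJSON {
  taskId: string;
  checks: EvidenceCheckJSON[];
}

const sample: EvidenceJSON = {
  taskId: "T01",
  checks: [{ command: "npm test", exitCode: 0, verdict: "pass", durationMs: 4200, blocking: true }],
};
```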
**What changes:**
- A new `aggregateMilestoneVerification()` function collects `T##-VERIFY.json` files and `S##-UAT-RESULT.md` files across all slices.
- The function produces a VALIDATION.md with per-task and per-slice pass/fail status, UAT evidence, and an overall verdict.
- The LLM-driven validate-milestone session is removed from the default pipeline.
- The validate-milestone template is retained for explicit dispatch (users who want LLM-driven validation can run `/gsd dispatch validate`).
- The `skip_milestone_validation` preference (which writes a pass-through VALIDATION.md) becomes the default behavior, with the mechanical aggregation replacing it.

```typescript
// Sketch: helpers (loadFile, parseRoadmap, resolve*, join) come from the existing codebase.
async function aggregateMilestoneVerification(base: string, mid: string): Promise<ValidationResult> {
  const roadmap = parseRoadmap(await loadFile(resolveMilestoneFile(base, mid, "ROADMAP")));
  const checks: EvidenceCheckJSON[] = [];
  const uatResults: { sliceId: string; content: string }[] = [];

  for (const slice of roadmap.slices) {
    // Primary source: T##-VERIFY.json files (machine-readable, written by verification-gate.ts)
    const tDir = resolveTasksDir(base, mid, slice.id);
    if (tDir) {
      const verifyFiles = resolveTaskFiles(tDir, "VERIFY");
      for (const file of verifyFiles) {
        const content = await loadFile(join(tDir, file));
        if (content) {
          try {
            const evidence: EvidenceJSON = JSON.parse(content);
            checks.push(...evidence.checks);
          } catch {
            // Corrupt VERIFY file: treat as missing evidence rather than crashing the aggregator
          }
        }
      }
    }

    // Secondary source: S##-UAT-RESULT.md (when uat_dispatch enabled)
    const uatResultFile = resolveSliceFile(base, mid, slice.id, "UAT-RESULT");
    if (uatResultFile) {
      const uatContent = await loadFile(uatResultFile);
      if (uatContent) uatResults.push({ sliceId: slice.id, content: uatContent });
    }
  }

  // Guard against a vacuous pass: zero collected checks means missing evidence, not success
  const allChecksPassed = checks.length > 0 && checks.every(c => c.verdict === "pass");
  const hasUatFailures = uatResults.some(r => r.content.includes("❌") || r.content.includes("FAIL"));
  const verdict = allChecksPassed && !hasUatFailures ? "pass" : "needs-attention";

  return { verdict, checks, uatResults };
}
```

**Token savings:** 1 session × 12–37K tokens. This session is one of the most context-heavy — it inlines the ROADMAP + all slice summaries + all UAT results.

**Quality impact:** Positive. Mechanical verification is deterministic and complete. LLM validation is subjective and can miss things. The verification gate and UAT system already do the hard work — the validate session was a redundant re-check. The `T##-VERIFY.json` artifacts are the canonical machine-readable source, not task summary frontmatter.
#### 6. Replace complete-milestone with mechanical completion

**Current:** An LLM session re-reads the ROADMAP and all slice summaries to write a SUMMARY.md.

**New:** The system produces a milestone summary mechanically by aggregating slice summaries. The summary includes: milestone title, success criteria with pass/fail status, slice completion dates, key decisions made, and patterns established (all extracted from structured frontmatter in slice summaries).

**What changes:**
- A new `generateMilestoneSummary()` function reads all slice SUMMARY.md files, extracts frontmatter fields, and produces a structured milestone SUMMARY.md.
- The complete-milestone dispatch rule is replaced with a synchronous post-processing step after the validation artifact is written.
- The complete-milestone template is retained for explicit dispatch.

**What changes in `deriveState()`:**
- The `validating-milestone` and `completing-milestone` phases still exist in state derivation.
- When mechanical validation + completion runs synchronously in post-unit processing, these phases are transient — `deriveState()` emits them, but the mechanical processing writes the VALIDATION.md and SUMMARY.md artifacts before the next dispatch cycle, so the phases resolve immediately.
- If mechanical processing fails, the phases remain as dispatch targets and the system falls back to dispatching LLM sessions for validation and/or completion.

**Token savings:** 1 session × 12–37K tokens.

**Quality impact:** Neutral. Milestone summaries are archival — they capture what happened rather than make decisions. Mechanical aggregation of structured frontmatter is more reliable than an LLM re-interpreting task summaries.
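The aggregation step of `generateMilestoneSummary()` might look like this. The frontmatter field names follow the descriptions above; `aggregateFrontmatter` itself is an illustrative helper, not the shipped one:

```typescript
// Sketch of milestone-level aggregation over slice SUMMARY.md frontmatter.
interface SliceFrontmatter {
  slice: string;                  // e.g. "S01"
  completed: string;              // ISO date
  key_decisions: string[];
  patterns_established: string[];
}

function aggregateFrontmatter(slices: SliceFrontmatter[]) {
  const dedupe = (xs: string[]) => Array.from(new Set(xs)); // drop repeats across slices
  return {
    completionDates: slices.map(s => `${s.slice}: ${s.completed}`),
    keyDecisions: dedupe(slices.flatMap(s => s.key_decisions)),
    patterns: dedupe(slices.flatMap(s => s.patterns_established)),
  };
}
```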
### Dispatch Table Changes

**Current: 15 rules.**

```
1. rewrite-docs (override gate)
2. summarizing → complete-slice
3. run-uat (post-completion)
4. reassess-roadmap (post-completion)
5. needs-discussion → stop
6. pre-planning (no context) → stop
7. pre-planning (no research) → research-milestone
8. pre-planning (has research) → plan-milestone
9. planning (no research, not S01) → research-slice
10. planning → plan-slice
11. replanning-slice → replan-slice
12. executing → execute-task (recovery)
13. executing → execute-task
14. validating-milestone → validate-milestone
15. completing-milestone → complete-milestone
```

**New: 12 rules.**

```
1. rewrite-docs (override gate) [unchanged]
2. summarizing → complete-slice [FALLBACK ONLY — fires when mechanical completion didn't run]
3. run-uat (post-completion) [unchanged, preference-gated]
4. needs-discussion → stop [unchanged]
5. pre-planning (no context) → stop [unchanged]
6. pre-planning → plan-milestone [rules 7+8 merged — research folded in]
7. planning → plan-slice [rules 9+10 merged — research folded in]
8. replanning-slice → replan-slice [unchanged]
9. executing → execute-task (recovery) [unchanged]
10. executing → execute-task [unchanged]
11. validating-milestone → validate-milestone [FALLBACK ONLY — fires when mechanical validation didn't run]
12. completing-milestone → complete-milestone [FALLBACK ONLY — fires when mechanical completion didn't run]
```

Note: Rules 2, 11, and 12 are retained as **fallbacks** for cases where mechanical processing fails. They do not fire in the normal path because post-unit processing writes the required artifacts before the next dispatch cycle. This means `deriveState()` is unchanged — it still emits `summarizing`, `validating-milestone`, and `completing-milestone` phases. The change is that these phases are normally resolved mechanically before dispatch evaluates them.
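As a sketch, the new table reads as an ordered first-match list. For brevity this collapses the stop/recovery variants (rules 4, 5, 9, 10) into single entries; the rule shape and phase names are illustrative, loosely following `deriveState()`:

```typescript
type Phase =
  | "rewrite-docs" | "summarizing" | "run-uat" | "needs-discussion"
  | "pre-planning" | "planning" | "replanning-slice" | "executing"
  | "validating-milestone" | "completing-milestone";

interface Rule { phase: Phase; unit: string; fallbackOnly?: boolean }

// Ordered first-match dispatch; fallbackOnly rules normally resolve mechanically pre-dispatch.
const rules: Rule[] = [
  { phase: "rewrite-docs", unit: "rewrite-docs" },
  { phase: "summarizing", unit: "complete-slice", fallbackOnly: true },
  { phase: "run-uat", unit: "run-uat" },
  { phase: "needs-discussion", unit: "stop" },
  { phase: "pre-planning", unit: "plan-milestone" }, // research merged in
  { phase: "planning", unit: "plan-slice" },         // research merged in
  { phase: "replanning-slice", unit: "replan-slice" },
  { phase: "executing", unit: "execute-task" },      // recovery variant collapsed
  { phase: "validating-milestone", unit: "validate-milestone", fallbackOnly: true },
  { phase: "completing-milestone", unit: "complete-milestone", fallbackOnly: true },
];

const dispatch = (phase: Phase) => rules.find(r => r.phase === phase)?.unit;
```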
**Removed rules (no longer in default path):**
- `reassess-roadmap` — folded into the next plan-slice (or opt-in preference)
- `pre-planning (no research) → research-milestone` — merged into plan-milestone
- `planning (no research, not S01) → research-slice` — merged into plan-slice
### Prompt Changes

#### plan-milestone.md — gains exploration instructions

Add before the planning steps:

```markdown
## Explore First, Then Decompose

You have full tool access. Before decomposing into slices:
1. Explore the relevant codebase — read key files, understand existing patterns, identify constraints.
2. For unfamiliar libraries, use `resolve_library` / `get_library_docs`.
3. Skill Discovery ({{skillDiscoveryMode}}):{{skillDiscoveryInstructions}}

Narrate key findings as you go. If findings are significant enough to benefit downstream slice planners, write {{researchOutputPath}} — but only if the content would genuinely help. Don't write a research doc just because the template exists.
```

#### plan-slice.md — gains exploration + reassessment preamble

Add before the planning steps:

```markdown
## Verify Roadmap Assumptions

Before planning this slice, check whether the roadmap's assumptions still hold:
- Review prior slice summaries (inlined above). Did anything change that affects this slice?
- If the remaining roadmap needs adjustment, modify the unchecked slices in {{roadmapPath}} before proceeding.

## Explore Slice Scope

Read the relevant code for this slice before decomposing:
1. Check the files and modules this slice will touch.
2. Verify the approach described in the roadmap against the actual codebase state.
3. If the roadmap's description of this slice is wrong or outdated, adjust your plan accordingly.
```
### Context Inlining Changes
|
||||
|
||||
#### Reduce inlining for planning sessions — provide paths for stable documents
|
||||
|
||||
Planning sessions (plan-milestone, plan-slice) currently inline ROADMAP, DECISIONS, REQUIREMENTS, KNOWLEDGE, and PROJECT. Since these sessions now also explore the codebase (merged research), the total prompt size grows. To offset this, stable documents should be provided as file paths rather than inlined content for planning sessions.
|
||||
|
||||
**Current pattern:**
|
||||
```typescript
|
||||
inlined.push(await inlineFile(roadmapPath, roadmapRel, "Milestone Roadmap"));
|
||||
```
|
||||
|
||||
**New pattern for plan-milestone/plan-slice:**
|
||||
```typescript
|
||||
sourcePaths.push(`- Milestone Roadmap: \`${roadmapRel}\` — read this for the full slice decomposition`);
|
||||
```
|
||||
|
||||
The prompt header changes from "All relevant context has been preloaded below" to "Source files are listed below. Read them before proceeding."
|
||||
|
||||
**What stays inlined:**
|
||||
- **Task plan** in execute-task (it's the executor's authoritative contract — must be in prompt)
|
||||
- **Slice plan excerpt** in execute-task (goal/demo/verification — small and task-specific)
|
||||
- **Prior task summaries** in execute-task (carry-forward context — already budget-managed)
|
||||
- **Milestone context** in plan-milestone (it's the starting input — relatively small)
|
||||
|
||||
**What moves to file-path references:**
|
||||
- ROADMAP in plan-slice, complete-slice, reassess, validate, complete-milestone
|
||||
- DECISIONS.md everywhere except execute-task (where it's already omitted for minimal inline level)
|
||||
- REQUIREMENTS.md everywhere except execute-task
|
||||
- KNOWLEDGE.md everywhere (already uses `inlineFileSmart` for execute-task)
|
||||
- PROJECT.md everywhere
|
||||
|
||||
**Interaction with budget engine:** The current budget engine (`context-budget.ts`) truncates inlined content when it exceeds budget. Removing inlining means the LLM reads the full file via tool call. For most documents (ROADMAP ~3-10K chars, DECISIONS ~2-5K chars), the full read is within budget. For very large REQUIREMENTS.md files (>30K chars), the LLM may need to use the DB-scoped query (`inlineRequirementsFromDb` with slice scoping) or the compact formatter. The path reference should note: "For large files, use scoped queries."

**Risk: LLMs might not read referenced files.**

This is the most significant behavioral risk in this ADR. Inlined content forces processing. Path references require the LLM to decide to read. Mitigation:

1. **Mandatory read directives.** The prompt says "You MUST read the following files before proceeding" with a numbered list of 2-3 critical files. Not "read as needed" — a direct instruction.
2. **Verification.** The plan-slice prompt requires citing the ROADMAP's slice description in its output (slice title, risk level, depends). If these don't match, the planner didn't read it.
3. **Phased rollout.** Phase 4 (context reduction) is separate from Phase 1 (research merge). This allows measuring whether path references degrade plan quality before full rollout.
4. **Fallback.** If path references prove unreliable, restore inlining for critical documents only (ROADMAP in plan-slice). The budget engine still handles truncation.
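Mitigation 2 can be expressed as a mechanical check. The frontmatter field names below are illustrative, not the actual plan schema:

```typescript
// Hypothetical shape: the slice plan must echo the ROADMAP's slice metadata.
// A field-for-field match is cheap evidence that the planner actually read the ROADMAP.
interface RoadmapSlice { title: string; risk: string; depends: string[] }

function planCitesRoadmap(
  plan: { slice_title?: string; risk?: string; depends?: string[] },
  slice: RoadmapSlice,
): boolean {
  return plan.slice_title === slice.title &&
    plan.risk === slice.risk &&
    JSON.stringify(plan.depends ?? []) === JSON.stringify(slice.depends);
}
```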

**Token savings (Phase 4 only):** Eliminates ~150K tokens of re-ingestion per milestone (revised from 208K — the execute-task sessions retain inlined content). The LLM reads files as needed via tool calls, cached by API prompt caching. Net savings are ~50-60% of the re-ingestion overhead, since the LLM still reads most files once per session.

### Post-Unit Processing Changes

#### Mechanical slice completion

After the last execute-task's verification gate passes and post-unit processing detects all tasks done:
```typescript
async function mechanicalSliceCompletion(base: string, mid: string, sid: string): Promise<boolean> {
  const tDir = resolveTasksDir(base, mid, sid);
  if (!tDir) return false;

  const summaryFiles = resolveTaskFiles(tDir, "SUMMARY").sort();
  const taskSummaries = await Promise.all(
    summaryFiles.map(async f => ({ file: f, summary: parseSummary(await loadFile(join(tDir, f)) ?? "") }))
  );

  // Aggregate structured frontmatter
  const allProvides = taskSummaries.flatMap(t => t.summary.frontmatter.provides);
  const allKeyFiles = taskSummaries.flatMap(t => t.summary.frontmatter.key_files);
  const allDecisions = taskSummaries.flatMap(t => t.summary.frontmatter.key_decisions);
  const allPatterns = taskSummaries.flatMap(t => t.summary.frontmatter.patterns_established);
  const allAffects = taskSummaries.flatMap(t => t.summary.frontmatter.affects);

  // Build slice SUMMARY.md from aggregated frontmatter
  const sliceSummary = formatSliceSummary({ sid, provides: allProvides, keyFiles: allKeyFiles, ... });

  // Build UAT.md from slice plan's Verification section
  const slicePlanContent = await loadFile(resolveSliceFile(base, mid, sid, "PLAN"));
  const verificationSection = extractMarkdownSection(slicePlanContent, "Verification");
  const sliceUat = formatSliceUat(sid, verificationSection);

  // Write the three artifacts (paths resolved via the standard helpers)
  const sliceSummaryPath = resolveSliceFile(base, mid, sid, "SUMMARY");
  const sliceUatPath = resolveSliceFile(base, mid, sid, "UAT");
  writeFileSync(sliceSummaryPath, sliceSummary);
  writeFileSync(sliceUatPath, sliceUat);
  markSliceDoneInRoadmap(base, mid, sid);

  // Handle REQUIREMENTS.md updates (currently done by complete-slice prompt step 5)
  // Mechanical: mark requirements as Validated if all tasks covering them passed verification.
  await mechanicalRequirementsUpdate(base, mid, sid, taskSummaries);

  // Handle DECISIONS.md appendix (currently done by complete-slice prompt step 8)
  // Mechanical: collect key_decisions from task summaries not already in DECISIONS.md
  await appendNewDecisions(base, taskSummaries);

  // Handle KNOWLEDGE.md appendix (currently done by complete-slice prompt step 9)
  // Not mechanical — skip. Knowledge entries require judgment about what's genuinely useful.
  // The executor tasks already write KNOWLEDGE.md entries during execution (step 13 in execute-task).

  return true;
}
```
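For reference, a sketch of the task SUMMARY.md frontmatter fields the aggregation above assumes. The field names match the `flatMap` calls; the example values are invented, and the actual schema lives in `types.ts`:

```typescript
// Illustrative only: field names from the aggregation code, hypothetical values.
interface TaskSummaryFrontmatterSketch {
  provides: string[];
  key_files: string[];
  key_decisions: string[];
  patterns_established: string[];
  affects: string[];
}

const example: TaskSummaryFrontmatterSketch = {
  provides: ["session middleware"],
  key_files: ["src/auth/session.ts"],
  key_decisions: ["store sessions in SQLite, not memory"],
  patterns_established: ["route guards via higher-order handler"],
  affects: ["S02"],
};
```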

**Fallback:** If `mechanicalSliceCompletion()` fails or produces output below a quality threshold (e.g., summary under 200 chars for a multi-task slice), the `summarizing` phase persists in `deriveState()` and the dispatch table's retained fallback rule dispatches a complete-slice LLM session.
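The quality threshold could look like this minimal sketch. The 200-char figure comes from the text above; the heading check is an added assumption:

```typescript
// Below-threshold output keeps the `summarizing` phase alive, so the dispatch
// table's retained rule fires a complete-slice LLM session instead.
function passesQualityThreshold(sliceSummary: string, taskCount: number): boolean {
  if (taskCount > 1 && sliceSummary.trim().length < 200) return false;
  return sliceSummary.includes("## "); // assumption: a usable summary has at least one section
}
```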

#### Mechanical milestone validation

See `aggregateMilestoneVerification()` above (Section 5). Reads `T##-VERIFY.json` and `S##-UAT-RESULT.md` as canonical sources.
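A minimal sketch of the verdict aggregation, assuming each `T##-VERIFY.json` carries a `verdict` field. That field name is an assumption here; the canonical shape is `EvidenceJSON` in `verification-evidence.ts`:

```typescript
type Verdict = "passed" | "failed" | "skipped";

// Any failed task fails the milestone; all-passed passes; anything else is inconclusive.
function aggregateVerdicts(verdicts: Verdict[]): Verdict {
  if (verdicts.length === 0) return "skipped";
  if (verdicts.some(v => v === "failed")) return "failed";
  return verdicts.every(v => v === "passed") ? "passed" : "skipped";
}
```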

#### Mechanical milestone summary
```typescript
async function generateMilestoneSummary(base: string, mid: string): Promise<string> {
  const roadmap = parseRoadmap(await loadFile(resolveMilestoneFile(base, mid, "ROADMAP")));
  const sliceSummaries: Array<{ id: string; summary: Summary }> = [];

  for (const slice of roadmap.slices) {
    const content = await loadFile(resolveSliceFile(base, mid, slice.id, "SUMMARY"));
    if (content) sliceSummaries.push({ id: slice.id, summary: parseSummary(content) });
  }

  // Aggregate frontmatter fields across all slice summaries
  // Produce structured markdown from the aggregation
  return formatMilestoneSummary(roadmap, sliceSummaries);
}
```
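The "aggregate frontmatter fields" step can be sketched as a dedup-preserving flatten (helper name hypothetical):

```typescript
// Flatten one named list field across slice summaries, deduping while
// preserving first-seen order, which is the shape milestone rollups need.
function aggregateField(
  summaries: Array<{ frontmatter: Record<string, string[] | undefined> }>,
  field: string,
): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const s of summaries) {
    for (const v of s.frontmatter[field] ?? []) {
      if (!seen.has(v)) { seen.add(v); out.push(v); }
    }
  }
  return out;
}
```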

## Consequences

### Session Count Reduction

Counts assume no fallback sessions fire (mechanical processing succeeds). "Current" uses the quality profile. "New" is the simplified pipeline.

| Milestone Shape | Current Sessions (quality) | New Sessions | Reduction |
|----------------|---------------------------|--------------|-----------|
| 1 slice, 2 tasks | 9 | 3 | 67% |
| 2 slices, 3 tasks | 17 | 8 | 53% |
| 4 slices, 3 tasks | 30 | 16 | 47% |
| 6 slices, 4 tasks | 46 | 31 | 33% |

**Derivation (4-slice, 3-task):**

Current (quality): research-milestone(1) + plan-milestone(1) + [research-slice(0) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(1)] for S01 + [research-slice(1) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(1)] × 2 for S02-S03 + [research-slice(1) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(0)] for S04 + validate(1) + complete-milestone(1) = 2 + 6 + 14 + 6 + 2 = 30.

New: plan-milestone(1) + [execute(3)] for S01 + [plan-slice(1) + execute(3)] × 3 for S02-S04 = 1 + 3 + 12 = 16.
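The derivation can be checked mechanically. The functions below encode only the assumptions stated above (S01 skips slice research and plan-slice, the last slice skips reassess), so they reproduce the 30 and 16 figures; other table rows use different profile assumptions:

```typescript
function currentSessions(slices: number, tasks: number): number {
  let total = 2; // research-milestone + plan-milestone
  for (let i = 0; i < slices; i++) {
    const research = i === 0 ? 0 : 1;          // S01 research is covered by milestone research
    const reassess = i === slices - 1 ? 0 : 1; // no reassess after the last slice
    total += research + 1 + tasks + 1 + reassess; // plan-slice + execute + complete-slice
  }
  return total + 2; // validate-milestone + complete-milestone
}

function newSessions(slices: number, tasks: number): number {
  // plan-milestone writes S01's slice plan (fast path); later slices plan + execute
  return 1 + tasks + (slices - 1) * (1 + tasks);
}
```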

### Token Savings

Eliminated sessions are the primary savings mechanism. Context re-ingestion reduction is a secondary effect of having fewer sessions (each of the remaining sessions still ingests some context). These are NOT additive — the re-ingestion savings are already captured in the eliminated session savings.

| Source | Per Milestone (4-slice, 3-task) |
|--------|-------------------------------|
| Eliminated research sessions (1 milestone + 3 slice) | 48–148K tokens |
| Eliminated complete-slice sessions (4) | 48–148K tokens |
| Eliminated reassess sessions (3) | 36–111K tokens |
| Eliminated validate session (1) | 12–37K tokens |
| Eliminated complete-milestone session (1) | 12–37K tokens |
| **Total estimated savings** | **~156–481K tokens** |

At current Opus pricing ($15/MTok input, $75/MTok output — as of March 2026), the input savings alone are **$2.34–$7.22 per milestone**. Output savings are harder to estimate but typically 30-50% of input.
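The dollar figures follow directly from the input price:

```typescript
// Input-token cost at $15 per million input tokens (output savings excluded, as above).
function inputCostUSD(tokens: number, pricePerMTok = 15): number {
  return (tokens / 1_000_000) * pricePerMTok;
}
```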

### Code Deletion

| File / Section | Lines | Impact |
|----------------|-------|--------|
| `auto-dispatch.ts` — 3 removed default-path rules | ~40 | Simpler dispatch table |
| `auto-prompts.ts` — 5 builders become fallback-only | ~250 | `buildResearchMilestonePrompt`, `buildResearchSlicePrompt`, `buildCompleteSlicePrompt`, `buildValidateMilestonePrompt`, `buildCompleteMilestonePrompt` move to explicit-dispatch codepath |
| `auto-prompts.ts` — reduced inlining (Phase 4) | ~100 | Remove `inlineFile` calls for static docs in planning prompts, replace with path references |
| Context re-ingestion helpers (Phase 4) | ~50 | `inlineDecisionsFromDb`, `inlineRequirementsFromDb`, `inlineProjectFromDb` simplified for planning paths |
| **Total deletable** | **~440** | |

### Code Added

| File / Section | Lines | Impact |
|----------------|-------|--------|
| `auto-prompts.ts` — plan-milestone exploration | ~30 | Research instructions merged in |
| `auto-prompts.ts` — plan-slice reassessment + exploration | ~25 | Reassessment + exploration preamble |
| `auto-post-unit.ts` — `mechanicalSliceCompletion()` | ~80 | Structured frontmatter aggregation, UAT generation, artifact writes |
| `auto-verification.ts` — `aggregateMilestoneVerification()` | ~60 | T##-VERIFY.json + UAT-RESULT aggregation |
| `auto-unit-closeout.ts` — `generateMilestoneSummary()` | ~60 | Mechanical summary generation |
| **Total added** | **~255** | |

### Net Impact

- **~185 lines net deleted** (440 deleted - 255 added)
- **3 fewer default-path dispatch rules** (15 → 12, with 3 retained as fallbacks)
- **6 fewer unit types in the default pipeline** (13 → 7 active; 6 retained for fallback/explicit dispatch)
- **~156–481K fewer tokens per milestone**
- **14 fewer session handoffs per 4-slice milestone under the quality profile** (each a potential failure/timeout point)
- `auto-prompts.ts` goes from ~1,099 lines to ~924 lines (~175 lines net reduction)

### What Stays Unchanged

- The **discuss** flow (guided-flow.ts, interactive discussion)
- The **dispatch table architecture** (declarative rules, first-match-wins)
- The **fresh session per unit** pattern (still used for plan-slice and execute-task)
- The **state derivation** (`deriveState()` reads files, derives phase — all existing phases preserved)
- The **verification gate** (runs tests/lint after each task)
- The **worktree isolation** model
- The **crash recovery**, **idempotency**, and **stuck detection** systems (fewer sessions means these fire less often, but the safety nets remain)
- The **metrics** and **cost tracking** systems
- The **parallel orchestrator** for independent milestones
- All prompt templates are **retained** — for fallback, recovery, and explicit dispatch via `/gsd dispatch <unit-type>`

### What Gets Simpler Downstream

Less machinery is needed when sessions are fewer:

- **Fewer recovery paths.** 14 fewer sessions means 14 fewer opportunities for timeouts, stuck states, and missing artifacts.
- **Simpler `auto-post-unit.ts`.** Reassess dispatch logic removed (opt-in only). Mechanical completion/validation is added, but it replaces more complex LLM-session dispatch.
- **Simpler `auto-stuck-detection.ts`.** Fewer unit types means fewer dispatch-loop patterns to detect.
- **Simpler `auto-idempotency.ts`.** Fewer completed-key types to track.

These simplifications are downstream effects — they don't need to happen in the same change. But they represent ~500-1000 lines of code that become significantly simpler or unnecessary as a consequence of this ADR.

## Risks

### 1. Plan-milestone sessions become heavier

Merging research into planning makes plan-milestone sessions longer. The planner must explore the codebase AND decompose into slices in a single session. Risk: the session hits context pressure before finishing.

**Mitigation:** Plan-milestone is the session that benefits most from a large context window. Modern context windows (200K+ tokens) easily accommodate exploration + planning. The single-slice fast path (already in plan-milestone.md) already combines planning with slice plan + task plan writing in one session — this extends that pattern. Phase 4 (reducing inlining for planning sessions) further offsets the added exploration work.

**Phase ordering note:** Phase 1 (merge research into planning) adds exploration to plan-milestone. If Phase 4 (reduce inlining) hasn't landed yet, the plan-milestone prompt includes both exploration instructions AND the full inlined context. This is the most context-heavy state. To mitigate, Phase 1 should also reduce inlining for plan-milestone/plan-slice specifically — moving DECISIONS, REQUIREMENTS, and PROJECT to path references while keeping ROADMAP and CONTEXT inlined. This is a targeted subset of Phase 4, not a separate phase.

### 2. Mechanical completion quality

The mechanical slice completion aggregates structured frontmatter but cannot produce narrative context, forward intelligence sections, or nuanced UAT scenarios that the current LLM-driven complete-slice session produces.

**Mitigation:**
- For most slices (2-3 tasks, straightforward work), structured aggregation is sufficient. The frontmatter fields (provides, requires, affects, key_files, key_decisions, patterns_established) capture the essential information.
- The quality threshold fallback dispatches a complete-slice LLM session for complex slices.
- The LLM fallback is zero-cost to implement — the complete-slice template and dispatch rule are retained.

### 3. Loss of research artifacts

RESEARCH.md files provided a useful paper trail for debugging plan quality. Without them, it's harder to understand why a planner made certain decisions.

**Mitigation:**
- The planner's narration (visible in the conversation transcript) captures exploration reasoning.
- RESEARCH.md is optional, not eliminated. Planners can write one when exploration is complex.
- The KNOWLEDGE.md file captures non-obvious patterns and decisions.
- DECISIONS.md captures structural choices.

### 4. Reassessment gaps

Without mandatory reassessment, a slice might complete with findings that invalidate the remaining roadmap, and the next planner might not notice.

**Mitigation:**
- The plan-slice prompt includes a reassessment preamble that explicitly checks prior slice summaries.
- The `blocker_discovered` flag in task summaries already triggers automatic replanning.
- Users who want explicit reassessment can enable the `reassess_after_slice` preference.

### 5. Mechanical completion doesn't cover all complete-slice responsibilities

The current complete-slice prompt (steps 5, 8, 9) updates REQUIREMENTS.md, appends to DECISIONS.md, and appends to KNOWLEDGE.md. The mechanical completion handles REQUIREMENTS.md and DECISIONS.md mechanically but cannot produce KNOWLEDGE.md entries (which require judgment about what's genuinely useful).

**Mitigation:**
- Execute-task prompt step 13 already instructs executors to append to KNOWLEDGE.md during task execution. Most knowledge entries are discovered during implementation, not during completion.
- DECISIONS.md appendix is handled mechanically by collecting `key_decisions` from task summaries and deduplicating against existing entries.
- REQUIREMENTS.md updates are handled mechanically by cross-referencing task verification results against requirement-to-slice mappings.
- For the LLM fallback path (complex slices), the complete-slice prompt retains all responsibilities.
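The DECISIONS.md dedup described above can be sketched as follows. The one-bullet-per-decision format is an illustrative assumption, not the actual file schema:

```typescript
// Return only the key_decisions that are not already present as bullets in DECISIONS.md.
function newDecisionEntries(existingDecisionsMd: string, keyDecisions: string[]): string[] {
  const existing = new Set(
    existingDecisionsMd
      .split("\n")
      .filter(l => l.startsWith("- "))
      .map(l => l.slice(2).trim().toLowerCase())
  );
  const out: string[] = [];
  for (const d of keyDecisions) {
    const key = d.trim().toLowerCase();
    if (key && !existing.has(key)) { existing.add(key); out.push(d.trim()); }
  }
  return out;
}
```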

### 6. Migration path

Milestones in progress when this change deploys will have state files (RESEARCH.md, etc.) that the new pipeline doesn't produce. The dispatch table must gracefully handle both old-style and new-style state.

**Mitigation:**
- Dispatch rules check for file existence, not file absence. A milestone with an existing RESEARCH.md still works — the plan-milestone rule fires regardless of whether research exists.
- The idempotency system already handles "completed research unit → dispatch plan" transitions.
- All `deriveState()` phases are preserved — old-style state resolves correctly.
- No migration needed. The new pipeline is strictly more permissive than the old one.

## Alternatives Considered

### A. Keep research as a separate session, just make it optional

Add a `skip_research` preference (already exists) and make it default to true. This is the minimal change — one boolean flip.

**Rejected:** This saves sessions but doesn't address the context re-ingestion problem, the lossy handoff problem, or the ceremony session overhead. It's a preference toggle, not an architectural improvement.

### B. Keep all unit types but share context via a persistent cache

Instead of fresh sessions, maintain a shared context store that persists across units. Each unit reads from the store instead of re-inlining files.

**Rejected:** This requires a fundamentally different session model — either a long-running session (which hits context limits) or a cache mechanism that the LLM can query (which doesn't exist in the Claude API). The fresh-session-per-unit model is correct; the problem is what we put in each session, not the session model itself.

### C. Collapse everything into a single session per slice

One session per slice: plan + execute all tasks + complete. Maximum context efficiency.

**Rejected:** This hits real context limits for slices with 4+ tasks. Task execution is legitimately heavy — reading code, writing code, running tests, debugging failures. A single session for all of this would exhaust the context window. The plan-slice / execute-task boundary is a genuine engineering constraint, not ceremony.

### D. Fold completion into the last executor's prompt instead of post-unit processing

The original design had the last execute-task writing SUMMARY.md, UAT.md, and marking the slice done.

**Rejected (per Codex audit):** This creates a verification-retry ordering problem. If the executor writes SUMMARY.md and marks the slice done in the ROADMAP before the verification gate runs, a gate failure retries against incorrect derived state. Post-gate mechanical processing avoids this by running only after verification succeeds.

### E. Keep complete-slice as a separate session

The mechanical summary quality might be insufficient for complex slices.

**Addressed:** The mechanical approach with LLM fallback provides the best of both worlds. Simple slices get fast mechanical completion. Complex slices fall back to the existing LLM session. The quality threshold is tunable.

## Action Items

### Phase 1: Merge research into planning (+ targeted inlining reduction)

1. Update `buildPlanMilestonePrompt()` — add exploration instructions, skill discovery, drop "Trust the research"
2. Update `buildPlanSlicePrompt()` — add exploration instructions, reassessment preamble, drop "Trust the research"
3. Remove dispatch rule "pre-planning (no research) → research-milestone" — merge with "pre-planning (has research) → plan-milestone" into a single "pre-planning → plan-milestone"
4. Remove dispatch rule "planning (no research, not S01) → research-slice"
5. Update `plan-milestone.md` and `plan-slice.md` prompt templates
6. Make `skip_research` and `skip_slice_research` preferences default to true (backwards compat)
7. Retain research templates for explicit `/gsd dispatch research` use
8. **Targeted inlining reduction for planning sessions:** Move DECISIONS, REQUIREMENTS, PROJECT to path references in plan-milestone and plan-slice prompts. Keep ROADMAP and CONTEXT inlined. This prevents context pressure from the added exploration work.

### Phase 2: Mechanical slice completion

9. Implement `mechanicalSliceCompletion()` in `auto-post-unit.ts`
10. Wire into post-unit processing: detect all-tasks-done after verification gate passes, run mechanical completion
11. Implement quality threshold check (summary length, artifact presence)
12. Retain `summarizing → complete-slice` dispatch rule as fallback for mechanical failures
13. Implement `mechanicalRequirementsUpdate()` and `appendNewDecisions()`

### Phase 3: Mechanical milestone validation + completion

14. Implement `aggregateMilestoneVerification()` reading `T##-VERIFY.json` and `S##-UAT-RESULT.md`
15. Implement `generateMilestoneSummary()` from slice summary aggregation
16. Wire into post-unit processing: after last slice completion, run mechanical validation + summary
17. Make reassess-roadmap opt-in via `reassess_after_slice` preference (default: false)
18. Retain `validating-milestone` and `completing-milestone` dispatch rules as fallbacks

### Phase 4: Full context re-ingestion reduction

19. Replace remaining `inlineFile()` calls for stable documents with mandatory-read path references
20. Update prompt headers with explicit "You MUST read" directives for critical files
21. Add plan output verification (must cite ROADMAP slice description)
22. Measure plan quality metrics before/after to validate the change

### Phase 5: Downstream simplification (optional, deferred)

23. Simplify `auto-post-unit.ts` — remove reassess dispatch logic (opt-in only)
24. Simplify `auto-stuck-detection.ts` — fewer unit type patterns
25. Simplify `auto-idempotency.ts` — fewer completed-key types
26. Review `auto-recovery.ts` — simplify recovery paths for unit types that are now fallback-only
27. Update auto-mode documentation (`docs/auto-mode.md`)

## Audit Trail

### Round 1 — Three-model review (March 18, 2026)

**Claude Opus 4.6** identified 8 issues:

1. ✅ Session count math inconsistent about S01 plan-slice skip — **fixed**: explicit derivation added with per-slice breakdown
2. ✅ `discuss` session counted in pipeline but not in math — **fixed**: noted as interactive session, not auto-mode unit
3. ✅ Token savings double-counting (eliminated sessions + re-ingestion) — **fixed**: removed overlap, noted savings are not additive
4. ✅ Context inlining change (file paths vs inline) underanalyzed — **fixed**: expanded to dedicated risk section with enforcement strategy, phased rollout, and interaction with budget engine
5. ✅ Budget engine interaction not discussed — **fixed**: addressed in context inlining section
6. ✅ `aggregateMilestoneVerification()` reads wrong data source — **fixed**: now reads `T##-VERIFY.json` as primary source, supplemented by `S##-UAT-RESULT.md`
7. ✅ Phase ordering creates heavy intermediate state (Phase 1 without Phase 4) — **fixed**: Phase 1 now includes targeted inlining reduction for planning sessions
8. ✅ ADR number conflict — **fixed**: confirmed no ADR-003 exists in `docs/` (the referenced file doesn't exist in current git)

**OpenAI Codex** identified 6 issues:

1. ✅ HIGH: Folding completion into execute-task breaks verification-retry model — **fixed**: moved completion to post-gate mechanical processing instead of executor prompt. Added Alternative D explaining why.
2. ✅ HIGH: Mechanical validation reads nonexistent `verification_evidence` frontmatter — **fixed**: now reads `T##-VERIFY.json` (canonical machine-readable source from `verification-evidence.ts`)
3. ✅ HIGH: Replacement validation drops UAT evidence — **fixed**: aggregator now reads both `T##-VERIFY.json` and `S##-UAT-RESULT.md`
4. ✅ HIGH: "State derivation stays unchanged" is false — **fixed**: explicitly documented that `deriveState()` phases are preserved, mechanical processing resolves them synchronously, fallback dispatch rules handle failures
5. ✅ MEDIUM: Folded completion omits REQUIREMENTS.md and KNOWLEDGE.md updates — **fixed**: mechanical completion handles REQUIREMENTS.md and DECISIONS.md; KNOWLEDGE.md addressed in Risk 5
6. ✅ MEDIUM: Session and token math inconsistent — **fixed**: complete rederivation with per-slice breakdown, corrected to 30 baseline sessions, noted profile variations

**Gemini 2.5 Pro** audit was not usable — it hallucinated the ADR as a CI/CD pipeline document about GitHub Actions, matrix builds, and nx workspace tooling. No findings were applicable to the actual content.
@@ -126,8 +126,8 @@ const DISPATCH_RULES: DispatchRule[] = [
  {
    name: "reassess-roadmap (post-completion)",
    match: async ({ state, mid, midTitle, basePath, prefs }) => {
      // Phase skip: skip reassess when preference or profile says so
      if (prefs?.phases?.skip_reassess) return null;
      // Reassess is opt-in: only fire when explicitly enabled
      if (!prefs?.phases?.reassess_after_slice) return null;
      const needsReassess = await checkNeedsReassessment(basePath, mid, state);
      if (!needsReassess) return null;
      return {
@@ -331,6 +331,45 @@ export async function postUnitPostVerification(pctx: PostUnitContext): Promise<"
    }
  }

  // ── Mechanical completion (ADR-003) ──
  // After task execution, attempt mechanical slice and milestone completion
  // instead of dispatching LLM sessions for complete-slice / validate-milestone.
  if (s.currentUnit?.type === "execute-task" && !s.stepMode) {
    try {
      const [mid, sid] = s.currentUnit.id.split("/");
      if (mid && sid) {
        const state = await deriveState(s.basePath);
        if (state.phase === "summarizing" && state.activeSlice?.id === sid) {
          const { mechanicalSliceCompletion } = await import("./mechanical-completion.js");
          const ok = await mechanicalSliceCompletion(s.basePath, mid, sid);
          if (ok) {
            invalidateAllCaches();
            autoCommitCurrentBranch(s.basePath, "mechanical-completion", `${mid}/${sid}`);
            ctx.ui.notify(`Mechanical completion: ${sid} summary + roadmap updated.`, "info");

            // Re-derive state — check if milestone is now ready for validation
            invalidateAllCaches();
            const postSliceState = await deriveState(s.basePath);
            if (postSliceState.phase === "validating-milestone" || postSliceState.phase === "completing-milestone") {
              const { aggregateMilestoneVerification, generateMilestoneSummary } = await import("./mechanical-completion.js");
              const validation = await aggregateMilestoneVerification(s.basePath, mid);
              if (validation.verdict !== "failed") {
                await generateMilestoneSummary(s.basePath, mid);
                invalidateAllCaches();
                autoCommitCurrentBranch(s.basePath, "mechanical-milestone-completion", mid);
                ctx.ui.notify(`Mechanical completion: ${mid} validation + summary written.`, "info");
              }
            }
          }
          // If !ok, summarizing phase persists → dispatch rule fires as LLM fallback
        }
      }
    } catch (err) {
      process.stderr.write(`gsd-mechanical: completion failed: ${(err as Error).message}\n`);
      // Non-fatal — fall through to normal dispatch
    }
  }

  // ── Post-unit hooks ──
  if (s.currentUnit && !s.stepMode) {
    const hookUnit = checkPostUnitHooks(s.currentUnit.type, s.currentUnit.id, s.basePath);
@@ -589,14 +589,18 @@ export async function buildPlanMilestonePrompt(mid: string, midTitle: string, ba
  const { inlinePriorMilestoneSummary } = await import("./files.js");
  const priorSummaryInline = await inlinePriorMilestoneSummary(mid, base);
  if (priorSummaryInline) inlined.push(priorSummaryInline);
  if (inlineLevel !== "minimal") {
    const projectInline = await inlineProjectFromDb(base);
    if (projectInline) inlined.push(projectInline);
    const requirementsInline = await inlineRequirementsFromDb(base, undefined, inlineLevel);
    if (requirementsInline) inlined.push(requirementsInline);
    const decisionsInline = await inlineDecisionsFromDb(base, mid, undefined, inlineLevel);
    if (decisionsInline) inlined.push(decisionsInline);
  }
  // Build source file paths for the planner to read on demand (reduces inlining)
  const sourcePaths: string[] = [];
  if (existsSync(resolveGsdRootFile(base, "PROJECT")))
    sourcePaths.push(`- **Project**: \`${relGsdRootFile("PROJECT")}\``);
  if (existsSync(resolveGsdRootFile(base, "REQUIREMENTS")))
    sourcePaths.push(`- **Requirements**: \`${relGsdRootFile("REQUIREMENTS")}\``);
  if (existsSync(resolveGsdRootFile(base, "DECISIONS")))
    sourcePaths.push(`- **Decisions**: \`${relGsdRootFile("DECISIONS")}\``);
  const sourceFilePaths = sourcePaths.length > 0
    ? sourcePaths.join("\n")
    : "_No project/requirements/decisions files found._";

  const knowledgeInlinePM = await inlineGsdRootFile(base, "knowledge.md", "Project Knowledge");
  if (knowledgeInlinePM) inlined.push(knowledgeInlinePM);
  inlined.push(inlineTemplate("roadmap", "Roadmap"));
@@ -615,6 +619,7 @@ export async function buildPlanMilestonePrompt(mid: string, midTitle: string, ba

  const outputRelPath = relMilestoneFile(base, mid, "ROADMAP");
  const secretsOutputPath = join(base, relMilestoneFile(base, mid, "SECRETS"));
  const researchOutputRelPath = relMilestoneFile(base, mid, "RESEARCH");
  return loadPrompt("plan-milestone", {
    workingDirectory: base,
    milestoneId: mid, milestoneTitle: midTitle,
@@ -624,6 +629,9 @@ export async function buildPlanMilestonePrompt(mid: string, midTitle: string, ba
    outputPath: join(base, outputRelPath),
    secretsOutputPath,
    inlinedContext,
    sourceFilePaths,
    researchOutputPath: join(base, researchOutputRelPath),
    ...buildSkillDiscoveryVars(),
  });
}
@@ -686,12 +694,16 @@ export async function buildPlanSlicePrompt(
  inlined.push(await inlineFile(roadmapPath, roadmapRel, "Milestone Roadmap"));
  const researchInline = await inlineFileOptional(researchPath, researchRel, "Slice Research");
  if (researchInline) inlined.push(researchInline);
  if (inlineLevel !== "minimal") {
    const decisionsInline = await inlineDecisionsFromDb(base, mid, undefined, inlineLevel);
    if (decisionsInline) inlined.push(decisionsInline);
    const requirementsInline = await inlineRequirementsFromDb(base, sid, inlineLevel);
    if (requirementsInline) inlined.push(requirementsInline);
  }
  // Build source file paths for the planner to read on demand (reduces inlining)
  const sliceSourcePaths: string[] = [];
  if (existsSync(resolveGsdRootFile(base, "REQUIREMENTS")))
    sliceSourcePaths.push(`- **Requirements**: \`${relGsdRootFile("REQUIREMENTS")}\``);
  if (existsSync(resolveGsdRootFile(base, "DECISIONS")))
    sliceSourcePaths.push(`- **Decisions**: \`${relGsdRootFile("DECISIONS")}\``);
  const sliceSourceFilePaths = sliceSourcePaths.length > 0
    ? sliceSourcePaths.join("\n")
    : "_No requirements/decisions files found._";

  const knowledgeInlinePS = await inlineGsdRootFile(base, "knowledge.md", "Project Knowledge");
  if (knowledgeInlinePS) inlined.push(knowledgeInlinePS);
  inlined.push(inlineTemplate("plan", "Slice Plan"));
@@ -726,6 +738,7 @@ export async function buildPlanSlicePrompt(
    dependencySummaries: depContent,
    executorContextConstraints,
    commitInstruction,
    sourceFilePaths: sliceSourceFilePaths,
  });
}
430 src/resources/extensions/gsd/mechanical-completion.ts Normal file
@ -0,0 +1,430 @@
|
|||
/**
 * Mechanical Completion — deterministic post-verification artifact generation.
 *
 * Pure functions that aggregate task-level outputs into slice/milestone summaries,
 * UAT stubs, roadmap checkbox updates, and validation reports. Zero orchestration
 * dependencies — operates on filesystem paths and parsed structures only.
 *
 * ADR-003: replaces LLM-driven complete-slice and validate-milestone units with
 * mechanical aggregation when the data is sufficient.
 */

import { readFileSync, existsSync, readdirSync } from "node:fs";
import { join } from "node:path";
import { atomicWriteSync } from "./atomic-write.js";
import { loadFile, parseSummary } from "./files.js";
import { extractMarkdownSection } from "./auto-prompts.js";
import {
  resolveTaskFiles,
  resolveTaskJsonFiles,
  resolveTasksDir,
  resolveSliceFile,
  resolveSlicePath,
  resolveMilestoneFile,
  resolveMilestonePath,
  resolveGsdRootFile,
} from "./paths.js";
import type { Summary, SummaryFrontmatter } from "./types.js";
import type { EvidenceJSON } from "./verification-evidence.js";

// ─── Slice Completion ────────────────────────────────────────────────────────

/**
 * Mechanically complete a slice by aggregating task summaries into:
 * - S##-SUMMARY.md (aggregated frontmatter + task one-liners)
 * - S##-UAT.md (extracted from plan Verification section)
 * - Roadmap checkbox [x] update
 *
 * Returns true if completion succeeded, false if data is insufficient
 * (serves as quality gate — caller falls back to LLM completion).
 */
export async function mechanicalSliceCompletion(
  base: string, mid: string, sid: string,
): Promise<boolean> {
  const tDir = resolveTasksDir(base, mid, sid);
  if (!tDir) return false;

  // Read all task summaries
  const summaryFiles = resolveTaskFiles(tDir, "SUMMARY");
  if (summaryFiles.length === 0) return false;

  const taskSummaries: Array<{ taskId: string; summary: Summary }> = [];
  for (const file of summaryFiles) {
    const content = readFileSync(join(tDir, file), "utf-8");
    if (!content.trim()) continue;
    const summary = parseSummary(content);
    const taskId = file.match(/^(T\d+)/)?.[1] ?? file;
    taskSummaries.push({ taskId, summary });
  }

  if (taskSummaries.length === 0) return false;

  // Quality gate: multi-task slices need substantive summaries
  if (taskSummaries.length > 1) {
    const totalContent = taskSummaries
      .map(ts => ts.summary.whatHappened || ts.summary.oneLiner || "")
      .join("");
    if (totalContent.length < 200) return false;
  }

  // Aggregate frontmatter
  const aggregated = aggregateFrontmatter(taskSummaries.map(ts => ts.summary.frontmatter));

  // Build SUMMARY.md
  const summaryLines: string[] = [
    "---",
    `id: ${sid}`,
    `parent: ${mid}`,
    `milestone: ${mid}`,
  ];
  if (aggregated.provides.length > 0)
    summaryLines.push(`provides:\n${aggregated.provides.map(p => ` - ${p}`).join("\n")}`);
  if (aggregated.key_files.length > 0)
    summaryLines.push(`key_files:\n${aggregated.key_files.map(f => ` - ${f}`).join("\n")}`);
  if (aggregated.key_decisions.length > 0)
    summaryLines.push(`key_decisions:\n${aggregated.key_decisions.map(d => ` - ${d}`).join("\n")}`);
  if (aggregated.patterns_established.length > 0)
    summaryLines.push(`patterns_established:\n${aggregated.patterns_established.map(p => ` - ${p}`).join("\n")}`);
  if (aggregated.affects.length > 0)
    summaryLines.push(`affects:\n${aggregated.affects.map(a => ` - ${a}`).join("\n")}`);
  if (aggregated.observability_surfaces.length > 0)
    summaryLines.push(`observability_surfaces:\n${aggregated.observability_surfaces.map(o => ` - ${o}`).join("\n")}`);
  const allPassed = taskSummaries.every(ts => ts.summary.frontmatter.verification_result === "passed");
  summaryLines.push(`verification_result: ${allPassed ? "passed" : "mixed"}`);
  summaryLines.push(`completed_at: ${new Date().toISOString()}`);
  summaryLines.push("---");
  summaryLines.push("");
  summaryLines.push(`# ${sid}: Slice Summary`);
  summaryLines.push("");

  // Task one-liners
  for (const { taskId, summary } of taskSummaries) {
    const line = summary.oneLiner || summary.title || taskId;
    summaryLines.push(`- **${taskId}**: ${line}`);
  }
  summaryLines.push("");

  const sDir = resolveSlicePath(base, mid, sid);
  if (!sDir) return false;

  const summaryPath = join(sDir, `${sid}-SUMMARY.md`);
  atomicWriteSync(summaryPath, summaryLines.join("\n"));
  process.stderr.write(`gsd-mechanical: wrote ${summaryPath}\n`);

  // Build UAT.md from plan's Verification section
  const planPath = resolveSliceFile(base, mid, sid, "PLAN");
  if (planPath) {
    const planContent = readFileSync(planPath, "utf-8");
    const verification = extractMarkdownSection(planContent, "Verification");
    if (verification) {
      const uatContent = [
        "---",
        `id: ${sid}`,
        `parent: ${mid}`,
        "type: artifact-driven",
        "---",
        "",
        `# ${sid}: UAT`,
        "",
        verification,
        "",
      ].join("\n");
      const uatPath = join(sDir, `${sid}-UAT.md`);
      atomicWriteSync(uatPath, uatContent);
      process.stderr.write(`gsd-mechanical: wrote ${uatPath}\n`);
    }
  }

  // Mark slice [x] in ROADMAP
  await markSliceInRoadmap(base, mid, sid);

  // Append new decisions if any
  await appendNewDecisions(base, taskSummaries.map(ts => ts.summary));

  // Update requirements if all passed
  if (allPassed) {
    await mechanicalRequirementsUpdate(base, mid, sid, taskSummaries.map(ts => ts.summary));
  }

  return true;
}

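The 200-character quality gate above can be sketched in isolation. This is a minimal, standalone illustration; the `summaries` fixture and its `whatHappened` strings are hypothetical, not taken from the repository:

```typescript
// Hypothetical fixture: two task summaries whose combined prose is thin.
const summaries = [
  { whatHappened: "Implemented the API endpoints and wired them into the router." },
  { whatHappened: "Short" },
];
// Mirror of the gate: multi-task slices need at least 200 chars of combined text.
const totalContent = summaries.map(s => s.whatHappened || "").join("");
const passesGate = summaries.length <= 1 || totalContent.length >= 200;
console.log(passesGate); // false
```

When the gate returns false, the orchestrator would fall back to LLM-driven completion rather than writing a near-empty SUMMARY.md.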
// ─── Requirements Update ─────────────────────────────────────────────────────

/**
 * Conservative requirements update: intentionally a no-op for now.
 * Deciding whether a requirement is truly Validated takes human or LLM
 * judgment, so mechanical completion leaves requirement status changes
 * to the existing validation pipeline.
 */
export async function mechanicalRequirementsUpdate(
  _base: string, _mid: string, _sid: string, _taskSummaries: Summary[],
): Promise<void> {
  // Conservative: requirements validation requires human or LLM judgment
  // about whether the requirement is truly met. Mechanical completion only
  // marks the slice done — requirement status updates are left to the
  // existing validation pipeline.
}

// ─── Decision Aggregation ────────────────────────────────────────────────────

/**
 * Collect key_decisions from task summaries, deduplicate against existing
 * DECISIONS.md, and append new ones.
 */
export async function appendNewDecisions(
  base: string, taskSummaries: Summary[],
): Promise<void> {
  const allDecisions = taskSummaries.flatMap(s => s.frontmatter.key_decisions);
  if (allDecisions.length === 0) return;

  const decisionsPath = resolveGsdRootFile(base, "DECISIONS");
  const existing = existsSync(decisionsPath)
    ? readFileSync(decisionsPath, "utf-8")
    : "";

  // Deduplicate — skip decisions whose text already appears in the file
  const newDecisions = allDecisions.filter(d =>
    d.trim() && !existing.includes(d.trim()),
  );
  if (newDecisions.length === 0) return;

  const entries = newDecisions
    .map(d => `- ${d} _(auto-aggregated from task summaries)_`)
    .join("\n");

  const updated = existing.trimEnd() + "\n\n### Auto-aggregated Decisions\n\n" + entries + "\n";
  atomicWriteSync(decisionsPath, updated);
  process.stderr.write(`gsd-mechanical: appended ${newDecisions.length} decision(s) to DECISIONS.md\n`);
}

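The dedup filter in `appendNewDecisions` is a plain substring check against the existing file. A standalone sketch, using hypothetical fixture strings for the existing DECISIONS.md content and the incoming decisions:

```typescript
// Hypothetical fixtures: existing file content and candidate decisions.
const existing = "# Decisions\n\n- Used Express over Fastify\n";
const allDecisions = ["Used Express over Fastify", "Store state outside the worktree", "  "];
// Keep only non-blank decisions whose trimmed text is not already in the file.
const newDecisions = allDecisions.filter(d => d.trim() && !existing.includes(d.trim()));
console.log(newDecisions); // [ "Store state outside the worktree" ]
```

Note the check is textual, so a decision rephrased by a later task would not be deduplicated; that trade-off keeps the step fully deterministic.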
// ─── Milestone Verification ──────────────────────────────────────────────────

export interface MilestoneVerificationResult {
  verdict: "passed" | "failed" | "mixed";
  checks: EvidenceJSON[];
  uatResults: string[];
  markdown: string;
}

/**
 * Aggregate T##-VERIFY.json files and S##-UAT-RESULT.md files across all
 * slices in a milestone to produce VALIDATION.md.
 */
export async function aggregateMilestoneVerification(
  base: string, mid: string,
): Promise<MilestoneVerificationResult> {
  const mDir = resolveMilestonePath(base, mid);
  if (!mDir) return { verdict: "failed", checks: [], uatResults: [], markdown: "" };

  const allChecks: EvidenceJSON[] = [];
  const allUatResults: string[] = [];

  // Scan all slices
  const slicesDir = join(mDir, "slices");
  if (!existsSync(slicesDir)) return { verdict: "failed", checks: [], uatResults: [], markdown: "" };

  const sliceDirs = readdirSyncSafe(slicesDir).filter(name => /^S\d+/i.test(name)).sort();

  for (const sliceName of sliceDirs) {
    const sid = sliceName.match(/^(S\d+)/i)?.[1] ?? sliceName;
    const tDir = resolveTasksDir(base, mid, sid);
    if (tDir) {
      const verifyFiles = resolveTaskJsonFiles(tDir, "VERIFY");
      for (const vf of verifyFiles) {
        try {
          const content = readFileSync(join(tDir, vf), "utf-8");
          const evidence = JSON.parse(content) as EvidenceJSON;
          allChecks.push(evidence);
        } catch {
          // Skip malformed JSON
        }
      }
    }

    // Check for UAT result
    const uatResultPath = resolveSliceFile(base, mid, sid, "UAT-RESULT");
    if (uatResultPath) {
      try {
        const uatContent = readFileSync(uatResultPath, "utf-8");
        allUatResults.push(`### ${sid}\n\n${uatContent}`);
      } catch {
        // Non-fatal
      }
    }
  }

  // Determine verdict
  const allPassed = allChecks.length > 0 && allChecks.every(c => c.passed);
  const anyFailed = allChecks.some(c => !c.passed);
  const verdict: "passed" | "failed" | "mixed" = allPassed
    ? "passed"
    : anyFailed
      ? (allChecks.some(c => c.passed) ? "mixed" : "failed")
      : "passed"; // No checks = vacuously passed

  // Build VALIDATION.md
  const mdLines: string[] = [
    "---",
    `milestone: ${mid}`,
    `verdict: ${verdict}`,
    "remediation_round: 0",
    `validated_at: ${new Date().toISOString()}`,
    "---",
    "",
    `# ${mid}: Milestone Validation`,
    "",
    `**Verdict:** ${verdict}`,
    "",
    "## Verification Results",
    "",
  ];

  if (allChecks.length === 0) {
    mdLines.push("_No verification evidence found._");
  } else {
    mdLines.push("| Task | Passed | Checks | Failed |");
    mdLines.push("|------|--------|--------|--------|");
    for (const check of allChecks) {
      const failedCount = check.checks.filter(c => c.verdict === "fail").length;
      mdLines.push(
        `| ${check.taskId} | ${check.passed ? "yes" : "no"} | ${check.checks.length} | ${failedCount} |`,
      );
    }
  }

  if (allUatResults.length > 0) {
    mdLines.push("");
    mdLines.push("## UAT Results");
    mdLines.push("");
    mdLines.push(...allUatResults);
  }

  mdLines.push("");

  const markdown = mdLines.join("\n");

  // Write VALIDATION.md
  const validationPath = join(mDir, `${mid}-VALIDATION.md`);
  atomicWriteSync(validationPath, markdown);
  process.stderr.write(`gsd-mechanical: wrote ${validationPath}\n`);

  return { verdict, checks: allChecks, uatResults: allUatResults, markdown };
}

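The nested ternary that folds evidence into a verdict is easy to misread, so here is a standalone sketch of the same logic. `EvidenceJSON` is reduced to just `{ passed }` for illustration; the full type carries more fields:

```typescript
type Check = { passed: boolean };

// Same fold as aggregateMilestoneVerification: all pass, some pass, or none pass.
function verdictOf(checks: Check[]): "passed" | "failed" | "mixed" {
  const allPassed = checks.length > 0 && checks.every(c => c.passed);
  const anyFailed = checks.some(c => !c.passed);
  return allPassed
    ? "passed"
    : anyFailed
      ? (checks.some(c => c.passed) ? "mixed" : "failed")
      : "passed"; // no evidence at all is treated as a vacuous pass
}

console.log(verdictOf([{ passed: true }, { passed: false }])); // "mixed"
console.log(verdictOf([])); // "passed"
```

The vacuous-pass branch is the notable design choice: a milestone with zero VERIFY.json files validates, which is why the empty-evidence case also prints "_No verification evidence found._" in the report.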
// ─── Milestone Summary ──────────────────────────────────────────────────────

/**
 * Read all S##-SUMMARY.md files and produce M##-SUMMARY.md.
 */
export async function generateMilestoneSummary(
  base: string, mid: string,
): Promise<string> {
  const mDir = resolveMilestonePath(base, mid);
  if (!mDir) return "";

  const slicesDir = join(mDir, "slices");
  if (!existsSync(slicesDir)) return "";

  const sliceDirs = readdirSyncSafe(slicesDir).filter(name => /^S\d+/i.test(name)).sort();

  const aggregatedProvides: string[] = [];
  const aggregatedKeyFiles: string[] = [];
  const aggregatedKeyDecisions: string[] = [];
  const aggregatedPatterns: string[] = [];
  const sliceOneLinerList: string[] = [];

  for (const sliceName of sliceDirs) {
    const sid = sliceName.match(/^(S\d+)/i)?.[1] ?? sliceName;
    const summaryPath = resolveSliceFile(base, mid, sid, "SUMMARY");
    if (!summaryPath) continue;

    try {
      const content = readFileSync(summaryPath, "utf-8");
      const summary = parseSummary(content);
      aggregatedProvides.push(...summary.frontmatter.provides);
      aggregatedKeyFiles.push(...summary.frontmatter.key_files);
      aggregatedKeyDecisions.push(...summary.frontmatter.key_decisions);
      aggregatedPatterns.push(...summary.frontmatter.patterns_established);
      sliceOneLinerList.push(`- **${sid}**: ${summary.oneLiner || summary.title || sid}`);
    } catch {
      sliceOneLinerList.push(`- **${sid}**: _(summary unavailable)_`);
    }
  }

  const mdLines: string[] = [
    "---",
    `id: ${mid}`,
  ];
  if (dedup(aggregatedProvides).length > 0)
    mdLines.push(`provides:\n${dedup(aggregatedProvides).map(p => ` - ${p}`).join("\n")}`);
  if (dedup(aggregatedKeyFiles).length > 0)
    mdLines.push(`key_files:\n${dedup(aggregatedKeyFiles).map(f => ` - ${f}`).join("\n")}`);
  if (dedup(aggregatedKeyDecisions).length > 0)
    mdLines.push(`key_decisions:\n${dedup(aggregatedKeyDecisions).map(d => ` - ${d}`).join("\n")}`);
  if (dedup(aggregatedPatterns).length > 0)
    mdLines.push(`patterns_established:\n${dedup(aggregatedPatterns).map(p => ` - ${p}`).join("\n")}`);
  mdLines.push(`completed_at: ${new Date().toISOString()}`);
  mdLines.push("---");
  mdLines.push("");
  mdLines.push(`# ${mid}: Milestone Summary`);
  mdLines.push("");
  mdLines.push("## Slices");
  mdLines.push("");
  mdLines.push(...sliceOneLinerList);
  mdLines.push("");

  const content = mdLines.join("\n");

  // Write M##-SUMMARY.md
  const summaryPath = join(mDir, `${mid}-SUMMARY.md`);
  atomicWriteSync(summaryPath, content);
  process.stderr.write(`gsd-mechanical: wrote ${summaryPath}\n`);

  return content;
}

// ─── Helpers ─────────────────────────────────────────────────────────────────

function aggregateFrontmatter(fms: SummaryFrontmatter[]): {
  provides: string[];
  key_files: string[];
  key_decisions: string[];
  patterns_established: string[];
  affects: string[];
  observability_surfaces: string[];
} {
  return {
    provides: dedup(fms.flatMap(f => f.provides)),
    key_files: dedup(fms.flatMap(f => f.key_files)),
    key_decisions: dedup(fms.flatMap(f => f.key_decisions)),
    patterns_established: dedup(fms.flatMap(f => f.patterns_established)),
    affects: dedup(fms.flatMap(f => f.affects)),
    observability_surfaces: dedup(fms.flatMap(f => f.observability_surfaces)),
  };
}

function dedup(arr: string[]): string[] {
  return [...new Set(arr.filter(s => s.trim()))];
}

async function markSliceInRoadmap(base: string, mid: string, sid: string): Promise<void> {
  const roadmapPath = resolveMilestoneFile(base, mid, "ROADMAP");
  if (!roadmapPath) return;
  const content = await loadFile(roadmapPath);
  if (!content) return;
  const updated = content.replace(
    new RegExp(`^(\\s*-\\s+)\\[ \\]\\s+\\*\\*${sid}:`, "m"),
    `$1[x] **${sid}:`,
  );
  if (updated !== content) {
    atomicWriteSync(roadmapPath, updated);
    process.stderr.write(`gsd-mechanical: marked ${sid} done in ROADMAP\n`);
  }
}

function readdirSyncSafe(dir: string): string[] {
  try {
    return readdirSync(dir);
  } catch {
    return [];
  }
}

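The checkbox replacement in `markSliceInRoadmap` can be exercised standalone. The sample roadmap text below is a hypothetical fixture in the same shape the tests use:

```typescript
// Hypothetical roadmap fixture with one unchecked slice per line.
const sid = "S01";
const roadmap = "# Roadmap\n\n- [ ] **S01: First Slice**\n- [ ] **S02: Second Slice**\n";

// Same multiline regex as markSliceInRoadmap: flip "[ ]" to "[x]" for this slice only.
const updated = roadmap.replace(
  new RegExp(`^(\\s*-\\s+)\\[ \\]\\s+\\*\\*${sid}:`, "m"),
  `$1[x] **${sid}:`,
);

console.log(updated.includes("- [x] **S01:")); // true
console.log(updated.includes("- [ ] **S02:")); // true, S02 untouched
```

Because slice ids follow the `S\d+` shape, interpolating `sid` into the pattern is safe here; an id containing regex metacharacters would need escaping first.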
@@ -236,6 +236,23 @@ export function resolveTaskFiles(tasksDir: string, suffix: string): string[] {
  }
}

/**
 * Find all task JSON files matching a pattern in a tasks directory.
 * Returns sorted file names matching T##-SUFFIX.json or legacy T##-*-SUFFIX.json
 */
export function resolveTaskJsonFiles(tasksDir: string, suffix: string): string[] {
  if (!existsSync(tasksDir)) return [];
  try {
    const currentPattern = new RegExp(`^T\\d+-${suffix}\\.json$`, "i");
    const legacyPattern = new RegExp(`^T\\d+-.*-${suffix}\\.json$`, "i");
    return cachedReaddir(tasksDir)
      .filter(f => currentPattern.test(f) || legacyPattern.test(f))
      .sort();
  } catch {
    return [];
  }
}

// ─── Full Path Builders ────────────────────────────────────────────────────

export const GSD_ROOT_FILES = {
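The two filename patterns in `resolveTaskJsonFiles` are worth seeing against concrete names. A standalone sketch with a hypothetical directory listing:

```typescript
// Hypothetical listing: current-style, legacy-style, and an unrelated file.
const suffix = "VERIFY";
const currentPattern = new RegExp(`^T\\d+-${suffix}\\.json$`, "i");
const legacyPattern = new RegExp(`^T\\d+-.*-${suffix}\\.json$`, "i");
const files = ["T01-VERIFY.json", "T02-setup-db-VERIFY.json", "T03-SUMMARY.md"];

const matched = files.filter(f => currentPattern.test(f) || legacyPattern.test(f)).sort();
console.log(matched); // [ "T01-VERIFY.json", "T02-setup-db-VERIFY.json" ]
```

The legacy pattern tolerates a slug between the task id and the suffix, so verification evidence written by older versions of the pipeline is still picked up.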
@@ -254,13 +254,19 @@ export function resolveProfileDefaults(profile: TokenProfile): Partial<GSDPrefer
        subagent: "claude-sonnet-4-5-20250514",
      },
      phases: {
        skip_research: true,
        skip_reassess: true,
        skip_slice_research: true,
      },
    };
    case "quality":
      return {
        models: {},
        phases: {},
        phases: {
          skip_research: true,
          skip_slice_research: true,
          skip_reassess: true,
        },
      };
  }
}
@@ -172,9 +172,10 @@ export function validatePreferences(preferences: GSDPreferences): {
  if (p.skip_reassess !== undefined) validatedPhases.skip_reassess = !!p.skip_reassess;
  if (p.skip_slice_research !== undefined) validatedPhases.skip_slice_research = !!p.skip_slice_research;
  if (p.skip_milestone_validation !== undefined) validatedPhases.skip_milestone_validation = !!p.skip_milestone_validation;
  if (p.reassess_after_slice !== undefined) validatedPhases.reassess_after_slice = !!p.reassess_after_slice;
  if ((p as any).require_slice_discussion !== undefined) (validatedPhases as any).require_slice_discussion = !!(p as any).require_slice_discussion;
  // Warn on unknown phase keys
  const knownPhaseKeys = new Set(["skip_research", "skip_reassess", "skip_slice_research", "skip_milestone_validation", "require_slice_discussion"]);
  const knownPhaseKeys = new Set(["skip_research", "skip_reassess", "skip_slice_research", "skip_milestone_validation", "reassess_after_slice", "require_slice_discussion"]);
  for (const key of Object.keys(p)) {
    if (!knownPhaseKeys.has(key)) {
      warnings.push(`unknown phases key "${key}" — ignored`);
@@ -12,9 +12,33 @@ All relevant context has been preloaded below — start working immediately with

## Your Role in the Pipeline

A **researcher agent** already explored the codebase and documented findings in the milestone research doc (inlined above, if present). It identified key files, technology choices, constraints, and risks. **Trust the research.** Your job is strategic decomposition — turning findings into an ordered set of demoable slices — not re-exploration. Don't read code files the research already summarized unless something is ambiguous or missing.
You are the first deep look at this milestone. You have full tool access — explore the codebase, look up docs, investigate technology choices. Your job is to understand the landscape and then strategically decompose the work into demoable slices.

After you finish, each slice goes through its own research → plan → execute cycle. Slice researchers dive deeper into the specific area. Slice planners decompose into tasks. Executors build each task. Your roadmap sets the strategic frame for all of them.
After you finish, each slice goes through its own plan → execute cycle. Slice planners decompose into tasks. Executors build each task. Your roadmap sets the strategic frame for all of them.

### Explore First, Then Decompose

Before decomposing, build your understanding:

1. **Codebase exploration.** For small/familiar codebases, use `rg`, `find`, and targeted reads. For large or unfamiliar codebases, use `scout` to build a broad map efficiently before diving in.
2. **Library docs.** Use `resolve_library` / `get_library_docs` for unfamiliar libraries — skip this for libraries already used in the codebase.
3. **Skill Discovery ({{skillDiscoveryMode}}):**{{skillDiscoveryInstructions}}
4. **Requirements analysis.** If `.gsd/REQUIREMENTS.md` exists, research against it. Identify which Active requirements are table stakes, likely omissions, overbuilt risks, or domain-standard behaviors.

### Strategic Questions to Answer

- What should be proven first?
- What existing patterns should be reused?
- What boundary contracts matter?
- What constraints does the existing codebase impose?
- Are there known failure modes that should shape slice ordering?
- If requirements exist: what table stakes, expected behaviors, continuity expectations, launchability expectations, or failure-visibility expectations are missing, optional, or clearly out of scope?

### Source Files

{{sourceFilePaths}}

If milestone research exists (inlined above), trust those findings and skip redundant exploration. If findings are significant and no research file exists yet, write `{{researchOutputPath}}`.

Narrate your decomposition reasoning — why you're grouping work this way, what risks are driving the order, what verification strategy you're choosing and why. Use complete sentences rather than planner shorthand or fragmentary notes.

@@ -18,7 +18,21 @@ Pay particular attention to **Forward Intelligence** sections — they contain h

## Your Role in the Pipeline

A **researcher agent** already explored the codebase and documented findings in the slice research doc (inlined above, if present). It identified key files, build order, constraints, and verification approach. **Trust the research.** Your job is decomposition — turning findings into executable tasks — not re-exploration. Don't read code files the research already summarized unless something is ambiguous or missing from its findings.
You have full tool access. Before decomposing, explore the relevant code to ground your plan in reality.

### Verify Roadmap Assumptions

Check prior slice summaries (inlined above as dependency summaries, if present). If prior slices discovered constraints, changed approaches, or flagged fragility, adjust your plan accordingly. The roadmap description may be stale — verify it against the current codebase state.

### Explore Slice Scope

Read the code files relevant to this slice. Confirm the roadmap's description of what exists, what needs to change, and what boundaries apply. Use `rg`, `find`, and targeted reads.

### Source Files

{{sourceFilePaths}}

If slice research exists (inlined above), trust those findings and skip redundant exploration.

After you finish, **executor agents** implement each task in isolated fresh context windows. They see only their task plan, the slice plan excerpt (goal/demo/verification), and compressed summaries of prior tasks. They do not see the research doc, the roadmap, or REQUIREMENTS.md. Everything an executor needs must be in the task plan itself — file paths, specific steps, expected inputs and outputs.

356
src/resources/extensions/gsd/tests/mechanical-completion.test.ts
Normal file
@@ -0,0 +1,356 @@
/**
 * Mechanical Completion — unit tests (ADR-003).
 *
 * Tests deterministic slice/milestone completion using fixture data.
 * Uses node:test + node:assert for consistency with token-profile.test.ts.
 */

import test from "node:test";
import assert from "node:assert/strict";
import { mkdirSync, writeFileSync, readFileSync, rmSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
import { randomBytes } from "node:crypto";

// ─── Fixture Helpers ──────────────────────────────────────────────────────────

function createTmpBase(): string {
  const base = join(tmpdir(), `gsd-mech-test-${randomBytes(4).toString("hex")}`);
  mkdirSync(base, { recursive: true });
  return base;
}

function scaffold(base: string, mid: string, sid: string, taskSummaries: Array<{ tid: string; content: string }>) {
  const gsdRoot = join(base, ".gsd");
  const mDir = join(gsdRoot, "milestones", mid);
  const sDir = join(mDir, "slices", sid);
  const tDir = join(sDir, "tasks");
  mkdirSync(tDir, { recursive: true });

  for (const { tid, content } of taskSummaries) {
    writeFileSync(join(tDir, `${tid}-SUMMARY.md`), content, "utf-8");
  }

  return { gsdRoot, mDir, sDir, tDir };
}

function makeTaskSummary(tid: string, opts: {
  oneLiner?: string;
  provides?: string[];
  key_files?: string[];
  key_decisions?: string[];
  verification_result?: string;
}): string {
  const lines: string[] = [
    "---",
    `id: ${tid}`,
    `parent: S01`,
    `milestone: M001`,
  ];
  if (opts.provides?.length) lines.push(`provides:\n${opts.provides.map(p => ` - ${p}`).join("\n")}`);
  if (opts.key_files?.length) lines.push(`key_files:\n${opts.key_files.map(f => ` - ${f}`).join("\n")}`);
  if (opts.key_decisions?.length) lines.push(`key_decisions:\n${opts.key_decisions.map(d => ` - ${d}`).join("\n")}`);
  lines.push(`verification_result: ${opts.verification_result ?? "passed"}`);
  lines.push("---");
  lines.push("");
  lines.push(`# ${tid}: Test Task`);
  lines.push("");
  if (opts.oneLiner) lines.push(`**${opts.oneLiner}**`);
  lines.push("");
  lines.push("## What Happened");
  lines.push("");
  lines.push(`Implemented the feature described in ${tid}. This was a significant change that modified multiple files across the codebase to support the new functionality.`);
  lines.push("");
  return lines.join("\n");
}

// ─── Source-level structural tests ────────────────────────────────────────────

const mechanicalSrc = readFileSync(
  join(import.meta.dirname!, "..", "mechanical-completion.ts"),
  "utf-8",
);

test("mechanical-completion: exports mechanicalSliceCompletion", () => {
  assert.ok(
    mechanicalSrc.includes("export async function mechanicalSliceCompletion"),
    "should export mechanicalSliceCompletion",
  );
});

test("mechanical-completion: exports aggregateMilestoneVerification", () => {
  assert.ok(
    mechanicalSrc.includes("export async function aggregateMilestoneVerification"),
    "should export aggregateMilestoneVerification",
  );
});

test("mechanical-completion: exports generateMilestoneSummary", () => {
  assert.ok(
    mechanicalSrc.includes("export async function generateMilestoneSummary"),
    "should export generateMilestoneSummary",
  );
});

test("mechanical-completion: exports appendNewDecisions", () => {
  assert.ok(
    mechanicalSrc.includes("export async function appendNewDecisions"),
    "should export appendNewDecisions",
  );
});

test("mechanical-completion: uses atomicWriteSync for file writes", () => {
  assert.ok(
    mechanicalSrc.includes("atomicWriteSync"),
    "should use atomicWriteSync for safe file writes",
  );
});

test("mechanical-completion: quality gate checks summary length for multi-task slices", () => {
  assert.ok(
    mechanicalSrc.includes("totalContent.length < 200"),
    "should have quality gate for summary content length",
  );
});

test("mechanical-completion: marks slice [x] in roadmap", () => {
  assert.ok(
    mechanicalSrc.includes("markSliceInRoadmap"),
    "should mark slice done in roadmap",
  );
});

test("mechanical-completion: aggregates VERIFY.json files for milestone validation", () => {
  assert.ok(
    mechanicalSrc.includes("resolveTaskJsonFiles") && mechanicalSrc.includes("VERIFY"),
    "should read VERIFY.json files for milestone validation",
  );
});

test("mechanical-completion: deduplicates decisions against existing DECISIONS.md", () => {
  assert.ok(
    mechanicalSrc.includes("existing.includes(d.trim())"),
    "should deduplicate decisions against existing content",
  );
});

test("mechanical-completion: produces VALIDATION.md with verdict frontmatter", () => {
  assert.ok(
    mechanicalSrc.includes("verdict:") && mechanicalSrc.includes("remediation_round: 0"),
    "VALIDATION.md should have verdict and remediation_round frontmatter",
  );
});

// ─── Integration tests with fixture data ──────────────────────────────────────

test("mechanical: slice completion with 2 task summaries produces SUMMARY.md", async () => {
  const base = createTmpBase();
  try {
    const mid = "M001";
    const sid = "S01";

    // Scaffold task summaries
    scaffold(base, mid, sid, [
      {
        tid: "T01",
        content: makeTaskSummary("T01", {
          oneLiner: "Set up project structure",
          provides: ["project-scaffold"],
          key_files: ["src/index.ts", "package.json"],
          verification_result: "passed",
        }),
      },
      {
        tid: "T02",
        content: makeTaskSummary("T02", {
          oneLiner: "Add core API endpoints",
          provides: ["api-endpoints"],
          key_files: ["src/api.ts"],
          key_decisions: ["Used Express over Fastify"],
          verification_result: "passed",
        }),
      },
    ]);

    // Write a roadmap with the slice unchecked
    const roadmapPath = join(base, ".gsd", "milestones", mid, `${mid}-ROADMAP.md`);
    writeFileSync(roadmapPath, `# Roadmap\n\n- [ ] **${sid}: First Slice**\n`, "utf-8");

    // Write a slice plan with Verification section
    const planPath = join(base, ".gsd", "milestones", mid, "slices", sid, `${sid}-PLAN.md`);
    writeFileSync(planPath, `# Plan\n\n## Verification\n\n- Run \`npm test\`\n- Check output\n`, "utf-8");

    // Dynamic import to get the actual module
    const { mechanicalSliceCompletion } = await import("../mechanical-completion.js");
    const ok = await mechanicalSliceCompletion(base, mid, sid);

    assert.ok(ok, "should return true for valid slice completion");

    // Check SUMMARY.md was written
    const summaryPath = join(base, ".gsd", "milestones", mid, "slices", sid, `${sid}-SUMMARY.md`);
    assert.ok(existsSync(summaryPath), "SUMMARY.md should exist");

    const summaryContent = readFileSync(summaryPath, "utf-8");
    assert.ok(summaryContent.includes("T01"), "summary should reference T01");
    assert.ok(summaryContent.includes("T02"), "summary should reference T02");
    assert.ok(summaryContent.includes("verification_result: passed"), "should have passed verification");

    // Check roadmap was updated
    const updatedRoadmap = readFileSync(roadmapPath, "utf-8");
    assert.ok(updatedRoadmap.includes("[x]"), "roadmap should have [x] checkbox");

    // Check UAT was written
    const uatPath = join(base, ".gsd", "milestones", mid, "slices", sid, `${sid}-UAT.md`);
    assert.ok(existsSync(uatPath), "UAT.md should exist");
    const uatContent = readFileSync(uatPath, "utf-8");
    assert.ok(uatContent.includes("npm test"), "UAT should contain verification content");
  } finally {
    rmSync(base, { recursive: true, force: true });
  }
});

test("mechanical: returns false for empty task summaries", async () => {
  const base = createTmpBase();
  try {
    const mid = "M001";
    const sid = "S01";
    scaffold(base, mid, sid, []);

    const { mechanicalSliceCompletion } = await import("../mechanical-completion.js");
    const ok = await mechanicalSliceCompletion(base, mid, sid);
    assert.ok(!ok, "should return false when no summaries exist");
  } finally {
    rmSync(base, { recursive: true, force: true });
  }
});

test("mechanical: returns false for insufficient summary content in multi-task slice", async () => {
  const base = createTmpBase();
  try {
    const mid = "M001";
    const sid = "S01";

    // Two tasks but with very short content (under 200 chars)
    scaffold(base, mid, sid, [
      { tid: "T01", content: "---\nid: T01\nparent: S01\nmilestone: M001\n---\n\n# T01: A\n\n**Short**\n" },
      { tid: "T02", content: "---\nid: T02\nparent: S01\nmilestone: M001\n---\n\n# T02: B\n\n**Brief**\n" },
    ]);

    const { mechanicalSliceCompletion } = await import("../mechanical-completion.js");
    const ok = await mechanicalSliceCompletion(base, mid, sid);
|
||||
assert.ok(!ok, "should return false when summaries are too short");
|
||||
} finally {
|
||||
rmSync(base, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
|
||||
test("mechanical: milestone verification aggregates VERIFY.json files", async () => {
|
||||
const base = createTmpBase();
|
||||
try {
|
||||
const mid = "M001";
|
||||
const sid = "S01";
|
||||
const { tDir } = scaffold(base, mid, sid, []);
|
||||
|
||||
// Write VERIFY.json files
|
||||
const evidence = {
|
||||
schemaVersion: 1,
|
||||
taskId: "T01",
|
||||
unitId: "M001/S01/T01",
|
||||
timestamp: Date.now(),
|
||||
passed: true,
|
||||
discoverySource: "plan",
|
||||
checks: [
|
||||
{ command: "npm test", exitCode: 0, durationMs: 1500, verdict: "pass", blocking: true },
|
||||
],
|
||||
};
|
||||
writeFileSync(join(tDir, "T01-VERIFY.json"), JSON.stringify(evidence), "utf-8");
|
||||
|
||||
const evidence2 = { ...evidence, taskId: "T02", passed: false, checks: [
|
||||
{ command: "npm test", exitCode: 1, durationMs: 500, verdict: "fail", blocking: true },
|
||||
]};
|
||||
writeFileSync(join(tDir, "T02-VERIFY.json"), JSON.stringify(evidence2), "utf-8");
|
||||
|
||||
const { aggregateMilestoneVerification } = await import("../mechanical-completion.js");
|
||||
const result = await aggregateMilestoneVerification(base, mid);
|
||||
|
||||
assert.equal(result.verdict, "mixed", "should be mixed when some pass and some fail");
|
||||
assert.equal(result.checks.length, 2, "should have 2 checks");
|
||||
|
||||
// Check VALIDATION.md was written
|
||||
const validationPath = join(base, ".gsd", "milestones", mid, `${mid}-VALIDATION.md`);
|
||||
assert.ok(existsSync(validationPath), "VALIDATION.md should exist");
|
||||
const validationContent = readFileSync(validationPath, "utf-8");
|
||||
assert.ok(validationContent.includes("verdict: mixed"), "should have mixed verdict in frontmatter");
|
||||
} finally {
|
||||
rmSync(base, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
|
||||
test("mechanical: milestone summary aggregates slice summaries", async () => {
|
||||
const base = createTmpBase();
|
||||
try {
|
||||
const mid = "M001";
|
||||
|
||||
// Create two slices with summaries
|
||||
for (const sid of ["S01", "S02"]) {
|
||||
const sDir = join(base, ".gsd", "milestones", mid, "slices", sid);
|
||||
mkdirSync(sDir, { recursive: true });
|
||||
writeFileSync(
|
||||
join(sDir, `${sid}-SUMMARY.md`),
|
||||
`---\nid: ${sid}\nprovides:\n - feature-${sid.toLowerCase()}\nkey_files:\n - src/${sid.toLowerCase()}.ts\n---\n\n# ${sid}: Slice\n\n**${sid} implemented**\n`,
|
||||
"utf-8",
|
||||
);
|
||||
}
|
||||
|
||||
const { generateMilestoneSummary } = await import("../mechanical-completion.js");
|
||||
const content = await generateMilestoneSummary(base, mid);
|
||||
|
||||
assert.ok(content.includes("S01"), "should reference S01");
|
||||
assert.ok(content.includes("S02"), "should reference S02");
|
||||
assert.ok(content.includes("feature-s01"), "should aggregate provides");
|
||||
assert.ok(content.includes("feature-s02"), "should aggregate provides");
|
||||
|
||||
const summaryPath = join(base, ".gsd", "milestones", mid, `${mid}-SUMMARY.md`);
|
||||
assert.ok(existsSync(summaryPath), "M##-SUMMARY.md should exist");
|
||||
} finally {
|
||||
rmSync(base, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
|
||||
test("mechanical: decision deduplication skips existing decisions", async () => {
|
||||
const base = createTmpBase();
|
||||
try {
|
||||
const gsdRoot = join(base, ".gsd");
|
||||
mkdirSync(gsdRoot, { recursive: true });
|
||||
|
||||
// Write existing decisions
|
||||
const decisionsPath = join(gsdRoot, "DECISIONS.md");
|
||||
writeFileSync(decisionsPath, "# Decisions\n\n- Used TypeScript for type safety\n", "utf-8");
|
||||
|
||||
const { appendNewDecisions } = await import("../mechanical-completion.js");
|
||||
|
||||
// Call with one existing and one new decision
|
||||
const mockSummaries = [
|
||||
{
|
||||
frontmatter: {
|
||||
id: "T01", parent: "S01", milestone: "M001",
|
||||
provides: [], requires: [], affects: [],
|
||||
key_files: [], key_decisions: ["Used TypeScript for type safety", "Chose Express over Koa"],
|
||||
patterns_established: [], drill_down_paths: [], observability_surfaces: [],
|
||||
duration: "", verification_result: "passed", completed_at: "", blocker_discovered: false,
|
||||
},
|
||||
title: "T01", oneLiner: "", whatHappened: "", deviations: "", filesModified: [],
|
||||
},
|
||||
];
|
||||
|
||||
await appendNewDecisions(base, mockSummaries as any);
|
||||
|
||||
const updated = readFileSync(decisionsPath, "utf-8");
|
||||
assert.ok(updated.includes("Chose Express over Koa"), "should append new decision");
|
||||
// The existing decision should not be duplicated
|
||||
const matches = updated.match(/Used TypeScript for type safety/g);
|
||||
assert.equal(matches?.length, 1, "should not duplicate existing decision");
|
||||
} finally {
|
||||
rmSync(base, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
|
|
@@ -25,6 +25,7 @@ const BASE_VARS = {
   outputPath: "/tmp/test-project/.gsd/milestones/M001/slices/S01/S01-PLAN.md",
   inlinedContext: "--- test inlined context ---",
   dependencySummaries: "", executorContextConstraints: "",
+  sourceFilePaths: "- **Requirements**: `.gsd/REQUIREMENTS.md`",
 };

 test("plan-slice prompt: commit step present when commit_docs=true", () => {
@@ -53,6 +53,7 @@ test("types: PhaseSkipPreferences interface exported", () => {
   assert.ok(typesSrc.includes("skip_research"), "should include skip_research");
   assert.ok(typesSrc.includes("skip_reassess"), "should include skip_reassess");
   assert.ok(typesSrc.includes("skip_slice_research"), "should include skip_slice_research");
+  assert.ok(typesSrc.includes("reassess_after_slice"), "should include reassess_after_slice");
 });

 // ═══════════════════════════════════════════════════════════════════════════
@@ -113,24 +114,21 @@ test("profile: budget profile sets phase skips to true", () => {
   assert.ok(budgetBlock.includes("skip_slice_research: true"), "budget should skip slice research");
 });

-test("profile: balanced profile skips only slice research", () => {
+test("profile: balanced profile skips research, reassess, and slice research (ADR-003)", () => {
   const balancedIdx = preferencesSrc.indexOf('case "balanced":');
   const qualityIdx = preferencesSrc.indexOf('case "quality":');
   const balancedBlock = preferencesSrc.slice(balancedIdx, qualityIdx);
   assert.ok(balancedBlock.includes("skip_slice_research: true"), "balanced should skip slice research");
-  assert.ok(!balancedBlock.includes("skip_research: true"), "balanced should NOT skip milestone research");
-  assert.ok(!balancedBlock.includes("skip_reassess: true"), "balanced should NOT skip reassess");
+  assert.ok(balancedBlock.includes("skip_research: true"), "balanced should skip milestone research");
+  assert.ok(balancedBlock.includes("skip_reassess: true"), "balanced should skip reassess");
 });

-test("profile: quality profile has empty phases (no skips)", () => {
+test("profile: quality profile skips research, slice research, and reassess (ADR-003)", () => {
   const qualityIdx = preferencesSrc.indexOf('case "quality":');
-  const qualityEnd = preferencesSrc.indexOf("}", qualityIdx + 50);
-  // Look for the return block after case "quality":
-  const qualityReturn = preferencesSrc.slice(qualityIdx, qualityIdx + 200);
-  assert.ok(
-    qualityReturn.includes("phases: {}"),
-    "quality should have empty phases object (no skips)",
-  );
+  const qualityBlock = preferencesSrc.slice(qualityIdx, qualityIdx + 300);
+  assert.ok(qualityBlock.includes("skip_research: true"), "quality should skip research");
+  assert.ok(qualityBlock.includes("skip_slice_research: true"), "quality should skip slice research");
+  assert.ok(qualityBlock.includes("skip_reassess: true"), "quality should skip reassess");
 });

 // ═══════════════════════════════════════════════════════════════════════════
@@ -253,10 +251,10 @@ test("dispatch: research-slice rule has skip guards", () => {
   );
 });

-test("dispatch: reassess-roadmap rule has skip_reassess guard", () => {
+test("dispatch: reassess-roadmap rule has reassess_after_slice opt-in guard (ADR-003)", () => {
   assert.ok(
-    dispatchSrc.includes("skip_reassess") && dispatchSrc.includes("reassess-roadmap"),
-    "reassess-roadmap dispatch rule should check phases.skip_reassess",
+    dispatchSrc.includes("reassess_after_slice") && dispatchSrc.includes("reassess-roadmap"),
+    "reassess-roadmap dispatch rule should check phases.reassess_after_slice",
   );
 });

@@ -265,6 +263,6 @@ test("dispatch: phase skip guards return null (not stop)", () => {
   const researchGuard = dispatchSrc.match(/skip_research\).*?return null/s);
   assert.ok(researchGuard, "skip_research guard should return null (fall-through)");

-  const reassessGuard = dispatchSrc.match(/skip_reassess\).*?return null/s);
-  assert.ok(reassessGuard, "skip_reassess guard should return null (fall-through)");
+  const reassessGuard = dispatchSrc.match(/reassess_after_slice\).*?return null/s);
+  assert.ok(reassessGuard, "reassess_after_slice guard should return null (fall-through)");
 });

@@ -304,6 +304,8 @@ export interface PhaseSkipPreferences {
   skip_reassess?: boolean;
   skip_slice_research?: boolean;
   skip_milestone_validation?: boolean;
+  /** When true, reassess-roadmap fires after each slice completion. Opt-in. */
+  reassess_after_slice?: boolean;
   /** When true, auto-mode pauses before each slice for discussion (#789). */
   require_slice_discussion?: boolean;
 }