docs: parallel milestone orchestration documentation
New: docs/parallel-orchestration.md — comprehensive guide covering:
- Architecture overview with coordinator/worker diagram
- Worker isolation model (process, worktree, state, context)
- Eligibility analysis (dependencies + file overlap)
- Configuration reference (all parallel.* keys)
- Commands reference (/gsd parallel start|status|stop|pause|resume|merge)
- Signal lifecycle (coordinator → worker communication)
- Merge reconciliation with conflict handling
- Budget management across workers
- Doctor integration and health monitoring
- Safety model breakdown
- File layout (.gsd/parallel/ and .gsd/worktrees/)
- Troubleshooting guide

Updated existing docs:
- auto-mode.md: cross-reference to parallel orchestration
- configuration.md: parallel config block with all keys
- commands.md: parallel commands table
- git-strategy.md: parallel worktree branching diagram
- README.md: added to documentation index
parent 9232ad6a2b
commit 01b0d530c8
6 changed files with 360 additions and 0 deletions
@@ -17,6 +17,7 @@ Welcome to the GSD documentation. This covers everything from getting started to
| [Workflow Visualizer](./visualizer.md) | Interactive TUI overlay for progress, dependencies, metrics, and timeline (v2.19) |
| [Cost Management](./cost-management.md) | Budget ceilings, cost tracking, projections, and enforcement modes |
| [Git Strategy](./git-strategy.md) | Worktree isolation, branching model, and merge behavior |
| [Parallel Orchestration](./parallel-orchestration.md) | Run multiple milestones simultaneously with worker isolation and coordination |
| [Working in Teams](./working-in-teams.md) | Unique milestone IDs, `.gitignore` setup, and shared planning artifacts |
| [Skills](./skills.md) | Bundled skills, skill discovery, and custom skill authoring |
| [Migration from v1](./migration.md) | Migrating `.planning` directories from the original GSD |
@@ -51,6 +51,10 @@ GSD isolates milestone work using one of three modes (configured via `git.isolat

See [Git Strategy](./git-strategy.md) for details.

### Parallel Execution

When your project has independent milestones, you can run them simultaneously. Each milestone gets its own worker process and worktree. See [Parallel Orchestration](./parallel-orchestration.md) for setup and usage.

### Crash Recovery

A lock file tracks the current unit. If the session dies, the next `/gsd auto` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context.
@@ -34,6 +34,19 @@
| `/gsd run-hook` | Manually trigger a specific hook |
| `/gsd migrate` | Migrate a v1 `.planning` directory to `.gsd` format |

## Parallel Orchestration

| Command | Description |
|---------|-------------|
| `/gsd parallel start` | Analyze eligibility, confirm, and start workers |
| `/gsd parallel status` | Show all workers with state, progress, and cost |
| `/gsd parallel stop [MID]` | Stop all workers or a specific milestone's worker |
| `/gsd parallel pause [MID]` | Pause all workers or a specific one |
| `/gsd parallel resume [MID]` | Resume paused workers |
| `/gsd parallel merge [MID]` | Merge completed milestones back to main |

See [Parallel Orchestration](./parallel-orchestration.md) for full documentation.

## Git Commands

| Command | Description |
@@ -389,6 +389,21 @@ auto_visualize: true

See [Workflow Visualizer](./visualizer.md).

### `parallel`

Run multiple milestones simultaneously. Disabled by default.

```yaml
parallel:
  enabled: false                    # Master toggle
  max_workers: 2                    # Concurrent workers (1-4)
  budget_ceiling: 50.00             # Aggregate cost limit in USD
  merge_strategy: "per-milestone"   # "per-slice" or "per-milestone"
  auto_merge: "confirm"             # "auto", "confirm", or "manual"
```

See [Parallel Orchestration](./parallel-orchestration.md) for full documentation.

## Full Example

```yaml
@@ -48,6 +48,26 @@ In **branch mode**, the flow is the same except work happens in the project root

In **none mode**, commits land directly on the current branch — no milestone branch is created, and no merge step is needed.

### Parallel Worktrees

With [parallel orchestration](./parallel-orchestration.md) enabled, multiple milestones run in separate worktrees simultaneously:

```
main ──────────────────────────────────────────────────────────
│                                      ↑              ↑
├── milestone/M002 (worktree) ─────────┘              │
│   commit: feat(S01/T01): auth types                 │
│   commit: feat(S01/T02): JWT middleware             │
│   → squash-merged first                             │
│                                                     │
└── milestone/M003 (worktree) ────────────────────────┘
    commit: feat(S01/T01): dashboard layout
    commit: feat(S01/T02): chart components
    → squash-merged second
```

Each worktree operates on its own branch with its own commit history. Merges happen sequentially to avoid conflicts.

### Key Properties

- **Sequential commits on one branch** — no per-slice branches, no merge conflicts within a milestone
307
docs/parallel-orchestration.md
Normal file
@@ -0,0 +1,307 @@
# Parallel Milestone Orchestration

Run multiple milestones simultaneously in isolated git worktrees. Each milestone gets its own worker process, its own branch, and its own context window, while a coordinator tracks progress, enforces budgets, and keeps everything in sync.

> **Status:** Disabled by default (`parallel.enabled: false`). Opt-in only, with no impact on existing users.

## Quick Start

1. Enable parallel mode in your preferences:

   ```yaml
   ---
   parallel:
     enabled: true
     max_workers: 2
   ---
   ```

2. Start parallel execution:

   ```
   /gsd parallel start
   ```

   GSD scans your milestones, checks dependencies and file overlap, shows an eligibility report, and spawns workers for eligible milestones.

3. Monitor progress:

   ```
   /gsd parallel status
   ```

4. Stop when done:

   ```
   /gsd parallel stop
   ```

## How It Works

### Architecture

```
┌─────────────────────────────────────────────────────────┐
│  Coordinator (your GSD session)                         │
│                                                         │
│  Responsibilities:                                      │
│  - Eligibility analysis (deps + file overlap)           │
│  - Worker spawning and lifecycle                        │
│  - Budget tracking across all workers                   │
│  - Signal dispatch (pause/resume/stop)                  │
│  - Session status monitoring                            │
│  - Merge reconciliation                                 │
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │ Worker 1 │  │ Worker 2 │  │ Worker 3 │  ...          │
│  │   M001   │  │   M003   │  │   M005   │               │
│  └──────────┘  └──────────┘  └──────────┘               │
│       │             │             │                     │
│       ▼             ▼             ▼                     │
│  .gsd/worktrees/ .gsd/worktrees/ .gsd/worktrees/        │
│  M001/           M003/           M005/                  │
│  (milestone/     (milestone/     (milestone/            │
│   M001 branch)    M003 branch)    M005 branch)          │
└─────────────────────────────────────────────────────────┘
```

### Worker Isolation

Each worker is a separate `gsd` process with complete isolation:

| Resource | Isolation Method |
|----------|-----------------|
| **Filesystem** | Git worktree — each worker has its own checkout |
| **Git branch** | `milestone/<MID>` — one branch per milestone |
| **State derivation** | `GSD_MILESTONE_LOCK` env var — `deriveState()` only sees the assigned milestone |
| **Context window** | Separate process — each worker has its own agent sessions |
| **Metrics** | Each worktree has its own `.gsd/metrics.json` |
| **Crash recovery** | Each worktree has its own `.gsd/auto.lock` |

### Coordination

Workers and the coordinator communicate through file-based IPC:

- **Session status files** (`.gsd/parallel/<MID>.status.json`) — workers write heartbeats, the coordinator reads them
- **Signal files** (`.gsd/parallel/<MID>.signal.json`) — coordinator writes signals, workers consume them
- **Atomic writes** — write-to-temp + rename prevents partial reads
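
The write-to-temp-then-rename pattern mentioned in the last bullet can be sketched as follows. This is a minimal illustration, not GSD's actual code: the `WorkerStatus` fields and the `writeJsonAtomic` helper are hypothetical names, and the example writes to a throwaway temp directory rather than `.gsd/parallel/`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical status shape; the real schema is not shown in this guide.
interface WorkerStatus {
  milestoneId: string;
  pid: number;
  heartbeatAt: number; // epoch milliseconds
}

// Write the JSON to a temp file next to the target, then rename it into place.
// A rename within one filesystem replaces the target in a single step, so a
// reader sees either the old file or the new one, never a half-written file.
function writeJsonAtomic(target: string, data: unknown): void {
  const tmp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  fs.renameSync(tmp, target);
}

// Demo in a scratch directory (stand-in for .gsd/parallel/).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "gsd-parallel-"));
const statusPath = path.join(dir, "M002.status.json");
writeJsonAtomic(statusPath, { milestoneId: "M002", pid: process.pid, heartbeatAt: Date.now() });

const roundTrip: WorkerStatus = JSON.parse(fs.readFileSync(statusPath, "utf8"));
console.log(roundTrip.milestoneId);
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not a single-step replacement.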

## Eligibility Analysis

Before starting parallel execution, GSD checks which milestones can safely run concurrently.

### Rules

1. **Not complete** — Finished milestones are skipped
2. **Dependencies satisfied** — All `dependsOn` entries must have status `complete`
3. **File overlap check** — Milestones touching the same files get a warning (but are still eligible)
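
Rules 1 and 2 can be sketched as a small pure function. The `Milestone` and `Verdict` types, the `analyzeEligibility` name, and the `"pending"` status value are assumptions made for illustration; only `dependsOn` and the `complete` status come from the rules above.

```typescript
// Hypothetical types; only `dependsOn` and the "complete" status come from the guide.
type MilestoneStatus = "pending" | "complete";

interface Milestone {
  id: string;
  status: MilestoneStatus;
  dependsOn: string[];
}

interface Verdict {
  id: string;
  eligible: boolean;
  reason: string;
}

function analyzeEligibility(milestones: Milestone[]): Verdict[] {
  const statusById = new Map<string, MilestoneStatus>();
  for (const m of milestones) statusById.set(m.id, m.status);

  return milestones.map((m) => {
    // Rule 1: finished milestones are skipped.
    if (m.status === "complete") {
      return { id: m.id, eligible: false, reason: "Already complete." };
    }
    // Rule 2: every dependency must be complete (unknown IDs count as incomplete).
    const blockers = m.dependsOn.filter((dep) => statusById.get(dep) !== "complete");
    if (blockers.length > 0) {
      return {
        id: m.id,
        eligible: false,
        reason: `Blocked by incomplete dependencies: ${blockers.join(", ")}.`,
      };
    }
    return { id: m.id, eligible: true, reason: "All dependencies satisfied." };
  });
}

const verdicts = analyzeEligibility([
  { id: "M001", status: "complete", dependsOn: [] },
  { id: "M002", status: "pending", dependsOn: ["M001"] },
  { id: "M003", status: "pending", dependsOn: [] },
  { id: "M004", status: "pending", dependsOn: ["M002"] },
]);
for (const v of verdicts) console.log(v.id, v.eligible, v.reason);
```

Rule 3 is left out here because overlap only produces warnings and never changes eligibility.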

### Example Report

```
# Parallel Eligibility Report

## Eligible for Parallel Execution (2)

- **M002** — Auth System
  All dependencies satisfied.
- **M003** — Dashboard UI
  All dependencies satisfied.

## Ineligible (2)

- **M001** — Core Types
  Already complete.
- **M004** — API Integration
  Blocked by incomplete dependencies: M002.

## File Overlap Warnings (1)

- **M002** <-> **M003** — 2 shared file(s):
  - `src/types.ts`
  - `src/middleware.ts`
```

File overlaps are warnings, not blockers. Both milestones work in separate worktrees, so they won't interfere at the filesystem level. Conflicts are detected and resolved during merge.
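
A pairwise overlap check of this kind might look like the sketch below. How GSD determines which files a milestone touches is not described here, so the function simply takes that mapping as input; `findOverlaps` and the `Overlap` type are hypothetical names.

```typescript
interface Overlap {
  a: string;
  b: string;
  shared: string[];
}

// Compare every pair of milestones once and report the files both expect to touch.
function findOverlaps(filesByMilestone: Record<string, string[]>): Overlap[] {
  const ids = Object.keys(filesByMilestone).sort();
  const overlaps: Overlap[] = [];
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      const filesOfB = new Set(filesByMilestone[ids[j]]);
      const shared = filesByMilestone[ids[i]].filter((f) => filesOfB.has(f));
      if (shared.length > 0) overlaps.push({ a: ids[i], b: ids[j], shared });
    }
  }
  return overlaps;
}

const overlaps = findOverlaps({
  M002: ["src/types.ts", "src/middleware.ts", "src/auth.ts"],
  M003: ["src/types.ts", "src/middleware.ts", "src/dashboard.tsx"],
});
for (const o of overlaps) {
  console.log(`${o.a} <-> ${o.b}: ${o.shared.length} shared file(s)`);
}
```

The result feeds a warning list only; nothing in it removes a milestone from the eligible set.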

## Configuration

Add to `~/.gsd/preferences.md` or `.gsd/preferences.md`:

```yaml
---
parallel:
  enabled: false                    # Master toggle (default: false)
  max_workers: 2                    # Concurrent workers (1-4, default: 2)
  budget_ceiling: 50.00             # Aggregate cost limit in USD (optional)
  merge_strategy: "per-milestone"   # When to merge: "per-slice" or "per-milestone"
  auto_merge: "confirm"             # "auto", "confirm", or "manual"
---
```

### Configuration Reference

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `false` | Master toggle. Must be `true` for `/gsd parallel` commands to work. |
| `max_workers` | number (1-4) | `2` | Maximum concurrent worker processes. Higher values use more memory and API budget. |
| `budget_ceiling` | number | none | Aggregate cost ceiling in USD across all workers. When reached, no new units are dispatched. |
| `merge_strategy` | `"per-slice"` or `"per-milestone"` | `"per-milestone"` | When worktree changes merge back to main. Per-milestone waits for the full milestone to complete. |
| `auto_merge` | `"auto"`, `"confirm"`, `"manual"` | `"confirm"` | How merge-back is handled. `confirm` prompts before merging. `manual` requires explicit `/gsd parallel merge`. |
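
The defaults and the 1-4 range in the table can be expressed as a typed normalization step. This is a sketch only: the `ParallelConfig` type and `normalizeParallelConfig` are hypothetical, and clamping an out-of-range `max_workers` is an assumption (GSD may reject such values instead).

```typescript
type MergeStrategy = "per-slice" | "per-milestone";
type AutoMerge = "auto" | "confirm" | "manual";

interface ParallelConfig {
  enabled: boolean;
  max_workers: number;
  budget_ceiling?: number; // USD; no ceiling when omitted
  merge_strategy: MergeStrategy;
  auto_merge: AutoMerge;
}

// Defaults taken from the configuration reference table.
const DEFAULTS: ParallelConfig = {
  enabled: false,
  max_workers: 2,
  merge_strategy: "per-milestone",
  auto_merge: "confirm",
};

function normalizeParallelConfig(raw: Partial<ParallelConfig>): ParallelConfig {
  const merged: ParallelConfig = { ...DEFAULTS, ...raw };
  // Assumption: out-of-range values are clamped into 1..4 rather than rejected.
  merged.max_workers = Math.min(4, Math.max(1, Math.floor(merged.max_workers)));
  return merged;
}

const config = normalizeParallelConfig({ enabled: true, max_workers: 9 });
console.log(config);
```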

## Commands

| Command | Description |
|---------|-------------|
| `/gsd parallel start` | Analyze eligibility, confirm, and start workers |
| `/gsd parallel status` | Show all workers with state, units completed, and cost |
| `/gsd parallel stop` | Stop all workers (sends SIGTERM) |
| `/gsd parallel stop M002` | Stop a specific milestone's worker |
| `/gsd parallel pause` | Pause all workers (finish current unit, then wait) |
| `/gsd parallel pause M002` | Pause a specific worker |
| `/gsd parallel resume` | Resume all paused workers |
| `/gsd parallel resume M002` | Resume a specific worker |
| `/gsd parallel merge` | Merge all completed milestones back to main |
| `/gsd parallel merge M002` | Merge a specific milestone back to main |

## Signal Lifecycle

The coordinator communicates with workers through signals:

```
Coordinator                   Worker
    │                            │
    ├── sendSignal("pause") ──→  │
    │                            ├── consumeSignal()
    │                            ├── pauseAuto()
    │                            │   (finish current unit, wait)
    │                            │
    ├── sendSignal("resume") ─→  │
    │                            ├── consumeSignal()
    │                            ├── resume dispatch loop
    │                            │
    ├── sendSignal("stop") ───→  │
    │   + SIGTERM ────────────→  │
    │                            ├── consumeSignal() or SIGTERM handler
    │                            ├── stopAuto()
    │                            └── process exits
```

Workers check for signals between units (in `handleAgentEnd`). The coordinator also sends `SIGTERM` for immediate response on stop.
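
The between-units check can be modelled as a tiny state machine. The sketch below is an in-memory simulation, not the real worker loop: `applySignal` and `simulate` are hypothetical names, the inbox array stands in for the signal file, and `SIGTERM` handling is omitted.

```typescript
type Signal = "pause" | "resume" | "stop";
type WorkerState = "running" | "paused" | "stopped";

// Apply one consumed signal to the worker state. A stopped worker stays stopped.
function applySignal(state: WorkerState, signal: Signal | null): WorkerState {
  if (state === "stopped" || signal === null) return state;
  if (signal === "pause") return "paused";
  if (signal === "resume") return "running";
  return "stopped";
}

// Each inbox entry is what the worker finds when it checks for a signal
// between units (null means no signal file was present at that check).
function simulate(
  units: string[],
  inbox: Array<Signal | null>,
): { completed: string[]; state: WorkerState } {
  const completed: string[] = [];
  let state: WorkerState = "running";
  let next = 0;
  for (const signal of inbox) {
    state = applySignal(state, signal);
    if (state === "stopped") break;
    if (state === "running" && next < units.length) {
      completed.push(units[next]); // run exactly one unit, then check again
      next += 1;
    }
  }
  return { completed, state };
}

const run = simulate(["T01", "T02", "T03"], [null, "pause", null, "resume", "stop"]);
console.log(run.completed, run.state);
```

In this run the worker finishes `T01`, idles while paused, finishes `T02` after the resume, and exits on stop before `T03` starts.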

## Merge Reconciliation

When milestones complete, their worktree changes need to merge back to main.

### Merge Order

- **Sequential** (default): Milestones merge in ID order (M001 before M002)
- **By-completion**: Milestones merge in the order they finish
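
The two orderings differ only in the sort key, as the sketch below shows. The `CompletedMilestone` type, the `completedAt` timestamp, and the `mergeQueue` function are illustrative names; this guide does not say how the order is selected.

```typescript
interface CompletedMilestone {
  id: string;
  completedAt: number; // epoch milliseconds
}

type MergeOrder = "sequential" | "by-completion";

function mergeQueue(done: CompletedMilestone[], order: MergeOrder): string[] {
  const sorted = [...done].sort((x, y) =>
    order === "sequential"
      ? x.id.localeCompare(y.id, undefined, { numeric: true }) // M001 before M002
      : x.completedAt - y.completedAt, // earliest finisher first
  );
  return sorted.map((m) => m.id);
}

const finished: CompletedMilestone[] = [
  { id: "M003", completedAt: 1_000 },
  { id: "M002", completedAt: 2_000 },
];
console.log(mergeQueue(finished, "sequential"));
console.log(mergeQueue(finished, "by-completion"));
```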

### Conflict Handling

1. `.gsd/` state files (STATE.md, metrics.json, etc.) — **auto-resolved** by accepting the milestone branch version
2. Code conflicts — **stop and report**. The merge halts, showing which files conflict. Resolve manually and retry with `/gsd parallel merge <MID>`.
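
The split between the two cases comes down to a path check, sketched below. `classifyConflicts` is a hypothetical helper that takes the list of conflicting paths as input; it does not run git.

```typescript
interface ConflictPlan {
  autoResolved: string[]; // .gsd/ state files: take the milestone branch version
  manual: string[]; // code conflicts: the merge halts and reports these
}

function classifyConflicts(conflictingPaths: string[]): ConflictPlan {
  const plan: ConflictPlan = { autoResolved: [], manual: [] };
  for (const p of conflictingPaths) {
    if (p.startsWith(".gsd/")) plan.autoResolved.push(p);
    else plan.manual.push(p);
  }
  return plan;
}

const plan = classifyConflicts([".gsd/STATE.md", ".gsd/metrics.json", "src/types.ts"]);
const canProceed = plan.manual.length === 0;
console.log(plan, canProceed);
```

Here `canProceed` is `false` because `src/types.ts` still needs a manual resolution before retrying the merge.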

### Example

```
/gsd parallel merge

# Merge Results

- **M002** — merged successfully (pushed)
- **M003** — CONFLICT (2 file(s)):
  - `src/types.ts`
  - `src/middleware.ts`
  Resolve conflicts manually and run `/gsd parallel merge M003` to retry.
```

## Budget Management

When `budget_ceiling` is set, the coordinator tracks aggregate cost across all workers:

- Cost is summed from each worker's session status
- When the ceiling is reached, the coordinator signals workers to stop
- Each worker also respects the project-level `budget_ceiling` preference independently
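
The aggregate check amounts to a sum and a comparison. In the sketch below, `WorkerCost`, its `cost` field, and `checkBudget` are assumed names for illustration; the figures are made up.

```typescript
interface WorkerCost {
  milestoneId: string;
  cost: number; // USD spent so far, as reported in the worker's session status
}

function checkBudget(
  workers: WorkerCost[],
  ceiling?: number,
): { total: number; reached: boolean } {
  const total = workers.reduce((sum, w) => sum + w.cost, 0);
  // With no ceiling configured, the budget is never considered reached.
  const reached = ceiling !== undefined && total >= ceiling;
  return { total, reached };
}

const budget = checkBudget(
  [
    { milestoneId: "M002", cost: 18.25 },
    { milestoneId: "M003", cost: 32.5 },
  ],
  50,
);
console.log(budget.total, budget.reached);
```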

## Health Monitoring

### Doctor Integration

`/gsd doctor` detects parallel session issues:

- **Stale parallel sessions** — Worker process died without cleanup. Doctor finds `.gsd/parallel/*.status.json` files with dead PIDs or expired heartbeats and removes them.

Run `/gsd doctor --fix` to clean up automatically.

### Stale Detection

A session is considered stale when either condition holds:
- The worker PID is no longer running (checked via `process.kill(pid, 0)`)
- The last heartbeat is older than 30 seconds

The coordinator runs stale detection during `refreshWorkerStatuses()` and automatically removes dead sessions.
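
Both conditions can be checked with a few lines of Node code. This is a sketch under assumptions: `isPidAlive`, `isStale`, and the `heartbeatAt` field are hypothetical names, and treating `EPERM` as "alive" is a choice made here, not something this guide specifies.

```typescript
const HEARTBEAT_TIMEOUT_MS = 30_000;

// Signal 0 performs the existence and permission checks without delivering a signal.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user. ESRCH: no such process.
    return (err as { code?: string }).code === "EPERM";
  }
}

function isStale(
  status: { pid: number; heartbeatAt: number },
  now: number = Date.now(),
  alive: (pid: number) => boolean = isPidAlive,
): boolean {
  return !alive(status.pid) || now - status.heartbeatAt > HEARTBEAT_TIMEOUT_MS;
}

const fresh = isStale({ pid: process.pid, heartbeatAt: Date.now() });
const expired = isStale({ pid: process.pid, heartbeatAt: Date.now() - 31_000 });
console.log(fresh, expired);
```

The `alive` parameter exists only so the liveness check can be swapped out in tests.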

## Safety Model

| Safety Layer | Protection |
|-------------|------------|
| **Feature flag** | `parallel.enabled: false` by default — existing users unaffected |
| **Eligibility analysis** | Dependency and file overlap checks before starting |
| **Worker isolation** | Separate processes, worktrees, branches, context windows |
| **`GSD_MILESTONE_LOCK`** | Each worker only sees its milestone in state derivation |
| **`GSD_PARALLEL_WORKER`** | Workers cannot spawn nested parallel sessions |
| **Budget ceiling** | Aggregate cost enforcement across all workers |
| **Signal-based shutdown** | Graceful stop via file signals + SIGTERM |
| **Doctor integration** | Detects and cleans up orphaned sessions |
| **Conflict-aware merge** | Stops on code conflicts, auto-resolves `.gsd/` state conflicts |

## File Layout

```
.gsd/
├── parallel/                    # Coordinator ↔ worker IPC
│   ├── M002.status.json         # Worker heartbeat + progress
│   ├── M002.signal.json         # Coordinator → worker signals
│   ├── M003.status.json
│   └── M003.signal.json
├── worktrees/                   # Git worktrees (one per milestone)
│   ├── M002/                    # M002's isolated checkout
│   │   ├── .gsd/                # M002's own state files
│   │   │   ├── auto.lock
│   │   │   ├── metrics.json
│   │   │   └── milestones/
│   │   └── src/                 # M002's working copy
│   └── M003/
│       └── ...
└── ...
```

Both `.gsd/parallel/` and `.gsd/worktrees/` are gitignored — they're runtime-only coordination files that never get committed.

## Troubleshooting

### "Parallel mode is not enabled"

Set `parallel.enabled: true` in your preferences file.

### "No milestones are eligible for parallel execution"

All milestones are either complete or blocked by dependencies. Check `/gsd queue` to see milestone status and dependency chains.

### Worker crashed — how to recover

1. Run `/gsd doctor --fix` to clean up stale sessions
2. Run `/gsd parallel status` to see current state
3. Re-run `/gsd parallel start` to spawn new workers for remaining milestones

### Merge conflicts after parallel completion

1. Run `/gsd parallel merge` to see which milestones have conflicts
2. Resolve conflicts in the worktree at `.gsd/worktrees/<MID>/`
3. Retry with `/gsd parallel merge <MID>`

### Workers seem stuck

Check whether the budget ceiling was reached: `/gsd parallel status` shows per-worker costs. To continue, increase `parallel.budget_ceiling` or remove it.