# ADR-001: Branchless Worktree Architecture

> Historical note: this ADR predates the current DB-backed planning-state direction.
> Where it says markdown is truth and DB is cache, that statement is superseded by
> `docs/adr/0000-purpose-to-software-compiler.md` and
> `docs/adr/0001-promote-only-sf-state.md`: structured `.sf` state is authoritative
> at runtime, and markdown is a projection when structured state exists.

**Status:** Accepted — partial drift
**Date:** 2026-03-15
**Revised:** 2026-05-02 — partial drift documented; code migration incomplete
**Deciders:** Lex Christopherson
**Advisors:** Claude Opus 4.6, Gemini 2.5 Pro, GPT-5.4 (Codex)

## Context

SF uses git for isolation during autonomous coding sessions. The current architecture (shipped in M003, v2.13.0) creates a **worktree per milestone** with **slice branches inside each worktree**. Each slice (`S01`, `S02`, ...) gets its own branch (`sf/M001/S01`) within the worktree, which merges back to the milestone branch (`milestone/M001`) via `--no-ff` when the slice completes. The milestone branch squash-merges to `main` when the milestone completes.

This architecture replaced a previous "branch-per-slice" model that had severe `.sf/` merge conflicts. M003 solved the merge conflicts but retained slice branches inside worktrees, inheriting complexity that has produced persistent, user-facing failures.

### Problems

**1. Planning artifact invisibility (loop detection failures)**

When `research-slice` or `plan-slice` dispatches, the agent writes artifacts (e.g., `S02-RESEARCH.md`) on a slice branch. After the agent completes, `handleAgentEnd` switches back to the milestone branch for the next dispatch. The artifact is on the slice branch, not the milestone branch. `verifyExpectedArtifact()` checks the milestone branch, can't find the file, increments the loop counter, and retries with the same result. After 3 retries → hard stop. After 6 lifetime dispatches → permanent stop. This burns budget and blocks progress.

Documented in the auto-stop architecture doc as "The Branch-Switching Problem."
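
The failure mode can be reproduced in a small model. The sketch below is illustrative only: the `Repo` type, `dispatchUntilVerified`, and the way the limit of 3 is counted are assumptions made for this example, not the real SF implementation. It shows that writing on one branch and verifying on another can never succeed, no matter how many times the unit is re-dispatched.

```typescript
// Hypothetical model of the retry loop; names and limits mirror the text,
// not the actual SF code.
type Branch = string;

interface Repo {
  // Files visible on each branch (a stand-in for git's per-branch tree).
  files: Map<Branch, Set<string>>;
}

// True when the artifact is visible on the branch being checked.
function verifyExpectedArtifact(repo: Repo, branch: Branch, artifact: string): boolean {
  return repo.files.get(branch)?.has(artifact) ?? false;
}

// Dispatches a unit that writes `artifact` on `writeBranch`, then verifies on
// `verifyBranch`. Stops after `maxFailures` failed verifications (assumed to
// correspond to the "3 retries" hard stop described above).
function dispatchUntilVerified(
  repo: Repo,
  writeBranch: Branch,
  verifyBranch: Branch,
  artifact: string,
  maxFailures = 3,
): { verified: boolean; dispatches: number } {
  let failures = 0;
  let dispatches = 0;
  while (failures < maxFailures) {
    dispatches += 1;
    // The agent writes the artifact on the branch it is working on.
    if (!repo.files.has(writeBranch)) repo.files.set(writeBranch, new Set());
    repo.files.get(writeBranch)!.add(artifact);
    if (verifyExpectedArtifact(repo, verifyBranch, artifact)) {
      return { verified: true, dispatches };
    }
    failures += 1; // loop counter increments; the next attempt changes nothing
  }
  return { verified: false, dispatches };
}
```

With `writeBranch` set to `sf/M001/S02` and `verifyBranch` set to `milestone/M001`, every dispatch fails verification. With both set to `milestone/M001`, the first dispatch verifies.
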

**2. `.sf/` state clobbering across branches**

`.sf/` is gitignored (line 52 of `.gitignore`: `.sf/`). Planning artifacts (roadmaps, plans, summaries, decisions, requirements) live in `.sf/milestones/` but are invisible to git. When multiple branches or worktrees operate from the same repo, they share a single `.sf/` directory on disk. Branch A's M001 roadmap overwrites Branch B's M001 roadmap. SF reads corrupted state, shows wrong milestone as complete, or enters infinite dispatch loops.

The codebase has a contradictory workaround: `smartStage()` (git-service.ts:304-352) force-adds `SF_DURABLE_PATHS` (milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md) despite the `.gitignore`. This means `.sf/milestones/` IS partially tracked on some branches but the gitignore claims otherwise. The code fights the configuration.

**3. Merge/conflict code complexity**

The current slice branch model requires:
- `mergeSliceToMilestone()` — 98 lines, `--no-ff` merge with `withMergeHeal` wrapper
- `mergeSliceToMain()` — 189 lines, squash-merge with conflict detection/categorization/auto-resolution
- `git-self-heal.ts` — 198 lines, 3 recovery functions for merge failures
- `fix-merge` dispatch unit — dedicated LLM session to resolve conflicts the auto-resolver can't handle
- `smartStage()` — 49 lines of runtime exclusion during staging
- Conflict categorization — 80 lines classifying `.sf/` vs runtime vs code conflicts

Total: **~582 lines** of merge/branch/conflict code across 3 files, plus the `fix-merge` prompt template and dispatch logic. This code exists solely because of slice branches.

**4. Dual isolation modes**

Branch-mode (`git-service.ts:mergeSliceToMain`) and worktree-mode (`auto-worktree.ts:mergeSliceToMilestone`) have parallel implementations with different merge strategies, different conflict handling, and different branch naming. Both paths must be maintained and tested. 11 test files exercise merge/branch/worktree logic.

**5. Bug history**

- v2.11.1: URGENT fix for parse cache staleness causing repeated unit dispatch (directly caused by branch switching invalidation timing)
- v2.13.1: Windows hotfix for multi-line commit messages in `mergeSliceToMilestone`
- 15+ separate bug fixes for `.sf/` merge conflicts in the pre-M003 era
- Persistent user complaints about loop detection failures and state corruption

## Decision

**Eliminate slice branches entirely.** All work within a milestone worktree commits sequentially on a single branch (`milestone/<MID>`). No branch creation, no branch switching, no slice merges, no conflict resolution within a worktree.

Track `.sf/` planning artifacts in git. Gitignore only runtime/ephemeral state.

### The Architecture

```
main ──────────────────────────────────────────── main
  │                                               ↑
  └─ worktree (milestone/M001)                    │
       │                                          │
       commit: feat(M001): context + roadmap      │
       commit: feat(M001/S01): research           │
       commit: feat(M001/S01): plan               │
       commit: feat(M001/S01/T01): impl           │
       commit: feat(M001/S01/T02): impl           │
       commit: feat(M001/S01): summary + UAT      │
       commit: feat(M001/S02): research           │
       commit: ...                                │
       commit: feat(M001): milestone complete     │
       │                                          │
       └──────────── squash merge ────────────────┘
```
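
The commit subjects in the diagram follow a regular pattern: `feat(<milestone>[/<slice>[/<task>]]): <description>`. A minimal sketch of a helper that produces them is shown here. `UnitRef` and `commitSubject` are hypothetical names introduced for illustration; only the subject format itself comes from the diagram.

```typescript
// Hypothetical helper; only the subject format is taken from the diagram.
interface UnitRef {
  milestone: string; // e.g. "M001"
  slice?: string;    // e.g. "S01"
  task?: string;     // e.g. "T01" (ignored unless a slice is given)
}

function commitSubject(unit: UnitRef, description: string): string {
  const scope = [unit.milestone, unit.slice, unit.slice ? unit.task : undefined]
    .filter((part): part is string => Boolean(part))
    .join("/");
  return `feat(${scope}): ${description}`;
}
```

For example, `commitSubject({ milestone: "M001", slice: "S01", task: "T01" }, "impl")` yields the subject shown for the first task commit in the diagram.
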

### Git Primitives Used

| Primitive | Purpose |
|-----------|---------|
| **Worktrees** | One per active milestone. Filesystem isolation. |
| **Commits** | Granular sequential history of every action. |
| **Squash merge** | Clean single commit on `main` per milestone. |
| **Branches** | Only `main` and `milestone/<MID>`. Nothing else. |

### Git Primitives NOT Used

| Primitive | Why Not |
|-----------|---------|
| Slice branches | Slices are sequential. Branches add complexity with no rollback benefit. |
| `--no-ff` merges | No branches to merge within a worktree. |
| Branch switching | Never happens. All work on one branch. |
| Conflict resolution | No merges within a worktree means no conflicts within a worktree. |

### `.sf/` Tracking Model

**Tracked in git (travels with the branch):**
```
.sf/milestones/ — roadmaps, plans, summaries, research, contexts, task plans/summaries
.sf/PROJECT.md — project overview
.sf/DECISIONS.md — architectural decision register
.sf/REQUIREMENTS.md — requirements register
.sf/QUEUE.md — work queue
```

**Gitignored (ephemeral, runtime, infrastructure):**
```
.sf/runtime/ — dispatch records, timeout tracking
.sf/activity/ — JSONL session dumps
.sf/worktrees/ — git worktree working directories
.sf/auto.lock — crash detection sentinel
.sf/metrics.json — token/cost accumulator
.sf/completed-units.json — dispatch idempotency tracker
.sf/STATE.md — derived state cache (rebuilt by deriveState())
.sf/sf.db — SQLite cache (rebuilt from tracked markdown by importers)
.sf/DISCUSSION-MANIFEST.json — discussion phase tracking
.sf/milestones/**/*-CONTINUE.md — interrupted-work markers
.sf/milestones/**/continue.md — legacy continue markers
```

### `.gitignore` Update

Replace the current blanket `.sf/` ignore with explicit runtime-only ignores:

```gitignore
# ── SF: Runtime / Ephemeral ─────────────────────────────────
.sf/auto.lock
.sf/completed-units.json
.sf/STATE.md
.sf/metrics.json
.sf/sf.db
.sf/activity/
.sf/runtime/
.sf/worktrees/
.sf/DISCUSSION-MANIFEST.json
.sf/milestones/**/*-CONTINUE.md
.sf/milestones/**/continue.md
```

Planning artifacts (milestones/, PROJECT.md, DECISIONS.md, REQUIREMENTS.md, QUEUE.md) are NOT in `.gitignore` and are tracked normally.
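
To make the tracked/ignored split concrete, the sketch below classifies a repo-relative path against the runtime-only patterns listed above. It is a simplified matcher written for this example, not git's full gitignore semantics: it handles only the constructs that appear in this list (a trailing `/` for directories, `**/`, and `*`). The names `RUNTIME_PATTERNS`, `patternToRegExp`, and `isRuntimePath` are hypothetical.

```typescript
// Runtime-only ignore patterns, copied from the `.gitignore` block above.
const RUNTIME_PATTERNS: readonly string[] = [
  ".sf/auto.lock",
  ".sf/completed-units.json",
  ".sf/STATE.md",
  ".sf/metrics.json",
  ".sf/sf.db",
  ".sf/activity/",
  ".sf/runtime/",
  ".sf/worktrees/",
  ".sf/DISCUSSION-MANIFEST.json",
  ".sf/milestones/**/*-CONTINUE.md",
  ".sf/milestones/**/continue.md",
];

// Escapes characters that are special in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Simplified conversion: a trailing "/" matches everything under a directory,
// "**/" matches zero or more directories, "*" matches within one path segment.
function patternToRegExp(pattern: string): RegExp {
  const isDir = pattern.endsWith("/");
  const body = (isDir ? pattern.slice(0, -1) : pattern)
    .split("**/")
    .map((segment) => segment.split("*").map(escapeRegExp).join("[^/]*"))
    .join("(?:.*/)?");
  return new RegExp(`^${body}${isDir ? "/.*" : ""}$`);
}

const RUNTIME_REGEXPS: readonly RegExp[] = RUNTIME_PATTERNS.map(patternToRegExp);

// True when the path is runtime/ephemeral state and should stay out of git.
function isRuntimePath(repoRelativePath: string): boolean {
  return RUNTIME_REGEXPS.some((re) => re.test(repoRelativePath));
}
```

Under this model `.sf/DECISIONS.md` and files below `.sf/milestones/` are tracked, except for the `*-CONTINUE.md` and `continue.md` markers, which remain ignored at any depth.
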

## Consequences

### Code Deletion

| File | Lines Deleted | What's Removed |
|------|--------------|----------------|
| `auto-worktree.ts` | ~246 | `mergeSliceToMilestone()`, `shouldUseWorktreeIsolation()`, `getMergeToMainMode()`, slice merge guards |
| `git-service.ts` | ~250 | `mergeSliceToMain()`, conflict resolution, runtime stripping post-merge, `ensureSliceBranch()`, `switchToMain()` |
| `git-self-heal.ts` | ~86 | `abortAndReset()`, `withMergeHeal()` (merge-specific recovery) **(still present — see Drift section)** |
| `auto.ts` | ~150 | Merge dispatch guards, `fix-merge` dispatch path, branch-mode routing **(partially still present — see Drift section)** |
| `worktree.ts` | ~40 | `getSliceBranchName()`, `ensureSliceBranch()`, `mergeSliceToMain()` delegates **(`getSliceBranchName()` still present — see Drift section)** |
| **Test files** | ~11 files | `auto-worktree-merge.test.ts`, `auto-worktree-milestone-merge.test.ts`, merge-related test cases |
| **Total** | **~770+ lines** | |

*Verified 2026-05-02: `mergeSliceToMilestone()` and the original `mergeSliceToMain()` (git-service.ts) are deleted. Several other items listed above are still present — see Drift section below. Current authoritative milestone merge path: `src/resources/extensions/sf/auto-worktree.ts:1616` (`mergeMilestoneToMain`). A separate newer function `mergeSliceToMain` at `src/resources/extensions/sf/slice-cadence.ts:92` was added post-ADR for the slice-cadence collapse feature (#4765) and is unrelated to the deleted branch-era function.*

### What `mergeMilestoneToMain()` Becomes

The function simplifies dramatically:
1. Auto-commit any dirty state in worktree
2. `chdir` back to main repo root
3. `git checkout main`
4. `git merge --squash milestone/<MID>`
5. `git commit` with milestone summary
6. Remove worktree + delete branch

No conflict categorization. No runtime file stripping. No `.sf/` special handling. Planning artifacts merge cleanly because they're in `.sf/milestones/M001/` which doesn't exist on `main` until this merge.
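
The six steps above can be sketched as a command plan. This is not the real `mergeMilestoneToMain()`: the `MergePlanInput` and `Command` types, the planner function, and the auto-commit message are all assumptions made for illustration. The sketch returns the git invocations instead of running them, so the sequence can be inspected without touching a repository.

```typescript
// Hypothetical planner; it lists the commands rather than executing them.
interface MergePlanInput {
  milestoneId: string;  // e.g. "M001"
  worktreePath: string; // path of the milestone worktree
  summary: string;      // commit message for the squash commit
  dirty: boolean;       // whether the worktree has uncommitted changes
}

interface Command {
  cwd: "worktree" | "repo"; // where the command runs
  argv: string[];
}

function planMilestoneMerge(input: MergePlanInput): Command[] {
  const branch = `milestone/${input.milestoneId}`;
  const plan: Command[] = [];

  // 1. Auto-commit any dirty state in the worktree (message is illustrative).
  if (input.dirty) {
    plan.push({ cwd: "worktree", argv: ["git", "add", "-A"] });
    plan.push({
      cwd: "worktree",
      argv: ["git", "commit", "-m", `chore(${input.milestoneId}): auto-commit before merge`],
    });
  }

  // 2-5. From the main repo root: checkout, squash-merge, commit.
  plan.push({ cwd: "repo", argv: ["git", "checkout", "main"] });
  plan.push({ cwd: "repo", argv: ["git", "merge", "--squash", branch] });
  plan.push({ cwd: "repo", argv: ["git", "commit", "-m", input.summary] });

  // 6. Remove the worktree, then delete the branch. A squash merge does not
  // record the branch as merged, so the forced delete (-D) is used here.
  plan.push({ cwd: "repo", argv: ["git", "worktree", "remove", input.worktreePath] });
  plan.push({ cwd: "repo", argv: ["git", "branch", "-D", branch] });

  return plan;
}
```

Keeping the plan as data makes the ordering easy to test. A real implementation would also need to handle a failing step, which this sketch leaves out.
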

### What `smartStage()` Becomes

The force-add of `SF_DURABLE_PATHS` is no longer needed — planning artifacts are not gitignored, so `git add -A` picks them up naturally. The function reduces to:

1. `git add -A`
2. `git reset HEAD -- <runtime paths>` (unstage runtime files)

The `_runtimeFilesCleanedUp` one-time migration logic can also be removed.
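
A minimal sketch of the reduced two-step form follows. `smartStageCommands` is a hypothetical name and it only builds the argument vectors; how the real `smartStage()` obtains its runtime path list is not specified here.

```typescript
// Hypothetical sketch: builds the two git invocations without running them.
function smartStageCommands(runtimePaths: readonly string[]): string[][] {
  const commands: string[][] = [["git", "add", "-A"]];
  if (runtimePaths.length > 0) {
    // "--" separates the revision (HEAD) from the pathspecs that follow.
    commands.push(["git", "reset", "HEAD", "--", ...runtimePaths]);
  }
  return commands;
}
```

Skipping the reset when the list is empty matters: `git reset HEAD --` with no paths would unstage everything that `git add -A` just staged.
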

*Verified 2026-05-02: `smartStage()` at `src/resources/extensions/sf/git-service.ts:576` is still present in its original form, including `_runtimeFilesCleanedUp` migration logic and `SF_MILESTONE_LOCK` parallel-scope logic. The simplification described here has not been performed — see Drift section.*

### What Happens to `handleAgentEnd()`

After any unit completes:
1. Invalidate caches
2. `autoCommitCurrentBranch()` — commits on the one and only branch
3. `verifyExpectedArtifact()` — file is always on the current branch (no branch switching)
4. Persist completion key

The "Path A fix" (lines 937-953) becomes the only path. No branch mismatch possible.
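
The four-step sequence can be sketched with injected dependencies so the ordering is explicit. The `AgentEndDeps` interface and all signatures below are hypothetical, and the choice to skip persisting the completion key when verification fails is an assumption of this sketch, not something the list above states.

```typescript
// Hypothetical dependency surface; the real signatures may differ.
interface AgentEndDeps {
  invalidateCaches(): void;
  autoCommitCurrentBranch(): void;
  verifyExpectedArtifact(unitId: string): boolean;
  persistCompletionKey(unitId: string): void;
}

// Runs the post-unit steps in order on the single milestone branch.
// Returns true when the unit was verified and recorded as complete.
function handleAgentEndSketch(unitId: string, deps: AgentEndDeps): boolean {
  deps.invalidateCaches();        // 1. drop stale parse/state caches
  deps.autoCommitCurrentBranch(); // 2. commit on the one and only branch
  // 3. The artifact was written on this same branch, so the check sees it.
  if (!deps.verifyExpectedArtifact(unitId)) {
    return false; // assumption: an unverified unit is not marked complete
  }
  deps.persistCompletionKey(unitId); // 4. record completion
  return true;
}
```

There is no checkout step anywhere in the sequence, which is the point of the branchless model: the branch that was written to is the branch that gets verified.
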

### What Happens to `fix-merge`

The `fix-merge` dispatch unit type is eliminated. Within a worktree, there are no merges that can conflict. The only merge is milestone→main (squash), and if that conflicts (rare, parallel milestone edge case), it's handled as a one-time resolution at milestone completion — not a dispatch loop.

*Verified 2026-05-02: The `fix-merge` prompt template has been deleted (no `fix-merge.md` exists in `src/resources/extensions/sf/prompts/`). However, `MergeConflictError` (re-exported from `git-service.ts:201`) is still present along with a JSDoc comment at `git-service.ts:199` that still mentions dispatching a "fix-merge session". No active dispatch-loop code was found — this appears to be residual documentation in the class definition, not active dispatch logic. Treated as "still present (partial)" in the Drift section.*

### Backwards Compatibility

The `shouldUseWorktreeIsolation()` three-tier preference resolution is replaced by a single behavior: worktree isolation is always used. The `git.isolation: "branch"` preference is deprecated.

Projects with existing `sf/M001/S01` slice branches can still be read by state derivation, but new work never creates slice branches.

### Risks

**1. Parallel milestone code conflicts at squash-merge time**

If two milestones modify the same source file, the second squash-merge to `main` will conflict. Mitigation: `git fetch origin main && git rebase main` before squash-merge. This is standard practice, and such conflicts are rare in single-user workflows.

**2. Loss of per-slice git history after squash**

Squash merge collapses all commits into one on `main`. Mitigations:
- Commit messages tag slices (`feat(M001/S01/T01):`) — filterable with `git log --grep`
- The milestone branch can be preserved (not deleted) if history is needed
- Alternative: `merge --no-ff` instead of `--squash` to keep history on `main`

**3. SQLite DB desync after `git reset`**

If tracked markdown rolls back via `git reset --hard`, the gitignored `sf.db` doesn't. Mitigation: the importer layer (M001/S02) rebuilds the DB from markdown on startup. The DB is a cache; markdown is truth.

**4. Disk space with multiple worktrees**

Each worktree duplicates the working directory (including `node_modules`). Mitigation: single active milestone at a time (single-user workflow), immediate cleanup after completion.

## Alternatives Considered

### A. Keep slice branches, fix visibility with immediate mini-merges

After `research-slice` or `plan-slice`, immediately merge the slice branch back to the milestone branch. This fixes the loop detection bug but retains all merge complexity.

**Rejected:** Adds another merge path instead of removing the root cause. Still requires conflict resolution, self-healing, branch switching.

### B. Keep `.sf/` gitignored, bootstrap from git history for manual worktrees

When SF detects an empty `.sf/` in a worktree, reconstruct state from the branch's git history using `git show <commit>:.sf/...`.

**Rejected:** Recovery logic, not architecture. Doesn't fix the fundamental problem of branch-agnostic state. Fails when git history has been rewritten.

### C. Branch-scoped `.sf/` directories (`.sf/branches/<branch-name>/milestones/...`)

Each branch writes to a namespaced subdirectory within `.sf/`.

**Rejected:** Adds complexity instead of removing it. Requires renaming/moving on branch creation, doesn't work with standard git tools (`git checkout` doesn't rename directories).

## Validation

This architecture was stress-tested by three independent models:

**Gemini 2.5 Pro** identified 6 attack vectors. None broke the core model. Recommendations: pre-flight rebase before squash-merge (adopted), heartbeat locks (already exists), DB rebuild on startup (adopted via M001/S02 importers).

**GPT-5.4 (Codex)** read the full codebase and confirmed the model is sound. Identified that `smartStage()` already force-adds durable paths (validating the tracked-artifact approach) and that `resolveMainWorktreeRoot` in PR #487 is architecturally wrong (adopted — PR to be closed).

**Codebase analysis** confirmed `.sf/milestones/` is already partially tracked on `main` despite the `.gitignore`, that `SF_DURABLE_PATHS` exists as a code-level acknowledgment that planning artifacts should be tracked, and that the README already documents the correct runtime-only gitignore pattern.

### Codex (GPT-5.4) Dissent — "No Slice Branches Is a Redesign"

Codex read the full codebase and raised 4 concerns. Each is addressed:

**Concern 1: "Crash after slice done but before integration — today the runtime detects orphaned slice branches and merges them."**

Rebuttal: In the branchless model, there is no integration step to crash between. Slice work is committed directly on the milestone branch. On restart, `deriveState()` reads the branch state as-is. The orphaned-branch recovery path exists solely because of slice branches — removing branches removes the failure mode it recovers from.

**Concern 2: "Concurrent edits to shared root docs (PROJECT.md, DECISIONS.md) from two terminals."**

Rebuttal: Valid edge case. If `/sf queue` edits `DECISIONS.md` on `main` while auto-mode edits it in a worktree, there's a content conflict at squash-merge time. This is a standard git content conflict — no different from two developers editing the same file. Handled by normal merge resolution. Not caused by or solved by slice branches.

**Concern 3: "Slice→milestone merges provide continuous integration. Removing them pushes conflict discovery to the end."**

Rebuttal: In a single-user sequential workflow, there is nothing to integrate against within a worktree. Each slice builds on the previous one. The only conflict source is `main` diverging (e.g., another milestone merging first), which slice→milestone merges don't catch anyway — they merge within the worktree, not against `main`. Pre-flight rebase before squash-merge catches this more directly.

**Concern 4: "Replace slice branches with another explicit slice-boundary primitive. Don't just delete them."**

Response: Accepted in spirit. Commits with conventional tags (`feat(M001/S01):`, `feat(M001/S01/T01):`) serve as the slice boundary primitive. `git log --grep="M001/S01"` isolates a slice's history. `git revert` targets specific commits. Git tags (`sf/M001/S01-complete`) can mark slice completion if needed. The boundary primitive is commit metadata, not branches.
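
To illustrate commit metadata acting as the slice boundary, the sketch below parses the conventional tag out of a commit subject and filters a list of subjects by slice, roughly what `git log --grep="M001/S01"` does on the command line. `SliceTag`, `parseTag`, and `commitsForSlice` are hypothetical names, and the ID shapes (`M` + digits, `S` + digits, `T` + digits) are inferred from the examples in this ADR rather than from a documented grammar.

```typescript
// Hypothetical parser; ID shapes are inferred from the examples in this ADR.
interface SliceTag {
  milestone: string;
  slice?: string;
  task?: string;
}

const TAG_RE = /^feat\((M\d+)(?:\/(S\d+))?(?:\/(T\d+))?\):/;

// Extracts the milestone/slice/task tag from a commit subject, or null when
// the subject does not follow the convention.
function parseTag(subject: string): SliceTag | null {
  const match = TAG_RE.exec(subject);
  if (!match) return null;
  const tag: SliceTag = { milestone: match[1] };
  if (match[2]) tag.slice = match[2];
  if (match[3]) tag.task = match[3];
  return tag;
}

// Returns the subjects that belong to one slice, including its task commits.
function commitsForSlice(subjects: readonly string[], milestone: string, slice: string): string[] {
  return subjects.filter((subject) => {
    const tag = parseTag(subject);
    return tag !== null && tag.milestone === milestone && tag.slice === slice;
  });
}
```

Applied to the commit sequence from the architecture diagram, filtering for `M001` / `S01` keeps the research, plan, two task, and summary commits and drops the milestone-level and `S02` commits.
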

## Action Items

1. Close PR #487 (`resolveMainWorktreeRoot`) — contradicts this architecture
2. Implement as an SF milestone with phases:
   - Update `.gitignore` and force-add existing planning artifacts
   - Remove slice branch creation/switching/merging code
   - Simplify `mergeMilestoneToMain()` and `smartStage()`
   - Remove `fix-merge` dispatch unit
   - Remove branch-mode isolation (`git.isolation: "branch"`)
   - Update/delete 11 test files
   - Update README suggested gitignore
   - Migration path for existing projects with slice branches

## Drift From Original Decision

*Audited 2026-05-02. Items the ADR claims were deleted that are still present in the codebase:*

| Item | Status | File:Line | Cleanup Pending |
|------|--------|-----------|-----------------|
| `git-self-heal.ts` (whole file) | **Still present** | `src/resources/extensions/sf/git-self-heal.ts:1–142` | File is 142 lines; exports `abortAndReset()` and `formatGitError()`. The ADR claimed ~86 lines deleted. Delete entire file and migrate any callers of `abortAndReset()` to in-place reset logic. |
| `smartStage()` | **Still present** | `src/resources/extensions/sf/git-service.ts:576` | Still has `_runtimeFilesCleanedUp` migration logic, `SF_MILESTONE_LOCK` parallel-scope exclusions, and 49+ lines of runtime exclusion. Simplify as described in Consequences section. |
| `shouldUseWorktreeIsolation()` | **Still present** | `src/resources/extensions/sf/auto.ts:357` | ADR requires single-mode worktree-always behavior; this function still exists and defaults to `false` (worktree off unless explicit opt-in). Branch-mode fallback persists. Remove after deprecating `git.isolation: "branch"`. |
| `getSliceBranchName()` | **Still present** | `src/resources/extensions/sf/worktree.ts:261` | Still used by `workspace-index.ts:156` to record historical branch names. Evaluate whether this is still needed or can be removed. |
| `MergeConflictError` + fix-merge JSDoc | **Partially present** | `src/resources/extensions/sf/git-service.ts:199–225` | `MergeConflictError` class and a JSDoc comment referencing "dispatch a fix-merge session" remain. The `fix-merge.md` prompt template is deleted; no active dispatch loop found. Remove the JSDoc reference. Keep `MergeConflictError`, which is still used by `slice-cadence.ts`. |

### Items Confirmed Deleted

- `mergeSliceToMilestone()` — not found anywhere in the codebase.
- Original `mergeSliceToMain()` (branch-era, from `git-service.ts`) — deleted. A *new* `mergeSliceToMain()` exists at `src/resources/extensions/sf/slice-cadence.ts:92` but was added post-ADR for the slice-cadence collapse feature (#4765) and is architecturally consistent with the branchless model.
- `fix-merge.md` prompt template — deleted (no file at `src/resources/extensions/sf/prompts/fix-merge.md`).
- Conflict categorization (~80 lines) — not found.
- `withMergeHeal()` — not found.
- `ensureSliceBranch()` / `switchToMain()` / `getMergeToMainMode()` — not found.

### Current Authoritative Merge Path

The current milestone→main merge implementation is `mergeMilestoneToMain()` at `src/resources/extensions/sf/auto-worktree.ts:1616`. It performs squash-merge after auto-committing dirty worktree state, reconciling the worktree DB, and running a pre-flight rebase. It does not use slice branches, `withMergeHeal`, or conflict categorization.