singularity-forge/docs/dev/ADR-001-branchless-worktree-architecture.md
2026-05-06 00:38:36 +02:00

# ADR-001: Branchless Worktree Architecture
> Historical note: this ADR predates the current DB-backed planning-state direction.
> Where it says markdown is truth and DB is cache, that statement is superseded by
> `docs/adr/0000-purpose-to-software-compiler.md` and
> `docs/adr/0001-promote-only-sf-state.md`: structured `.sf` state is authoritative
> at runtime, and markdown is a projection when structured state exists.
**Status:** Accepted — partial drift
**Date:** 2026-03-15
**Revised:** 2026-05-02 — partial drift documented; code migration incomplete
**Deciders:** Lex Christopherson
**Advisors:** Claude Opus 4.6, Gemini 2.5 Pro, GPT-5.4 (Codex)
## Context
SF uses git for isolation during autonomous coding sessions. The current architecture (shipped in M003, v2.13.0) creates a **worktree per milestone** with **slice branches inside each worktree**. Each slice (`S01`, `S02`, ...) gets its own branch (`sf/M001/S01`) within the worktree, which merges back to the milestone branch (`milestone/M001`) via `--no-ff` when the slice completes. The milestone branch squash-merges to `main` when the milestone completes.
This architecture replaced a previous "branch-per-slice" model that had severe `.sf/` merge conflicts. M003 solved the merge conflicts but retained slice branches inside worktrees, inheriting complexity that has produced persistent, user-facing failures.
### Problems
**1. Planning artifact invisibility (loop detection failures)**
When `research-slice` or `plan-slice` dispatches, the agent writes artifacts (e.g., `S02-RESEARCH.md`) on a slice branch. After the agent completes, `handleAgentEnd` switches back to the milestone branch for the next dispatch, so the artifact stays on the slice branch and never reaches the milestone branch. `verifyExpectedArtifact()` checks the milestone branch, cannot find the file, increments the loop counter, and retries with the same result. After 3 retries the unit hits a hard stop; after 6 lifetime dispatches it hits a permanent stop. This burns budget and blocks progress.
Documented in the auto-stop architecture doc as "The Branch-Switching Problem."
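The failure mode can be reproduced in a toy model. The sketch below is not SF code: `ToyRepo`, `runUnit`, and `artifactVisible` are hypothetical stand-ins, and the retry limit of 3 is a simplified reading of the loop counter described above. It only illustrates why verification on a different branch than the one written to can never succeed.

```typescript
// Toy model: a repo is a current branch plus a file set per branch.
// All names here are hypothetical stand-ins, not the real SF functions.
interface ToyRepo {
  current: string;
  files: Map<string, Set<string>>;
}

function makeRepo(branches: string[], current: string): ToyRepo {
  const files = new Map<string, Set<string>>();
  for (const b of branches) files.set(b, new Set<string>());
  return { current, files };
}

// Stand-in for artifact verification: only looks at the checked-out branch.
function artifactVisible(repo: ToyRepo, path: string): boolean {
  return repo.files.get(repo.current)?.has(path) ?? false;
}

// Returns the number of failed verifications before success or hard stop.
function runUnit(
  repo: ToyRepo,
  workBranch: string,
  verifyBranch: string,
  path: string,
  maxRetries = 3,
): number {
  let failures = 0;
  while (failures < maxRetries) {
    repo.current = workBranch;             // agent writes on the work branch
    repo.files.get(workBranch)?.add(path);
    repo.current = verifyBranch;           // end-of-agent switch before verifying
    if (artifactVisible(repo, path)) return failures;
    failures++;
  }
  return failures;                         // hard stop reached
}

const sliceModel = makeRepo(["milestone/M001", "sf/M001/S02"], "milestone/M001");
const branchless = makeRepo(["milestone/M001"], "milestone/M001");

// Slice-branch model: written on sf/M001/S02, verified on milestone/M001.
console.log(runUnit(sliceModel, "sf/M001/S02", "milestone/M001", "S02-RESEARCH.md")); // 3
// Branchless model: written and verified on the same branch.
console.log(runUnit(branchless, "milestone/M001", "milestone/M001", "S02-RESEARCH.md")); // 0
```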
**2. `.sf/` state clobbering across branches**
`.sf/` is gitignored (line 52 of `.gitignore`: `.sf/`). Planning artifacts (roadmaps, plans, summaries, decisions, requirements) live in `.sf/milestones/` but are invisible to git. When multiple branches or worktrees operate from the same repo, they share a single `.sf/` directory on disk, so Branch A's M001 roadmap overwrites Branch B's M001 roadmap. SF then reads corrupted state, shows the wrong milestone as complete, or enters an infinite dispatch loop.
The codebase has a contradictory workaround: `smartStage()` (git-service.ts:304-352) force-adds `SF_DURABLE_PATHS` (milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md) despite the `.gitignore`. This means `.sf/milestones/` IS partially tracked on some branches but the gitignore claims otherwise. The code fights the configuration.
**3. Merge/conflict code complexity**
The current slice branch model requires:
- `mergeSliceToMilestone()` — 98 lines, `--no-ff` merge with `withMergeHeal` wrapper
- `mergeSliceToMain()` — 189 lines, squash-merge with conflict detection/categorization/auto-resolution
- `git-self-heal.ts` — 198 lines, 3 recovery functions for merge failures
- `fix-merge` dispatch unit — dedicated LLM session to resolve conflicts the auto-resolver can't handle
- `smartStage()` — 49 lines of runtime exclusion during staging
- Conflict categorization — 80 lines classifying `.sf/` vs runtime vs code conflicts
Total: **~582 lines** of merge/branch/conflict code across 3 files, plus the `fix-merge` prompt template and dispatch logic. This code exists solely because of slice branches.
**4. Dual isolation modes**
Branch-mode (`git-service.ts:mergeSliceToMain`) and worktree-mode (`auto-worktree.ts:mergeSliceToMilestone`) have parallel implementations with different merge strategies, different conflict handling, and different branch naming. Both paths must be maintained and tested. 11 test files exercise merge/branch/worktree logic.
**5. Bug history**
- v2.11.1: URGENT fix for parse cache staleness causing repeated unit dispatch (directly caused by branch switching invalidation timing)
- v2.13.1: Windows hotfix for multi-line commit messages in `mergeSliceToMilestone`
- 15+ separate bug fixes for `.sf/` merge conflicts in the pre-M003 era
- Persistent user complaints about loop detection failures and state corruption
## Decision
**Eliminate slice branches entirely.** All work within a milestone worktree commits sequentially on a single branch (`milestone/<MID>`). No branch creation, no branch switching, no slice merges, no conflict resolution within a worktree.
Track `.sf/` planning artifacts in git. Gitignore only runtime/ephemeral state.
### The Architecture
```
main ──────────────────────────────────────────── main
│ ↑
└─ worktree (milestone/M001) │
│ │
commit: feat(M001): context + roadmap │
commit: feat(M001/S01): research │
commit: feat(M001/S01): plan │
commit: feat(M001/S01/T01): impl │
commit: feat(M001/S01/T02): impl │
commit: feat(M001/S01): summary + UAT │
commit: feat(M001/S02): research │
commit: ... │
commit: feat(M001): milestone complete │
│ │
└──────────── squash merge ──────────────────┘
```
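The commit subjects in the diagram follow a `feat(<milestone>[/<slice>[/<task>]]): <summary>` pattern. A minimal sketch of a formatter for that pattern is shown below; `UnitRef`, `unitScope`, and `commitSubject` are hypothetical names chosen for illustration and are not taken from the SF codebase.

```typescript
// Hypothetical helper: builds commit subjects scoped by milestone/slice/task,
// matching the examples in the diagram above.
interface UnitRef {
  milestone: string; // e.g. "M001"
  slice?: string;    // e.g. "S01"
  task?: string;     // e.g. "T01" (ignored unless a slice is given)
}

function unitScope(ref: UnitRef): string {
  const parts = [ref.milestone];
  if (ref.slice) {
    parts.push(ref.slice);
    if (ref.task) parts.push(ref.task);
  }
  return parts.join("/");
}

function commitSubject(ref: UnitRef, summary: string): string {
  return `feat(${unitScope(ref)}): ${summary}`;
}

console.log(commitSubject({ milestone: "M001" }, "context + roadmap"));
// feat(M001): context + roadmap
console.log(commitSubject({ milestone: "M001", slice: "S01", task: "T01" }, "impl"));
// feat(M001/S01/T01): impl
```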
### Git Primitives Used
| Primitive | Purpose |
|-----------|---------|
| **Worktrees** | One per active milestone. Filesystem isolation. |
| **Commits** | Granular sequential history of every action. |
| **Squash merge** | Clean single commit on `main` per milestone. |
| **Branches** | Only `main` and `milestone/<MID>`. Nothing else. |
### Git Primitives NOT Used
| Primitive | Why Not |
|-----------|---------|
| Slice branches | Slices are sequential. Branches add complexity with no rollback benefit. |
| `--no-ff` merges | No branches to merge within a worktree. |
| Branch switching | Never happens. All work on one branch. |
| Conflict resolution | No merges within a worktree means no conflicts within a worktree. |
### `.sf/` Tracking Model
**Tracked in git (travels with the branch):**
```
.sf/milestones/ — roadmaps, plans, summaries, research, contexts, task plans/summaries
.sf/PROJECT.md — project overview
.sf/DECISIONS.md — architectural decision register
.sf/REQUIREMENTS.md — requirements register
.sf/QUEUE.md — work queue
```
**Gitignored (ephemeral, runtime, infrastructure):**
```
.sf/runtime/ — dispatch records, timeout tracking
.sf/activity/ — JSONL session dumps
.sf/worktrees/ — git worktree working directories
.sf/auto.lock — crash detection sentinel
.sf/metrics.json — token/cost accumulator
.sf/completed-units.json — dispatch idempotency tracker
.sf/STATE.md — derived state cache (rebuilt by deriveState())
.sf/sf.db — SQLite cache (rebuilt from tracked markdown by importers)
.sf/DISCUSSION-MANIFEST.json — discussion phase tracking
.sf/milestones/**/*-CONTINUE.md — interrupted-work markers
.sf/milestones/**/continue.md — legacy continue markers
```
### `.gitignore` Update
Replace the current blanket `.sf/` ignore with explicit runtime-only ignores:
```gitignore
# ── SF: Runtime / Ephemeral ─────────────────────────────────
.sf/auto.lock
.sf/completed-units.json
.sf/STATE.md
.sf/metrics.json
.sf/sf.db
.sf/activity/
.sf/runtime/
.sf/worktrees/
.sf/DISCUSSION-MANIFEST.json
.sf/milestones/**/*-CONTINUE.md
.sf/milestones/**/continue.md
```
Planning artifacts (milestones/, PROJECT.md, DECISIONS.md, REQUIREMENTS.md, QUEUE.md) are NOT in `.gitignore` and are tracked normally.
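The tracked/ignored split can also be expressed as a small classifier, which is handy for tests that assert a path lands on the intended side. This is a sketch under assumptions: the function names are hypothetical, and the regular expression only approximates gitignore `**` semantics for the two continue-marker patterns. Git itself remains the authority on what is ignored.

```typescript
// Illustrative only: mirrors the ignore list above. Not part of the SF codebase.
const RUNTIME_DIRS = [".sf/runtime/", ".sf/activity/", ".sf/worktrees/"];

const RUNTIME_FILES = new Set<string>([
  ".sf/auto.lock",
  ".sf/completed-units.json",
  ".sf/STATE.md",
  ".sf/metrics.json",
  ".sf/sf.db",
  ".sf/DISCUSSION-MANIFEST.json",
]);

// Approximates .sf/milestones/**/*-CONTINUE.md and .sf/milestones/**/continue.md
const CONTINUE_MARKER =
  /^\.sf\/milestones\/(?:.+\/)?(?:[^/]*-CONTINUE\.md|continue\.md)$/;

// True for ephemeral paths that should stay out of git.
function isRuntimePath(path: string): boolean {
  return (
    RUNTIME_FILES.has(path) ||
    RUNTIME_DIRS.some((dir) => path.startsWith(dir)) ||
    CONTINUE_MARKER.test(path)
  );
}

// True for .sf/ paths that travel with the branch.
function isTrackedSfPath(path: string): boolean {
  return path.startsWith(".sf/") && !isRuntimePath(path);
}

console.log(isTrackedSfPath(".sf/DECISIONS.md"));                  // true
console.log(isRuntimePath(".sf/sf.db"));                           // true
console.log(isRuntimePath(".sf/milestones/M001/S01-CONTINUE.md")); // true
```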
## Consequences
### Code Deletion
| File | Lines Deleted | What's Removed |
|------|--------------|----------------|
| `auto-worktree.ts` | ~246 | `mergeSliceToMilestone()`, `shouldUseWorktreeIsolation()`, `getMergeToMainMode()`, slice merge guards |
| `git-service.ts` | ~250 | `mergeSliceToMain()`, conflict resolution, runtime stripping post-merge, `ensureSliceBranch()`, `switchToMain()` |
| `git-self-heal.ts` | ~86 | `abortAndReset()`, `withMergeHeal()` (merge-specific recovery) **(still present — see Drift section)** |
| `auto.ts` | ~150 | Merge dispatch guards, `fix-merge` dispatch path, branch-mode routing **(partially still present — see Drift section)** |
| `worktree.ts` | ~40 | `getSliceBranchName()`, `ensureSliceBranch()`, `mergeSliceToMain()` delegates **(getSliceBranchName still present — see Drift section)** |
| **Test files** | ~11 files | `auto-worktree-merge.test.ts`, `auto-worktree-milestone-merge.test.ts`, merge-related test cases |
| **Total** | **~770+ lines** | |
*Verified 2026-05-02: mergeSliceToMilestone and the original mergeSliceToMain (git-service.ts) are deleted. Several other items listed above are still present — see Drift section below. Current authoritative milestone merge path: `src/resources/extensions/sf/auto-worktree.ts:1616` (`mergeMilestoneToMain`). A separate newer function `mergeSliceToMain` at `src/resources/extensions/sf/slice-cadence.ts:92` was added post-ADR for the slice-cadence collapse feature (#4765) and is unrelated to the deleted branch-era function.*
### What `mergeMilestoneToMain()` Becomes
The function simplifies dramatically:
1. Auto-commit any dirty state in worktree
2. `chdir` back to main repo root
3. `git checkout main`
4. `git merge --squash milestone/<MID>`
5. `git commit` with milestone summary
6. Remove worktree + delete branch
No conflict categorization. No runtime file stripping. No `.sf/` special handling. Planning artifacts merge cleanly because they're in `.sf/milestones/M001/` which doesn't exist on `main` until this merge.
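The six steps above can be sketched as an ordered plan of git invocations. This is not the real `mergeMilestoneToMain()`: `MergePlanInput`, `GitStep`, and `planMilestoneMerge` are hypothetical, the auto-commit message is invented, and a real runner would skip the auto-commit when the worktree is clean. The plan uses `git branch -D` because git does not consider a squash-merged branch to be merged, so `-d` would refuse to delete it.

```typescript
// Sketch: the simplified merge as data, so it can be inspected or tested
// without touching a repository. Names are hypothetical, not the SF API.
interface MergePlanInput {
  milestoneId: string;  // e.g. "M001"
  worktreePath: string; // filesystem path of the milestone worktree
  summary: string;      // message for the squash commit on main
}

interface GitStep {
  cwd: "worktree" | "main"; // where the command runs
  args: string[];           // arguments passed to `git`
}

function planMilestoneMerge(input: MergePlanInput): GitStep[] {
  const branch = `milestone/${input.milestoneId}`;
  return [
    // 1. Auto-commit any dirty state in the worktree
    { cwd: "worktree", args: ["add", "-A"] },
    { cwd: "worktree", args: ["commit", "-m", `chore(${input.milestoneId}): auto-commit before merge`] },
    // 2-3. Back in the main repo root, check out main
    { cwd: "main", args: ["checkout", "main"] },
    // 4. Squash-merge the milestone branch
    { cwd: "main", args: ["merge", "--squash", branch] },
    // 5. Commit with the milestone summary
    { cwd: "main", args: ["commit", "-m", input.summary] },
    // 6. Remove the worktree and delete the branch
    { cwd: "main", args: ["worktree", "remove", input.worktreePath] },
    { cwd: "main", args: ["branch", "-D", branch] },
  ];
}

const mergePlan = planMilestoneMerge({
  milestoneId: "M001",
  worktreePath: ".sf/worktrees/M001",
  summary: "feat(M001): milestone complete",
});
for (const step of mergePlan) console.log(`[${step.cwd}] git ${step.args.join(" ")}`);
```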
### What `smartStage()` Becomes
The force-add of `SF_DURABLE_PATHS` is no longer needed — planning artifacts are not gitignored, so `git add -A` picks them up naturally. The function reduces to:
1. `git add -A`
2. `git reset HEAD -- <runtime paths>` (unstage runtime files)
The `_runtimeFilesCleanedUp` one-time migration logic can also be removed.
*Verified 2026-05-02: `smartStage()` at `src/resources/extensions/sf/git-service.ts:576` is still present in its original form, including `_runtimeFilesCleanedUp` migration logic and `SF_MILESTONE_LOCK` parallel-scope logic. The simplification described here has not been performed — see Drift section.*
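For reference, the two-step target shape described in this section could look like the following sketch. `planSmartStage` is a hypothetical name, and the function only returns the git arguments rather than running them. It does not model the `SF_MILESTONE_LOCK` logic that the current implementation still carries.

```typescript
// Sketch of the simplified staging plan: stage everything, then unstage
// runtime paths. Returns git argument lists; nothing is executed here.
function planSmartStage(runtimePaths: string[]): string[][] {
  const steps: string[][] = [["add", "-A"]];
  if (runtimePaths.length > 0) {
    // `--` separates the revision from the pathspecs
    steps.push(["reset", "HEAD", "--", ...runtimePaths]);
  }
  return steps;
}

for (const args of planSmartStage([".sf/runtime/", ".sf/sf.db"])) {
  console.log(`git ${args.join(" ")}`);
}
// git add -A
// git reset HEAD -- .sf/runtime/ .sf/sf.db
```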
### What Happens to `handleAgentEnd()`
After any unit completes:
1. Invalidate caches
2. `autoCommitCurrentBranch()` — commits on the one and only branch
3. `verifyExpectedArtifact()` — file is always on the current branch (no branch switching)
4. Persist completion key
The "Path A fix" (lines 937-953) becomes the only path. No branch mismatch possible.
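The ordering of those four steps matters: the commit must land before verification, and the completion key is persisted only after verification succeeds. A minimal sketch with injected dependencies is shown below. The interface, the parameter shapes, and the `onAgentEnd` name are assumptions made for illustration; the real `handleAgentEnd()` signature is not given in this ADR.

```typescript
// Sketch only: dependency shapes are assumed, not taken from the SF codebase.
interface AgentEndDeps {
  invalidateCaches(): void;
  autoCommitCurrentBranch(message: string): void;
  verifyExpectedArtifact(path: string): boolean;
  persistCompletionKey(key: string): void;
}

interface CompletedUnit {
  key: string;              // hypothetical idempotency key for the unit
  expectedArtifact: string; // file the unit was supposed to produce
  commitMessage: string;
}

// Returns true when the unit is verified and recorded as complete.
function onAgentEnd(unit: CompletedUnit, deps: AgentEndDeps): boolean {
  deps.invalidateCaches();                           // 1. invalidate caches
  deps.autoCommitCurrentBranch(unit.commitMessage);  // 2. commit on the only branch
  if (!deps.verifyExpectedArtifact(unit.expectedArtifact)) {
    return false;                                    // 3. artifact missing: do not persist
  }
  deps.persistCompletionKey(unit.key);               // 4. persist completion key
  return true;
}

// Usage with a recording fake, to show the call order.
const calls: string[] = [];
const fakeDeps: AgentEndDeps = {
  invalidateCaches: () => { calls.push("invalidate"); },
  autoCommitCurrentBranch: () => { calls.push("commit"); },
  verifyExpectedArtifact: () => { calls.push("verify"); return true; },
  persistCompletionKey: () => { calls.push("persist"); },
};
onAgentEnd(
  { key: "M001/S02/research", expectedArtifact: "S02-RESEARCH.md", commitMessage: "feat(M001/S02): research" },
  fakeDeps,
);
console.log(calls.join(" > ")); // invalidate > commit > verify > persist
```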
### What Happens to `fix-merge`
The `fix-merge` dispatch unit type is eliminated. Within a worktree, there are no merges that can conflict. The only merge is milestone→main (squash), and if that conflicts (rare, parallel milestone edge case), it's handled as a one-time resolution at milestone completion — not a dispatch loop.
*Verified 2026-05-02: The `fix-merge` prompt template has been deleted (no `fix-merge.md` exists in `src/resources/extensions/sf/prompts/`). However, `MergeConflictError` (re-exported from `git-service.ts:201`) is still present along with a JSDoc comment at `git-service.ts:199` that still mentions dispatching a "fix-merge session". No active dispatch-loop code was found — this appears to be residual documentation in the class definition, not active dispatch logic. Treated as "still present (partial)" in the Drift section.*
### Backwards Compatibility
The `shouldUseWorktreeIsolation()` three-tier preference resolution is replaced by a single behavior: worktree isolation is always used. The `git.isolation: "branch"` preference is deprecated.
Projects with existing `sf/M001/S01` slice branches can still be read by state derivation, but new work never creates slice branches.
### Risks
**1. Parallel milestone code conflicts at squash-merge time**
If two milestones modify the same source file, the second squash-merge to `main` will conflict. Mitigation: `git fetch origin main && git rebase main` before squash-merge. This is standard practice and rare in single-user workflows.
**2. Loss of per-slice git history after squash**
Squash merge collapses all commits into one on `main`. Mitigations:
- Commit messages tag slices (`feat(M001/S01/T01):`) — filterable with `git log --grep`
- The milestone branch can be preserved (not deleted) if history is needed
- Alternative: `merge --no-ff` instead of `--squash` to keep history on `main`
**3. SQLite DB desync after `git reset`**
If tracked markdown rolls back via `git reset --hard`, the gitignored `sf.db` does not roll back with it. Mitigation: the importer layer (M001/S02) rebuilds the DB from markdown on startup. The DB is a cache; markdown is truth.
**4. Disk space with multiple worktrees**
Each worktree duplicates the working directory (including `node_modules`). Mitigation: single active milestone at a time (single-user workflow), immediate cleanup after completion.
## Alternatives Considered
### A. Keep slice branches, fix visibility with immediate mini-merges
After `research-slice` or `plan-slice`, immediately merge the slice branch back to the milestone branch. This fixes the loop detection bug but retains all merge complexity.
**Rejected:** Adds another merge path instead of removing the root cause. Still requires conflict resolution, self-healing, branch switching.
### B. Keep `.sf/` gitignored, bootstrap from git history for manual worktrees
When SF detects an empty `.sf/` in a worktree, reconstruct state from the branch's git history using `git show <commit>:.sf/...`.
**Rejected:** Recovery logic, not architecture. Doesn't fix the fundamental problem of branch-agnostic state. Fails when git history has been rewritten.
### C. Branch-scoped `.sf/` directories (`.sf/branches/<branch-name>/milestones/...`)
Each branch writes to a namespaced subdirectory within `.sf/`.
**Rejected:** Adds complexity instead of removing it. Requires renaming/moving directories on branch creation and doesn't work with standard git tools (`git checkout` doesn't rename directories).
## Validation
This architecture was stress-tested by three independent models:
**Gemini 2.5 Pro** identified 6 attack vectors. None broke the core model. Recommendations: pre-flight rebase before squash-merge (adopted), heartbeat locks (already exists), DB rebuild on startup (adopted via M001/S02 importers).
**GPT-5.4 (Codex)** read the full codebase and confirmed the model is sound. Identified that `smartStage()` already force-adds durable paths (validating the tracked-artifact approach) and that `resolveMainWorktreeRoot` in PR #487 is architecturally wrong (adopted — PR to be closed).
**Codebase analysis** confirmed `.sf/milestones/` is already partially tracked on `main` despite the `.gitignore`, that `SF_DURABLE_PATHS` exists as a code-level acknowledgment that planning artifacts should be tracked, and that the README already documents the correct runtime-only gitignore pattern.
### Codex (GPT-5.4) Dissent — "No Slice Branches Is a Redesign"
Codex read the full codebase and raised 4 concerns. Each is addressed:
**Concern 1: "Crash after slice done but before integration — today the runtime detects orphaned slice branches and merges them."**
Rebuttal: In the branchless model, there is no integration step to crash between. Slice work is committed directly on the milestone branch. On restart, `deriveState()` reads the branch state as-is. The orphaned-branch recovery path exists solely because of slice branches — removing branches removes the failure mode it recovers from.
**Concern 2: "Concurrent edits to shared root docs (PROJECT.md, DECISIONS.md) from two terminals."**
Rebuttal: Valid edge case. If `/sf queue` edits `DECISIONS.md` on `main` while auto-mode edits it in a worktree, there's a content conflict at squash-merge time. This is a standard git content conflict — no different from two developers editing the same file. Handled by normal merge resolution. Not caused by or solved by slice branches.
**Concern 3: "Slice→milestone merges provide continuous integration. Removing them pushes conflict discovery to the end."**
Rebuttal: In a single-user sequential workflow, there is nothing to integrate against within a worktree. Each slice builds on the previous one. The only conflict source is `main` diverging (e.g., another milestone merging first), which slice→milestone merges don't catch anyway — they merge within the worktree, not against `main`. Pre-flight rebase before squash-merge catches this more directly.
**Concern 4: "Replace slice branches with another explicit slice-boundary primitive. Don't just delete them."**
Response: Accepted in spirit. Commits with conventional tags (`feat(M001/S01):`, `feat(M001/S01/T01):`) serve as the slice boundary primitive. `git log --grep="M001/S01"` isolates a slice's history. `git revert` targets specific commits. Git tags (`sf/M001/S01-complete`) can mark slice completion if needed. The boundary primitive is commit metadata, not branches.
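If commit metadata is the slice boundary, tooling needs to read it back. The sketch below parses the scope tag from a commit subject and filters a list of subjects by slice, roughly what `git log --grep="M001/S01"` does on the command line. The pattern and the names `parseScope` and `commitsForSlice` are assumptions for illustration; the ID shapes (`M` / `S` / `T` followed by digits) are inferred from the examples in this ADR.

```typescript
// Hypothetical parser for subjects like "feat(M001/S01/T01): impl".
const SCOPE_PATTERN = /^[a-z]+\((M\d+)(?:\/(S\d+))?(?:\/(T\d+))?\): /;

interface ParsedScope {
  milestone: string;
  slice: string | undefined;
  task: string | undefined;
}

function parseScope(subject: string): ParsedScope | null {
  const m = SCOPE_PATTERN.exec(subject);
  if (!m) return null;
  // Groups 2 and 3 are undefined when the slice/task part is absent.
  return { milestone: m[1] as string, slice: m[2], task: m[3] };
}

// Keeps only the subjects that belong to one slice of one milestone.
function commitsForSlice(subjects: string[], milestone: string, slice: string): string[] {
  return subjects.filter((subject) => {
    const scope = parseScope(subject);
    return scope !== null && scope.milestone === milestone && scope.slice === slice;
  });
}

const history = [
  "feat(M001): context + roadmap",
  "feat(M001/S01): research",
  "feat(M001/S01/T01): impl",
  "feat(M001/S02): research",
];
console.log(commitsForSlice(history, "M001", "S01"));
// [ 'feat(M001/S01): research', 'feat(M001/S01/T01): impl' ]
```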
## Action Items
1. Close PR #487 (`resolveMainWorktreeRoot`) — contradicts this architecture
2. Implement as an SF milestone with phases:
   - Update `.gitignore` and force-add existing planning artifacts
   - Remove slice branch creation/switching/merging code
   - Simplify `mergeMilestoneToMain()` and `smartStage()`
   - Remove `fix-merge` dispatch unit
   - Remove branch-mode isolation (`git.isolation: "branch"`)
   - Update/delete 11 test files
   - Update README suggested gitignore
   - Migration path for existing projects with slice branches
## Drift From Original Decision
*Audited 2026-05-02. Items the ADR claims were deleted that are still present in the codebase:*
| Item | Status | File:Line | Cleanup Pending |
|------|--------|-----------|-----------------|
| `git-self-heal.ts` (whole file) | **Still present** | `src/resources/extensions/sf/git-self-heal.ts:1-142` | File is 142 lines; exports `abortAndReset()` and `formatGitError()`. The ADR claimed ~86 lines deleted. Delete entire file and migrate any callers of `abortAndReset()` to in-place reset logic. |
| `smartStage()` | **Still present** | `src/resources/extensions/sf/git-service.ts:576` | Still has `_runtimeFilesCleanedUp` migration logic, `SF_MILESTONE_LOCK` parallel-scope exclusions, and 49+ lines of runtime exclusion. Simplify as described in Consequences section. |
| `shouldUseWorktreeIsolation()` | **Still present** | `src/resources/extensions/sf/auto.ts:357` | ADR requires single-mode worktree-always behavior; this function still exists and defaults to `false` (worktree off unless explicit opt-in). Branch-mode fallback persists. Remove after deprecating `git.isolation: "branch"`. |
| `getSliceBranchName()` | **Still present** | `src/resources/extensions/sf/worktree.ts:261` | Still used by `workspace-index.ts:156` to record historical branch names. Evaluate whether this is still needed or can be removed. |
| `MergeConflictError` + fix-merge JSDoc | **Partially present** | `src/resources/extensions/sf/git-service.ts:199-225` | `MergeConflictError` class and a JSDoc comment referencing "dispatch a fix-merge session" remain. The `fix-merge.md` prompt template is deleted; no active dispatch loop found. Remove the JSDoc reference. Keep `MergeConflictError`, which is still used by `slice-cadence.ts`. |
### Items Confirmed Deleted
- `mergeSliceToMilestone()` — not found anywhere in the codebase.
- Original `mergeSliceToMain()` (branch-era, from `git-service.ts`) — deleted. A *new* `mergeSliceToMain()` exists at `src/resources/extensions/sf/slice-cadence.ts:92` but was added post-ADR for the slice-cadence collapse feature (#4765) and is architecturally consistent with the branchless model.
- `fix-merge.md` prompt template — deleted (no file at `src/resources/extensions/sf/prompts/fix-merge.md`).
- Conflict categorization (~80 lines) — not found.
- `withMergeHeal()` — not found.
- `ensureSliceBranch()` / `switchToMain()` / `getMergeToMainMode()` — not found.
### Current Authoritative Merge Path
The current milestone→main merge implementation is `mergeMilestoneToMain()` at `src/resources/extensions/sf/auto-worktree.ts:1616`. It performs squash-merge after auto-committing dirty worktree state, reconciling the worktree DB, and running a pre-flight rebase. It does not use slice branches, `withMergeHeal`, or conflict categorization.