TODO: simplify md-tracking — drop snapshot blob, accept mid-edit corner
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Final settled design: sha + git ref only, no DB content snapshots at all. The mid-edit case (file observed dirty) loses the ability to reconstruct the intermediate working-tree state, but the change-detection signal is preserved and the operator can commit first if intermediate fidelity matters. Trades a corner-case fidelity loss for a much simpler schema and no DB-vs-disk content duplication. Git remains the only version store; the DB row is a pure "where I left off" pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 76923afb91
commit eacbbaac82
6 changed files with 23 additions and 29 deletions
TODO.md (52 changed lines)
````diff
@@ -118,45 +118,39 @@ Explicit out of scope:
 is expected, no signal in tracking.
 - `node_modules`, `dist`, vendored copies — irrelevant.

-Storage in `sf.db` — sha + git ref, with **snapshot only as a fallback
-for uncommitted observations**. SF generates many of these files
-itself; storing every version in the DB would duplicate disk + git
-for no benefit. But we still need a reference point to compute diffs
-against — that's the versioning question.
+Storage in `sf.db` — sha + git ref, no content snapshots. Git is the
+version store; the DB is just a pointer:

 ```sql
 CREATE TABLE tracked_md_files (
-  relpath TEXT PRIMARY KEY, -- repo-relative path
-  sha256 TEXT NOT NULL, -- hash of last-seen content
-  size_bytes INTEGER NOT NULL,
-  last_seen_at TEXT NOT NULL,
-  last_seen_commit TEXT, -- git SHA1 of HEAD when we saw it
-  uncommitted_snapshot BLOB, -- gzipped, ONLY if observed in working tree
-  category TEXT -- 'meta'|'wiki'|'milestone'|'adr'|'plan'
+  relpath TEXT PRIMARY KEY, -- repo-relative path
+  sha256 TEXT NOT NULL, -- hash of last-seen content
+  size_bytes INTEGER NOT NULL,
+  last_seen_at TEXT NOT NULL,
+  last_seen_commit TEXT, -- git SHA1 of HEAD when observed
+  category TEXT -- 'meta'|'wiki'|'milestone'|'adr'|'plan'
 );
 ```

-Versioning + diff source decision tree per file:
+Diff source priority:

-1. **Observed at commit X (file was clean at the time)** →
-   store `last_seen_commit = X`, `uncommitted_snapshot = NULL`. Diff
-   later = `git show X:<path>` vs current. Cheap, no DB blob.
+1. **Tracked + committed at observation** (the common case):
+   `git diff <last_seen_commit> -- <path>` shows everything since.
+   Cheap, no blob, perfect history via `git log <path>` if needed.

-2. **Observed with uncommitted changes (working-tree state at time of
-   observation)** → store `uncommitted_snapshot = gzip(content)`,
-   `last_seen_commit = HEAD-at-the-time-anyway`. Diff later = unpack
-   the snapshot vs current. Necessary because there is no git ref
-   that ever held that exact content.
+2. **Tracked + uncommitted at observation** (mid-edit corner): no git
+   ref points at that exact content. Diff shows "changed since
+   `<last_seen_commit>`" but the prior intermediate working-tree state
+   isn't reconstructable. Acceptable trade-off — the main signal is
+   "changed", and the operator can commit before letting SF observe
+   if intermediate fidelity matters.

-3. **File untracked or in .gitignore** (transient SF state, generated
-   artifacts) → either skip tracking entirely (preferred), or treat
-   it like case 2 (always store snapshot). Don't pretend a git ref
-   exists when it doesn't.
+3. **Untracked / gitignored**: not tracked in this table. SF-generated
+   transient files don't belong in version control or in this audit.

-In practice most md SF deals with is case 1 — committed at
-observation time — so the snapshot blob stays NULL for most rows. The
-DB stays small; the working-tree-edit corner case still has a clean
-diff.
 History per file = `git log <relpath>` (already there, free). SF's DB
 just records "where I left off." No `md_observation_log` history
 table unless someone has a concrete need for an SF-side timeline.

 On session start + each autonomous-cycle entry, walk the configured
 glob set, hash each file, diff against `tracked_md_files.sha256`.
````
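A minimal sketch of the pointer-only row from the new side of the schema change above. It assumes `sf.db` is SQLite (the diff shows `CREATE TABLE` DDL but does not name the engine) and uses Python's `sqlite3` purely for illustration, since SF's own implementation language isn't shown here. `record_observation` is a hypothetical helper and `abc1234` / `def5678` are placeholder commit ids. What it illustrates: each observation overwrites one row per path with hash, size, time and HEAD commit, and no file content ever reaches the DB. The upsert syntax needs SQLite 3.24 or newer.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Columns copied from the new side of the TODO.md diff (no uncommitted_snapshot).
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracked_md_files (
  relpath TEXT PRIMARY KEY,    -- repo-relative path
  sha256 TEXT NOT NULL,        -- hash of last-seen content
  size_bytes INTEGER NOT NULL,
  last_seen_at TEXT NOT NULL,
  last_seen_commit TEXT,       -- git SHA1 of HEAD when observed
  category TEXT                -- 'meta'|'wiki'|'milestone'|'adr'|'plan'
);
"""


def record_observation(conn, relpath, content, head_commit, category=None):
    """Hypothetical helper: upsert the "where I left off" row for one file.

    Only the hash, size, timestamp and HEAD commit are stored; the bytes in
    `content` are hashed and measured but never written to the DB.
    """
    digest = hashlib.sha256(content).hexdigest()
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        """
        INSERT INTO tracked_md_files
          (relpath, sha256, size_bytes, last_seen_at, last_seen_commit, category)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(relpath) DO UPDATE SET
          sha256 = excluded.sha256,
          size_bytes = excluded.size_bytes,
          last_seen_at = excluded.last_seen_at,
          last_seen_commit = excluded.last_seen_commit,
          category = excluded.category
        """,
        (relpath, digest, len(content), now, head_commit, category),
    )
    conn.commit()
    return digest


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_observation(conn, "TODO.md", b"# TODO\n", "abc1234", "meta")
print(conn.execute(
    "SELECT relpath, size_bytes, last_seen_commit FROM tracked_md_files"
).fetchone())  # ('TODO.md', 7, 'abc1234')
```

Observing the same path again replaces the row rather than appending one, which matches the TODO's choice of no `md_observation_log` table: per-file history stays in `git log`.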
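The "Diff source priority" cases can be sketched as a small classifier plus the `git diff` invocation the TODO names. This assumes the caller runs `git status --porcelain --ignored -- <path>` for a single path that exists on disk: an empty result is then read as "tracked and clean" (a never-added or missing path also prints nothing, so that check is left to the caller). `status_code`, `classify` and `diff_command` are hypothetical names; only the porcelain codes (`??` untracked, `!!` ignored) and the `git diff <commit> -- <path>` form are git's own behavior.

```python
from typing import List, Optional


def status_code(porcelain_output: str) -> Optional[str]:
    """Return the two-character XY code from single-path porcelain output.

    Empty output yields None: git prints nothing for a tracked, unmodified
    file (assumption: the caller has confirmed the path exists).
    """
    lines = porcelain_output.splitlines()
    if not lines or not lines[0]:
        return None
    return lines[0][:2]  # e.g. " M", "M ", "??", "!!"


def classify(xy: Optional[str]) -> str:
    """Map a status code to the three cases listed in TODO.md."""
    if xy in ("??", "!!"):
        # Case 3: untracked (??) or ignored (!!, shown only with --ignored).
        # Such files get no row in tracked_md_files.
        return "untracked"
    if xy is None:
        # Case 1: tracked and clean at observation (the common case).
        return "committed"
    # Case 2: tracked with staged or unstaged edits (mid-edit corner).
    return "uncommitted"


def diff_command(last_seen_commit: str, relpath: str) -> List[str]:
    """Argv for the diff source used by both case 1 and case 2.

    `git diff <commit> -- <path>` compares that commit with the working
    tree, so it shows everything that changed since the observation.
    """
    return ["git", "diff", last_seen_commit, "--", relpath]


if __name__ == "__main__":
    print(classify(status_code(" M TODO.md\n")))  # uncommitted
    print(" ".join(diff_command("abc1234", "TODO.md")))  # git diff abc1234 -- TODO.md
```

Cases 1 and 2 share one diff command; what case 2 gives up is only the intermediate state. Edits that were already in the working tree when SF looked show up in that diff together with anything made afterwards, which is the fidelity loss the commit message accepts.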
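The session-start step in the TODO ("walk the configured glob set, hash each file, diff against `tracked_md_files.sha256`") might look like the sketch below. It rests on assumptions: the glob set is passed in because the TODO doesn't say where it is configured, `scan_changes` and `sha256_file` are hypothetical names, and reporting `missing` rows is an extra the TODO doesn't ask for. Results are repo-relative POSIX paths so they line up with the `relpath` key.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def sha256_file(path):
    """Hex SHA-256 of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_changes(conn, root, globs):
    """Hypothetical helper: walk the glob set and compare with stored hashes.

    Returns three sorted lists of repo-relative paths:
      new     - matched on disk, no row in tracked_md_files yet
      changed - row exists but the stored sha256 differs
      missing - row exists but this walk did not find the file
                (deleted, or no longer matched by the globs)
    """
    root = Path(root)
    stored = dict(conn.execute("SELECT relpath, sha256 FROM tracked_md_files"))
    seen = set()
    new, changed = [], []
    for pattern in globs:
        for path in root.glob(pattern):
            if not path.is_file():
                continue
            rel = path.relative_to(root).as_posix()
            if rel in seen:  # overlapping globs can match a file twice
                continue
            seen.add(rel)
            digest = sha256_file(path)
            if rel not in stored:
                new.append(rel)
            elif stored[rel] != digest:
                changed.append(rel)
    missing = set(stored) - seen
    return sorted(new), sorted(changed), sorted(missing)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "TODO.md").write_text("draft\n")
        conn = sqlite3.connect(":memory:")
        # Only the two columns the scan reads; the real table has more.
        conn.execute(
            "CREATE TABLE tracked_md_files (relpath TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT INTO tracked_md_files VALUES (?, ?)",
            ("TODO.md", sha256_file(root / "TODO.md")),
        )
        (root / "TODO.md").write_text("draft, edited\n")
        print(scan_changes(conn, root, ["*.md"]))  # ([], ['TODO.md'], [])
```

A path reported as `changed` is where the `git diff <last_seen_commit> -- <path>` step from the TODO would run; once handled, its row is overwritten with the new hash and HEAD.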