From eacbbaac825d821124eb90c147b4f1067bc606a7 Mon Sep 17 00:00:00 2001
From: Mikael Hugo
Date: Mon, 11 May 2026 19:49:25 +0200
Subject: [PATCH] =?UTF-8?q?TODO:=20simplify=20md-tracking=20=E2=80=94=20dr?=
 =?UTF-8?q?op=20snapshot=20blob,=20accept=20mid-edit=20corner?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Final settled design: sha + git ref only, no DB content snapshots at
all. The mid-edit case (file observed dirty) loses the ability to
reconstruct the intermediate working-tree state, but the change-
detection signal is preserved and the operator can commit first if
intermediate fidelity matters.

Trades a corner-case fidelity loss for a much simpler schema and no
DB-vs-disk content duplication. Git remains the only version store;
the DB row is a pure "where I left off" pointer.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .sf/wiki/{architecture.md => ARCHITECTURE.md} |  0
 .sf/wiki/{glossary.md => GLOSSARY.md}         |  0
 .sf/wiki/{index.md => INDEX.md}               |  0
 .sf/wiki/{subsystems.md => SUBSYSTEMS.md}     |  0
 .sf/wiki/{workflows.md => WORKFLOWS.md}       |  0
 TODO.md                                       | 52 ++++++++-----------
 6 files changed, 23 insertions(+), 29 deletions(-)
 rename .sf/wiki/{architecture.md => ARCHITECTURE.md} (100%)
 rename .sf/wiki/{glossary.md => GLOSSARY.md} (100%)
 rename .sf/wiki/{index.md => INDEX.md} (100%)
 rename .sf/wiki/{subsystems.md => SUBSYSTEMS.md} (100%)
 rename .sf/wiki/{workflows.md => WORKFLOWS.md} (100%)

diff --git a/.sf/wiki/architecture.md b/.sf/wiki/ARCHITECTURE.md
similarity index 100%
rename from .sf/wiki/architecture.md
rename to .sf/wiki/ARCHITECTURE.md
diff --git a/.sf/wiki/glossary.md b/.sf/wiki/GLOSSARY.md
similarity index 100%
rename from .sf/wiki/glossary.md
rename to .sf/wiki/GLOSSARY.md
diff --git a/.sf/wiki/index.md b/.sf/wiki/INDEX.md
similarity index 100%
rename from .sf/wiki/index.md
rename to .sf/wiki/INDEX.md
diff --git a/.sf/wiki/subsystems.md b/.sf/wiki/SUBSYSTEMS.md
similarity index 100%
rename from .sf/wiki/subsystems.md
rename to .sf/wiki/SUBSYSTEMS.md
diff --git a/.sf/wiki/workflows.md b/.sf/wiki/WORKFLOWS.md
similarity index 100%
rename from .sf/wiki/workflows.md
rename to .sf/wiki/WORKFLOWS.md
diff --git a/TODO.md b/TODO.md
index ba24fa82b..e26473c24 100644
--- a/TODO.md
+++ b/TODO.md
@@ -118,45 +118,39 @@ Explicit out of scope:
   is expected, no signal in tracking.
 - `node_modules`, `dist`, vendored copies — irrelevant.
 
-Storage in `sf.db` — sha + git ref, with **snapshot only as a fallback
-for uncommitted observations**. SF generates many of these files
-itself; storing every version in the DB would duplicate disk + git
-for no benefit. But we still need a reference point to compute diffs
-against — that's the versioning question.
+Storage in `sf.db` — sha + git ref, no content snapshots. Git is the
+version store; the DB is just a pointer:
 
 ```sql
 CREATE TABLE tracked_md_files (
-  relpath              TEXT PRIMARY KEY,  -- repo-relative path
-  sha256               TEXT NOT NULL,     -- hash of last-seen content
-  size_bytes           INTEGER NOT NULL,
-  last_seen_at         TEXT NOT NULL,
-  last_seen_commit     TEXT,              -- git SHA1 of HEAD when we saw it
-  uncommitted_snapshot BLOB,              -- gzipped, ONLY if observed in working tree
-  category             TEXT               -- 'meta'|'wiki'|'milestone'|'adr'|'plan'
+  relpath          TEXT PRIMARY KEY,  -- repo-relative path
+  sha256           TEXT NOT NULL,     -- hash of last-seen content
+  size_bytes       INTEGER NOT NULL,
+  last_seen_at     TEXT NOT NULL,
+  last_seen_commit TEXT,              -- git SHA1 of HEAD when observed
+  category         TEXT               -- 'meta'|'wiki'|'milestone'|'adr'|'plan'
 );
 ```
 
-Versioning + diff source decision tree per file:
+Diff source priority:
 
-1. **Observed at commit X (file was clean at the time)** →
-   store `last_seen_commit = X`, `uncommitted_snapshot = NULL`. Diff
-   later = `git show X:` vs current. Cheap, no DB blob.
+1. **Tracked + committed at observation** (the common case):
+   `git diff -- ` shows everything since.
+   Cheap, no blob, perfect history via `git log ` if needed.
 
-2. **Observed with uncommitted changes (working-tree state at time of
-   observation)** → store `uncommitted_snapshot = gzip(content)`,
-   `last_seen_commit = HEAD-at-the-time-anyway`. Diff later = unpack
-   the snapshot vs current. Necessary because there is no git ref
-   that ever held that exact content.
+2. **Tracked + uncommitted at observation** (mid-edit corner): no git
+   ref points at that exact content. Diff shows "changed since
+   ``" but the prior intermediate working-tree state
+   isn't reconstructable. Acceptable trade-off — the main signal is
+   "changed", and the operator can commit before letting SF observe
+   if intermediate fidelity matters.
 
-3. **File untracked or in .gitignore** (transient SF state, generated
-   artifacts) → either skip tracking entirely (preferred), or treat
-   it like case 2 (always store snapshot). Don't pretend a git ref
-   exists when it doesn't.
+3. **Untracked / gitignored**: not tracked in this table. SF-generated
+   transient files don't belong in version control or in this audit.
 
-In practice most md SF deals with is case 1 — committed at
-observation time — so the snapshot blob stays NULL for most rows. The
-DB stays small; the working-tree-edit corner case still has a clean
-diff.
+History per file = `git log ` (already there, free). SF's DB
+just records "where I left off." No `md_observation_log` history
+table unless someone has a concrete need for an SF-side timeline.
 
 On session start + each autonomous-cycle entry, walk the configured
 glob set, hash each file, diff against `tracked_md_files.sha256`.
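
The walk, hash, and compare step described in the closing context lines could look roughly like the Python sketch below. It is an illustration under stated assumptions, not SF's implementation: the `scan` function, its parameters, and the choice to advance the pointer as soon as a change is seen are all hypothetical. The caller is assumed to resolve `head_commit` itself (for example with `git rev-parse HEAD`), and a file seen for the first time is reported as changed.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Schema as settled in the patch: sha + git ref, no content snapshot.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracked_md_files (
  relpath          TEXT PRIMARY KEY,
  sha256           TEXT NOT NULL,
  size_bytes       INTEGER NOT NULL,
  last_seen_at     TEXT NOT NULL,
  last_seen_commit TEXT,
  category         TEXT
);
"""


def scan(db, root, patterns, head_commit=None):
    """Hash files matching `patterns` under `root`; return changed relpaths.

    Hypothetical helper. `head_commit` is whatever the caller resolved
    for HEAD, or None if unknown. Assumption: the row is updated as soon
    as a change is observed, so the next scan compares against this one.
    """
    root = Path(root)
    db.execute(SCHEMA)
    now = datetime.now(timezone.utc).isoformat()
    changed, seen = [], set()
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            rel = path.relative_to(root).as_posix()
            if not path.is_file() or rel in seen:
                continue
            seen.add(rel)
            data = path.read_bytes()
            digest = hashlib.sha256(data).hexdigest()
            row = db.execute(
                "SELECT sha256 FROM tracked_md_files WHERE relpath = ?",
                (rel,),
            ).fetchone()
            if row is not None and row[0] == digest:
                continue  # unchanged: leave the pointer where it is
            changed.append(rel)
            if row is None:
                # First observation: category is left NULL in this sketch.
                db.execute(
                    "INSERT INTO tracked_md_files "
                    "(relpath, sha256, size_bytes, last_seen_at, last_seen_commit) "
                    "VALUES (?, ?, ?, ?, ?)",
                    (rel, digest, len(data), now, head_commit),
                )
            else:
                # Known file with new content: move the pointer forward.
                db.execute(
                    "UPDATE tracked_md_files SET sha256 = ?, size_bytes = ?, "
                    "last_seen_at = ?, last_seen_commit = ? WHERE relpath = ?",
                    (digest, len(data), now, head_commit, rel),
                )
    db.commit()
    return changed
```

Calling `scan(sqlite3.connect("sf.db"), repo_root, ["**/*.md"], head_commit)` at session start would return the relative paths whose content hash differs from the stored one. The glob pattern here is only a placeholder for the configured glob set.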