From 296054b1d4be3db668902246d0ea901d33d68d9c Mon Sep 17 00:00:00 2001
From: Mikael Hugo
Date: Mon, 11 May 2026 19:46:06 +0200
Subject: [PATCH] TODO: drop snapshot blob from md-tracking; use git for diff source

Per follow-up: SF generates many of these .md files itself (.sf/wiki/*,
.sf/milestones/**/*.md, docs/plans/**), so storing gzipped snapshots in
the DB would duplicate disk + git for no benefit.

Simpler design: store only the sha + meta in sf.db; compute diffs on
demand against `git show HEAD:`. Naturally handles both "working-tree
edit not yet committed" and "another agent committed while SF wasn't
running".

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 TODO.md | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/TODO.md b/TODO.md
index 8703650c4..68699fd60 100644
--- a/TODO.md
+++ b/TODO.md
@@ -118,31 +118,45 @@ Explicit out of scope:
   is expected, no signal in tracking.
 - `node_modules`, `dist`, vendored copies — irrelevant.
 
-Storage in `sf.db`:
+Storage in `sf.db` — **shas only, no content snapshots**. SF generates
+many of these files itself; caching their contents in the DB would
+duplicate disk + git for no benefit:
 
 ```sql
 CREATE TABLE tracked_md_files (
   relpath TEXT PRIMARY KEY,     -- repo-relative path
-  sha256 TEXT NOT NULL,
+  sha256 TEXT NOT NULL,         -- hash of last-seen content
   size_bytes INTEGER NOT NULL,
   last_seen_at TEXT NOT NULL,
-  snapshot BLOB,                -- gzipped content, optional
   category TEXT                 -- 'meta'|'wiki'|'milestone'|'adr'|'plan'
 );
 ```
 
+For diff source, use **git** (these are all tracked files; if they're
+not, the agent should add them or skip tracking that path):
+
+```
+git show HEAD:   ← what was committed
+                 ← what's on disk now
+diff the two     ← what changed since the last commit
+```
+
+This naturally handles "the operator edited but hasn't committed yet"
+(diff shows the working-tree change) and "another agent committed and
+SF wasn't running" (diff shows the new commit).
+
 On session start + each autonomous-cycle entry, walk the configured
 glob set, hash each file, diff against `tracked_md_files.sha256`. For
 each changed file:
 
-1. Compute diff against `snapshot` (or `git show HEAD:` if the
-   file is tracked).
-2. Surface to operator: "**N** files changed since last seen — review
-   or accept?" with per-file inline diffs.
-3. On accept → update sha + snapshot. On reject → restore from
-   snapshot.
-4. New files (sha not in DB) → import + classify by glob category.
-5. Deleted files → archive (don't purge until operator confirms).
+1. Surface to operator: "**N** files changed since SF last saw — review
+   or accept?" with per-file diff (computed from git, not from a DB
+   blob).
+2. On accept → update sha + last_seen_at. No content stored.
+3. New files (sha not in DB) → classify by glob category, store sha,
+   continue.
+4. Deleted files → archive the DB row (mark inactive); don't purge
+   until operator confirms.
 
 Useful for:
 - hand-edits / cross-agent edits / git pulls (the original