From faecdc828cb8534035930fe94d0169e7abe7a62e Mon Sep 17 00:00:00 2001
From: Mikael Hugo
Date: Mon, 11 May 2026 19:45:39 +0200
Subject: [PATCH] TODO: generalise sha-tracking from milestones to all source-of-truth .md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per follow-up: not just .sf/milestones/**/*.md but the broader set of
markdown files that SF (or humans) treat as authoritative — AGENTS.md,
.github/copilot-instructions.md, .sf/wiki/**, docs/adr/**,
docs/plans/**, and root-level meta files.

Explicit out-of-scope list: TODO.md (reset every cycle by triage),
CHANGELOG.md / BUILD_PLAN.md (append-only by design), vendored or
generated content. Tracking those would just be noise.

Spec includes a tracked_md_files schema, the walk/diff/surface flow,
and an honest accounting of storage cost (~40 bytes per file + optional
gzipped snapshot).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 TODO.md | 90 +++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 62 insertions(+), 28 deletions(-)

diff --git a/TODO.md b/TODO.md
index 908b59e23..8703650c4 100644
--- a/TODO.md
+++ b/TODO.md
@@ -89,40 +89,74 @@ Wanted: extend the triage JSON schema so each implementation task is
 and update `appendBacklogItems` + a future milestone-escalator to read
 the structured tier rather than re-parsing markdown.
 
-## Detect manual edits to milestone files (sha-tracked, diff on change)
+## Sha-track every source-of-truth markdown file, diff on change
 
-Milestone files (`CONTEXT.md`, `MILESTONE-SUMMARY.md`, `ROADMAP.md`,
-`SUMMARY.md` + each slice's `PLAN.md` / `SUMMARY.md` + each task's
-`PLAN.md` / `SUMMARY.md`) are source of truth for SF planning. Today
-nothing notices if a human (or another agent) edits one out of band:
-SF keeps using the in-memory or DB-cached state, drifts from disk, and
-downstream tools see a different milestone than the human just wrote.
+Generalised from the milestone-files case: **any markdown file that is
+a source of truth for SF or for humans navigating the repo** should
+be sha-tracked, and any change since SF last saw it should surface as
+a diff for review (or auto-accept under a configured policy).
 
-Wanted: on session start (and on each autonomous-cycle entry), walk
-`.sf/milestones/**/*.md`, hash each file, compare to the last-known
-sha in `sf.db` (new column `milestone_files.sha256`,
-`milestone_files.last_seen_at`). For any file whose sha has changed:
+In scope (per repo):
 
-1. Compute the diff against the last-seen content (stored alongside
-   the sha as a compressed blob, or just re-fetched from git if the
+- **Repo-level meta** — `AGENTS.md`, `README.md`, `STATUS.md`,
+  `BACKLOG.md`, `STANDALONE.md`, `MIGRATION.md`, etc. (any uppercase
+  root-level `.md`)
+- **Pointer** — `.github/copilot-instructions.md`
+- **Wiki** — `.sf/wiki/**/*.md`
+- **Planning** — `.sf/milestones/**/*.md` (`CONTEXT`, `MILESTONE-SUMMARY`,
+  `ROADMAP`, `SUMMARY` per milestone; `PLAN` / `SUMMARY` per slice; same
+  per task)
+- **ADRs** — `docs/adr/**/*.md` (these should rarely change, so any
+  edit is loud and worth surfacing)
+- **Triage outputs** — `docs/plans/**/*.md`
+
+Explicit out of scope:
+
+- `TODO.md` — gets reset to empty template by `/todo triage` on every
+  cycle; tracking churn here is just noise.
+- `CHANGELOG.md` / `BUILD_PLAN.md` — append-only by design; sha churn
+  is expected, no signal in tracking.
+- `node_modules`, `dist`, vendored copies — irrelevant.
+
+Storage in `sf.db`:
+
+```sql
+CREATE TABLE tracked_md_files (
+  relpath      TEXT PRIMARY KEY,   -- repo-relative path
+  sha256       TEXT NOT NULL,
+  size_bytes   INTEGER NOT NULL,
+  last_seen_at TEXT NOT NULL,
+  snapshot     BLOB,               -- gzipped content, optional
+  category     TEXT                -- 'meta'|'wiki'|'milestone'|'adr'|'plan'
+);
+```
+
+On session start + each autonomous-cycle entry, walk the configured
+glob set, hash each file, diff against `tracked_md_files.sha256`.
+For each changed file:
+
+1. Compute diff against `snapshot` (or `git show HEAD:` if the
    file is tracked).
-2. Surface to the operator: "Milestone M003-abc123 CONTEXT.md changed
-   since SF last saw it — review or accept?" with the diff inline.
-3. If accepted (or in autonomous mode with a configured policy), update
-   the DB-cached version + sha and continue. If rejected, restore from
-   the last-known content.
-4. New files (sha not in DB) → import as if they were a fresh
-   `new-milestone` scaffold and add to the index.
-5. Deleted files (DB has sha but file is gone) → mark the milestone
-   archived and prompt the operator before purging.
+2. Surface to operator: "**N** files changed since last seen — review
+   or accept?" with per-file inline diffs.
+3. On accept → update sha + snapshot. On reject → restore from
+   snapshot.
+4. New files (sha not in DB) → import + classify by glob category.
+5. Deleted files → archive (don't purge until operator confirms).
 
-Useful for: hand-edits, cross-agent edits (another LLM in a different
-session modified the milestone), git pulls that bring in upstream
-changes to milestone files, the milestone-from-triage path I sketched
-earlier (so the autonomous loop notices its own scaffold).
+Useful for:
+- hand-edits / cross-agent edits / git pulls (the original
+  milestone-files motivation)
+- catching when an AGENTS.md drifted because someone edited it during
+  a code review and nobody told SF
+- ADR drift detection — ADRs should almost never change; if one does,
+  surface it loudly
+- treating `.sf/wiki/*` as living docs that need review when they
+  drift from what `sf` has internalised
 
-Storage cost: ~20 bytes per file (sha + last_seen_at) plus optional
-compressed snapshot. Negligible vs. the rest of `sf.db`.
+Storage cost: ~40 bytes per file (sha + meta) + optional gzipped
+snapshot (typically 30-70 % of original size). Negligible vs. the
+rest of `sf.db`.
 
 ## Phases-helpers extension-load error on every SF run