feat(sf): sf-audit-traces workflow for slow self-improvement loop

A standalone agent prompt that reads SF's observability sources
(self-feedback / journal / activity / judgments / forensics) and
files AT MOST 3 recurring-pattern findings via sf_self_report so
they enter the existing triage flow.

PDD spec:

Purpose: continuous self-improvement loop. SF already has the data
  sources (self-feedback.jsonl, journal/, activity/, judgments/) and
  the consumer pattern (triage-self-feedback → requirement-promoter).
  What was missing: a standalone prompt that pulls those sources
  together for a scheduled run.
Consumer: agents invoked via '/schedule every morning sf-audit-traces'
  (cloud) or '/sf workflow run sf-audit-traces' (manual).
Contract:
  1. Snapshot the trace volumes (file counts + line counts) into
     evidence so reports are concrete, not prose.
  2. Bar = 3+ occurrences. Single events go to operator eyeballs,
     not permanent self-feedback entries.
  3. Hard cap of 3 entries per run. The whole point is slow
     iteration — the triage queue is human-paced, not a firehose.
  4. NEVER auto-apply. Even if the fix looks one-line obvious, file
     and stop. The triage flow decides what becomes work.
  5. Zero findings is a successful run when the system is healthy.
Failure boundary: missing source files → skip silently. Read errors
  → handle gracefully. Never block on absence.
Evidence (verified during scan before writing):
  - 181 self-feedback entries (55 open, 126 resolved)
  - Top open kinds: runaway-guard-hard-pause (4), git-stage-failure
    (2), context-injection-gap (2), orphan-prompt (2)
  - Journal: 6-233 events per active day
  - Activity logs: per-unit JSONL transcripts present
  - All sources accessible via plain file reads — no special tools.
Non-goals:
  - ML training on traces
  - Cross-project trace aggregation
  - Auto-applying fixes (triage flow already does that)
  - Fast iteration (deliberately slow — 3/run cap means at most 21
    new triage items per week even with daily runs)
Invariants:
  - Safety: agent never edits code/prompts/templates/docs.
  - Liveness: zero findings is a valid output. The agent doesn't
    fabricate patterns to justify a run.

Discovery verified: 28 total workflow templates after this commit
(was 27); plugins.get('sf-audit-traces') returns the plugin from
the bundled source.

Pairs with: triage-self-feedback (reads what this files),
requirement-promoter (auto-promotes recurring kinds to requirements),
self-feedback-drain (session-start drain into repair turns). The
audit is the IN end of that pipeline; the rest of SF was already
the OUT end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit 82633b6f5e (parent e381e3c8ad)
Author: Mikael Hugo
Date: 2026-05-02 21:15:13 +02:00
2 changed files with 115 additions and 0 deletions


@@ -373,6 +373,16 @@
"artifact_dir": null,
"estimated_complexity": "medium",
"requires_project": false
},
"sf-audit-traces": {
"name": "SF Audit Traces",
"description": "Read SF's observability sources (self-feedback, journal, activity, judgments) and file at most 3 recurring-pattern findings via sf_self_report — designed for /schedule daily cadence",
"file": "sf-audit-traces.md",
"phases": ["audit"],
"triggers": ["sf audit traces", "audit sf", "self improve", "scan sf logs", "improve sf"],
"artifact_dir": null,
"estimated_complexity": "low",
"requires_project": true
}
}
}


@@ -0,0 +1,105 @@
# SF Audit Traces
<template_meta>
name: sf-audit-traces
version: 1
mode: oneshot
requires_project: true
artifact_dir: null
</template_meta>
<purpose>
Read SF's own observability sources, identify non-obvious recurring patterns,
and file them as self-feedback so the existing triage flow can promote them.
Iterate slowly: at most three entries per run. The point isn't to flood the
queue — the point is to catch what no single session noticed.
</purpose>
<inputs>
- `.sf/SELF-FEEDBACK.md` — markdown view of filed anomalies
- `.sf/self-feedback.jsonl` — durable source of truth
- `.sf/journal/YYYY-MM-DD.jsonl` — per-day dispatch + iteration events
- `.sf/activity/{seq}-{type}-{id}.jsonl` — per-unit transcript
- `.sf/judgments/*.jsonl` — recorded agent decisions (when present)
- `.sf/forensics/*.json` — saved post-mortems (when present)
</inputs>
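The "skip silently, never block on absence" boundary can be sketched as a guard loop. A minimal sketch; `scan_sources` and the fixture directory are illustrative names, not part of SF's API:

```bash
# Each source is optional: a missing file or directory is a silent
# skip, never an error (scan_sources is a hypothetical helper).
scan_sources() {
  base=$1
  for src in "$base/self-feedback.jsonl" "$base/journal" "$base/activity"; do
    [ -e "$src" ] || continue  # missing source: skip silently
    echo "scan: $src"
  done
}

# Demo against a throwaway fixture with only one source present.
demo=$(mktemp -d)
touch "$demo/self-feedback.jsonl"
scan_sources "$demo"
```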
<process>
## 1. Snapshot
Run these to anchor the scan in real numbers — file paths and counts go
into the eventual self-feedback evidence:
```bash
wc -l .sf/self-feedback.jsonl 2>/dev/null
ls .sf/journal/ 2>/dev/null | tail -7
ls .sf/activity/ 2>/dev/null | wc -l
```
Read the latest 7 days of journal files plus the last 30 activity files. If
a source is missing, skip silently — never block on absence.
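The scan window above (latest 7 journal days, last 30 activity files) can be sketched against a throwaway fixture; the fixture paths are illustrative, and a real run reads `.sf/` directly:

```bash
# Build a fixture larger than the scan window, then select the window.
SF=$(mktemp -d)
mkdir -p "$SF/journal" "$SF/activity"
for d in 01 02 03 04 05 06 07 08 09 10; do
  touch "$SF/journal/2026-05-$d.jsonl"
done
for i in $(seq 1 40); do
  touch "$SF/activity/$i-task-x.jsonl"
done

# Date-named files sort chronologically, so `tail -7` is the latest week;
# `ls -t` orders by mtime, so `head -30` is the most recent transcripts.
recent_days=$(ls "$SF/journal" | sort | tail -7)
recent_units=$(ls -t "$SF/activity" | head -30)
```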
## 2. Look for recurring patterns
The bar is **3+ occurrences** across the data, not single events. Examples:
- The same `kind` filed by self-feedback 3+ times in a week
- The same dispatch rule firing then immediately being un-applied (paired
events in the journal)
- The same tool error repeating across activity logs
- The same runaway-guard pause across multiple units
- The same auto-resolved entry kind triaged as `wontfix` repeatedly (signal
the detector is too noisy)
- A judgment that proves wrong over multiple subsequent units
Single events go to the operator's eyeballs, not to a permanent self-feedback
entry. Patterns earn one.
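The 3+ occurrence bar reduces to a count-and-filter over `kind` values. A minimal sketch, assuming entries carry a JSON `kind` field and accepting naive sed extraction in place of real JSONL parsing:

```bash
# Sample entries; three of one kind clear the bar, a single event does not.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"kind":"runaway-guard-hard-pause","status":"open"}
{"kind":"runaway-guard-hard-pause","status":"open"}
{"kind":"runaway-guard-hard-pause","status":"open"}
{"kind":"git-stage-failure","status":"open"}
EOF

# Count occurrences per kind, keep only kinds seen 3+ times.
patterns=$(sed -n 's/.*"kind":"\([^"]*\)".*/\1/p' "$sample" \
  | sort | uniq -c | awk '$1 >= 3 {print $2}')
echo "$patterns"
```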
## 3. File at most three findings
For each pattern:
- One call to `sf_self_report` with `kind` (slug, hyphenated), `severity`
(`low`/`medium`/`high`/`critical` — almost always `medium`), `summary`
(one sentence naming the pattern), `evidence` (concrete file paths +
line numbers + counts), `suggestedFix` (one or two specific edits — not
prose).
- Set `source: "agent"` so triage knows where it came from.
- Cite at least three observed instances in `evidence` so the triage agent
can verify without re-reading every log.
If you find nothing pattern-worthy, file zero. That is a successful run —
silence is the correct output when the system is healthy.
## 4. NEVER auto-apply
Do not edit code, prompts, templates, or docs. The triage flow
(`triage-self-feedback`, `requirement-promoter`) decides what becomes work.
Your job ends at filing. Even if the fix looks one-line obvious — file it
and stop.
## 5. NEVER ship a flood
Three is a hard cap. If you find a fourth, hold it for the next run. The
slow-pace constraint is deliberate — the triage flow is a human-paced
queue, not a firehose intake.
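The cap is mechanical: however many patterns clear the bar, only the first three are filed this run. A sketch with hypothetical pattern names:

```bash
# Four qualifying patterns in, three filed, the fourth held for next run.
qualifying="pattern-a
pattern-b
pattern-c
pattern-d"
filed=$(printf '%s\n' "$qualifying" | head -3)
held=$(printf '%s\n' "$qualifying" | tail -n +4)
```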
</process>
<output>
A short report:
- snapshot numbers (entries scanned, days covered)
- patterns considered + which ones met the 3-occurrence bar
- entry IDs filed (with `sf_self_report`'s returned id), or "none filed"
when the system is healthy
- one sentence on what trend you'd watch next run
</output>
<scheduling_hint>
This template pairs well with `/schedule every morning sf-audit-traces`.
Daily cadence + the 3-entry cap means the triage queue grows by at most 21
entries per week, which a human triage pass can clear in one sitting.
</scheduling_hint>