From 82633b6f5e5dc4a73c8ee113bad4be09c83a9a5b Mon Sep 17 00:00:00 2001
From: Mikael Hugo <mikkihugo@users.noreply.github.com>
Date: Sat, 2 May 2026 21:15:13 +0200
Subject: [PATCH] feat(sf): sf-audit-traces workflow for slow self-improvement
 loop
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A standalone agent prompt that reads SF's observability sources
(self-feedback / journal / activity / judgments / forensics) and
files AT MOST 3 recurring-pattern findings via sf_self_report so
they enter the existing triage flow.

PDD spec:

Purpose: continuous self-improvement loop. SF already has the data
  sources (self-feedback.jsonl, journal/, activity/, judgments/) and
  the consumer pattern (triage-self-feedback → requirement-promoter).
  What was missing: a standalone prompt that pulls those sources
  together for a scheduled run.
Consumer: agents invoked via '/schedule every morning sf-audit-traces'
  (cloud) or '/sf workflow run sf-audit-traces' (manual).
Contract:
  1. Snapshot the trace volumes (file counts + line counts) into
     evidence so reports are concrete, not prose.
  2. Bar = 3+ occurrences. Single events go to operator eyeballs,
     not permanent self-feedback entries.
  3. Hard cap of 3 entries per run. The whole point is slow
     iteration — the triage queue is human-paced, not a firehose.
  4. NEVER auto-apply. Even if the fix looks one-line obvious, file
     and stop. The triage flow decides what becomes work.
  5. Zero findings is a successful run when the system is healthy.
Failure boundary: missing source files → skip silently. Read errors
  → handle gracefully. Never block on absence.
Evidence (verified during scan before writing):
  - 181 self-feedback entries (55 open, 126 resolved)
  - Top open kinds: runaway-guard-hard-pause (4), git-stage-failure
    (2), context-injection-gap (2), orphan-prompt (2)
  - Journal: 6-233 events per active day
  - Activity logs: per-unit JSONL transcripts present
  - All sources accessible via plain file reads — no special tools.
Non-goals:
  - ML training on traces
  - Cross-project trace aggregation
  - Auto-applying fixes (triage flow already does that)
  - Fast iteration (deliberately slow — 3/run cap means at most 21
    new triage items per week even with daily runs)
Invariants:
  - Safety: agent never edits code/prompts/templates/docs.
  - Liveness: zero findings is a valid output. The agent doesn't
    fabricate patterns to justify a run.
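
Illustration only (not part of this commit): contract items 2, 3 and 5
plus the failure boundary can be sketched as below. It is a minimal
sketch, assuming each line of .sf/self-feedback.jsonl is a JSON object
with a string `kind` field. The helper names read_jsonl and
select_findings are hypothetical; the real agent works from the prompt
and files through sf_self_report, which this sketch never calls.

```python
import json
from collections import Counter
from pathlib import Path

MIN_OCCURRENCES = 3  # contract item 2: the bar is 3+ occurrences
MAX_FINDINGS = 3     # contract item 3: hard cap of 3 entries per run


def read_jsonl(path: Path) -> list[dict]:
    """Return parsed records; a missing or unreadable file yields []."""
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except (OSError, UnicodeDecodeError):
        return []  # failure boundary: never block on absence
    records = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed or empty lines, keep the run alive
        if isinstance(record, dict):
            records.append(record)
    return records


def select_findings(records: list[dict]) -> list[tuple[str, int]]:
    """Return up to MAX_FINDINGS (kind, count) pairs that meet the bar."""
    # Assumption: the entry's category lives in a string field named "kind".
    counts = Counter(
        r["kind"] for r in records if isinstance(r.get("kind"), str)
    )
    recurring = [(k, n) for k, n in counts.most_common() if n >= MIN_OCCURRENCES]
    return recurring[:MAX_FINDINGS]


if __name__ == "__main__":
    entries = read_jsonl(Path(".sf/self-feedback.jsonl"))
    print(f"entries scanned: {len(entries)}")
    for kind, count in select_findings(entries):
        print(f"candidate: {kind} ({count} occurrences)")
```

An empty result from select_findings is the healthy-system case from
contract item 5: nothing is filed and the run still counts as a success.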

Discovery verified: 28 total workflow templates after this commit
(was 27); plugins.get('sf-audit-traces') returns the plugin from
the bundled source.

Pairs with: triage-self-feedback (reads what this files),
requirement-promoter (auto-promotes recurring kinds to requirements),
self-feedback-drain (session-start drain into repair turns). The
audit is the IN end of that pipeline; the rest of SF was already
the OUT end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../sf/workflow-templates/registry.json       |  10 ++
 .../sf/workflow-templates/sf-audit-traces.md  | 105 ++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 src/resources/extensions/sf/workflow-templates/sf-audit-traces.md

diff --git a/src/resources/extensions/sf/workflow-templates/registry.json b/src/resources/extensions/sf/workflow-templates/registry.json
index 545bf60f6..eff333de4 100644
--- a/src/resources/extensions/sf/workflow-templates/registry.json
+++ b/src/resources/extensions/sf/workflow-templates/registry.json
@@ -373,6 +373,16 @@
 			"artifact_dir": null,
 			"estimated_complexity": "medium",
 			"requires_project": false
+		},
+		"sf-audit-traces": {
+			"name": "SF Audit Traces",
+			"description": "Read SF's observability sources (self-feedback, journal, activity, judgments) and file at most 3 recurring-pattern findings via sf_self_report — designed for /schedule daily cadence",
+			"file": "sf-audit-traces.md",
+			"phases": ["audit"],
+			"triggers": ["sf audit traces", "audit sf", "self improve", "scan sf logs", "improve sf"],
+			"artifact_dir": null,
+			"estimated_complexity": "low",
+			"requires_project": true
 		}
 	}
 }
diff --git a/src/resources/extensions/sf/workflow-templates/sf-audit-traces.md b/src/resources/extensions/sf/workflow-templates/sf-audit-traces.md
new file mode 100644
index 000000000..86d03ed94
--- /dev/null
+++ b/src/resources/extensions/sf/workflow-templates/sf-audit-traces.md
@@ -0,0 +1,105 @@
+# SF Audit Traces
+
+<template_meta>
+name: sf-audit-traces
+version: 1
+mode: oneshot
+requires_project: true
+artifact_dir: null
+</template_meta>
+
+<purpose>
+Read SF's own observability sources, identify non-obvious recurring patterns,
+and file each as self-feedback so the existing triage flow can promote it.
+
+Iterate slowly: at most three entries per run. The point isn't to flood the
+queue — the point is to catch what no single session noticed.
+</purpose>
+
+<inputs>
+- `.sf/SELF-FEEDBACK.md` — markdown view of filed anomalies
+- `.sf/self-feedback.jsonl` — durable source of truth
+- `.sf/journal/YYYY-MM-DD.jsonl` — per-day dispatch + iteration events
+- `.sf/activity/{seq}-{type}-{id}.jsonl` — per-unit transcript
+- `.sf/judgments/*.jsonl` — recorded agent decisions (when present)
+- `.sf/forensics/*.json` — saved post-mortems (when present)
+</inputs>
+
+<process>
+
+## 1. Snapshot
+
+Run these to anchor the scan in real numbers — file paths and counts go
+into the eventual self-feedback evidence:
+
+```bash
+wc -l .sf/self-feedback.jsonl 2>/dev/null
+ls .sf/journal/ 2>/dev/null | tail -7
+ls .sf/activity/ 2>/dev/null | wc -l
+```
+
+Read the latest 7 days of journal files plus the last 30 activity files. If
+a source is missing, skip silently — never block on absence.
+
+## 2. Look for recurring patterns
+
+The bar is **3+ occurrences** across the data, not single events. Examples:
+
+- The same `kind` filed by self-feedback 3+ times in a week
+- The same dispatch rule firing then immediately being un-applied (paired
+  events in the journal)
+- The same tool error repeating across activity logs
+- The same runaway-guard pause across multiple units
+- The same auto-resolved entry kind triaged as `wontfix` repeatedly (signal
+  the detector is too noisy)
+- A judgment that proves wrong over multiple subsequent units
+
+Single events go to the operator's eyeballs, not to a permanent self-feedback
+entry. Patterns earn one.
+
+## 3. File at most three findings
+
+For each pattern:
+
+- One call to `sf_self_report` with `kind` (slug, hyphenated), `severity`
+  (`low`/`medium`/`high`/`critical` — almost always `medium`), `summary`
+  (one sentence naming the pattern), `evidence` (concrete file paths +
+  line numbers + counts), `suggestedFix` (one or two specific edits — not
+  prose).
+- Set `source: "agent"` so triage knows where it came from.
+- Cite at least three observed instances in `evidence` so the triage agent
+  can verify without re-reading every log.
+
+If you find nothing pattern-worthy, file zero. That is a successful run —
+silence is the correct output when the system is healthy.
+
+## 4. NEVER auto-apply
+
+Do not edit code, prompts, templates, or docs. The triage flow
+(`triage-self-feedback`, `requirement-promoter`) decides what becomes work.
+Your job ends at filing. Even if the fix looks one-line obvious — file it
+and stop.
+
+## 5. NEVER ship a flood
+
+Three is a hard cap. If you find a fourth, hold it for the next run. The
+slow-pace constraint is deliberate — the triage flow is a human-paced
+queue, not a firehose intake.
+
+</process>
+
+<output>
+A short report:
+
+- snapshot numbers (entries scanned, days covered)
+- patterns considered + which ones met the 3-occurrence bar
+- entry IDs filed (with `sf_self_report`'s returned id), or "none filed"
+  when the system is healthy
+- one sentence on what trend you'd watch next run
+</output>
+
+<scheduling_hint>
+This template pairs well with `/schedule every morning sf-audit-traces`.
+Daily cadence + the 3-entry cap means the triage queue grows by at most 21
+entries per week, which a human triage pass can clear in one sitting.
+</scheduling_hint>