feat(sf): sf-audit-traces workflow for slow self-improvement loop

A standalone agent prompt that reads SF's observability sources
(self-feedback / journal / activity / judgments / forensics) and
files AT MOST 3 recurring-pattern findings via sf_self_report so
they enter the existing triage flow.

PDD spec:

Purpose: continuous self-improvement loop. SF already has the data
  sources (self-feedback.jsonl, journal/, activity/, judgments/) and
  the consumer pattern (triage-self-feedback → requirement-promoter).
  What was missing: a standalone prompt that pulls those sources
  together for a scheduled run.
Consumer: agents invoked via '/schedule every morning sf-audit-traces'
  (cloud) or '/sf workflow run sf-audit-traces' (manual).
Contract:
  1. Snapshot the trace volumes (file counts + line counts) into
     evidence so reports are concrete, not prose.
  2. Bar = 3+ occurrences. Single events go to operator eyeballs,
     not permanent self-feedback entries.
  3. Hard cap of 3 entries per run. The whole point is slow
     iteration — the triage queue is human-paced, not a firehose.
  4. NEVER auto-apply. Even if the fix looks one-line obvious, file
     and stop. The triage flow decides what becomes work.
  5. Zero findings is a successful run when the system is healthy.
Failure boundary: missing source files → skip silently. Read errors
  → handle gracefully. Never block on absence.
Evidence (verified during scan before writing):
  - 181 self-feedback entries (55 open, 126 resolved)
  - Top open kinds: runaway-guard-hard-pause (4), git-stage-failure
    (2), context-injection-gap (2), orphan-prompt (2)
  - Journal: 6-233 events per active day
  - Activity logs: per-unit JSONL transcripts present
  - All sources accessible via plain file reads — no special tools.
Non-goals:
  - ML training on traces
  - Cross-project trace aggregation
  - Auto-applying fixes (triage flow already does that)
  - Fast iteration (deliberately slow — 3/run cap means at most 21
    new triage items per week even with daily runs)
Invariants:
  - Safety: agent never edits code/prompts/templates/docs.
  - Liveness: zero findings is a valid output. The agent doesn't
    fabricate patterns to justify a run.

Discovery verified: 28 total workflow templates after this commit
(was 27); plugins.get('sf-audit-traces') returns the plugin from
the bundled source.

Pairs with: triage-self-feedback (reads what this files),
requirement-promoter (auto-promotes recurring kinds to requirements),
self-feedback-drain (session-start drain into repair turns). The
audit is the IN end of that pipeline; the rest of SF was already
the OUT end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit 82633b6f5e (parent e381e3c8ad)
Author: Mikael Hugo
Date: 2026-05-02 21:15:13 +02:00
2 changed files with 115 additions and 0 deletions


@@ -373,6 +373,16 @@
"artifact_dir": null,
"estimated_complexity": "medium",
"requires_project": false
},
"sf-audit-traces": {
"name": "SF Audit Traces",
"description": "Read SF's observability sources (self-feedback, journal, activity, judgments) and file at most 3 recurring-pattern findings via sf_self_report — designed for /schedule daily cadence",
"file": "sf-audit-traces.md",
"phases": ["audit"],
"triggers": ["sf audit traces", "audit sf", "self improve", "scan sf logs", "improve sf"],
"artifact_dir": null,
"estimated_complexity": "low",
"requires_project": true
}
}
}


@@ -0,0 +1,105 @@
# SF Audit Traces
<template_meta>
name: sf-audit-traces
version: 1
mode: oneshot
requires_project: true
artifact_dir: null
</template_meta>
<purpose>
Read SF's own observability sources, identify non-obvious recurring patterns,
and file them as self-feedback so the existing triage flow can promote them.
Iterate slowly: at most three entries per run. The point isn't to flood the
queue — the point is to catch what no single session noticed.
</purpose>
<inputs>
- `.sf/SELF-FEEDBACK.md` — markdown view of filed anomalies
- `.sf/self-feedback.jsonl` — durable source of truth
- `.sf/journal/YYYY-MM-DD.jsonl` — per-day dispatch + iteration events
- `.sf/activity/{seq}-{type}-{id}.jsonl` — per-unit transcript
- `.sf/judgments/*.jsonl` — recorded agent decisions (when present)
- `.sf/forensics/*.json` — saved post-mortems (when present)
</inputs>
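The "skip silently, never block on absence" boundary can be sketched as a guard loop. A minimal sketch; `scan_sources` and the fixture directory are illustrative names, not part of SF's API:

```bash
# Each source is optional: a missing file or directory is a silent
# skip, never an error (scan_sources is a hypothetical helper).
scan_sources() {
  base=$1
  for src in "$base/self-feedback.jsonl" "$base/journal" "$base/activity"; do
    [ -e "$src" ] || continue  # missing source: skip silently
    echo "scan: $src"
  done
}

# Demo against a throwaway fixture with only one source present.
demo=$(mktemp -d)
touch "$demo/self-feedback.jsonl"
scan_sources "$demo"
```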
<process>
## 1. Snapshot
Run these to anchor the scan in real numbers — file paths and counts go
into the eventual self-feedback evidence:
```bash
wc -l .sf/self-feedback.jsonl 2>/dev/null
ls .sf/journal/ 2>/dev/null | tail -7
ls .sf/activity/ 2>/dev/null | wc -l
```
Read the latest 7 days of journal files plus the last 30 activity files. If
a source is missing, skip silently — never block on absence.
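The scan window above (latest 7 journal days, last 30 activity files) can be sketched against a throwaway fixture; the fixture paths are illustrative, and a real run reads `.sf/` directly:

```bash
# Build a fixture larger than the scan window, then select the window.
SF=$(mktemp -d)
mkdir -p "$SF/journal" "$SF/activity"
for d in 01 02 03 04 05 06 07 08 09 10; do
  touch "$SF/journal/2026-05-$d.jsonl"
done
for i in $(seq 1 40); do
  touch "$SF/activity/$i-task-x.jsonl"
done

# Date-named files sort chronologically, so `tail -7` is the latest week;
# `ls -t` orders by mtime, so `head -30` is the most recent transcripts.
recent_days=$(ls "$SF/journal" | sort | tail -7)
recent_units=$(ls -t "$SF/activity" | head -30)
```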
## 2. Look for recurring patterns
The bar is **3+ occurrences** across the data, not single events. Examples:
- The same `kind` filed by self-feedback 3+ times in a week
- The same dispatch rule firing then immediately being un-applied (paired
events in the journal)
- The same tool error repeating across activity logs
- The same runaway-guard pause across multiple units
- The same auto-resolved entry kind triaged as `wontfix` repeatedly (signal
the detector is too noisy)
- A judgment that proves wrong over multiple subsequent units
Single events go to the operator's eyeballs, not to a permanent self-feedback
entry. Patterns earn one.
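The 3+ occurrence bar reduces to a count-and-filter over `kind` values. A minimal sketch, assuming entries carry a JSON `kind` field and accepting naive sed extraction in place of real JSONL parsing:

```bash
# Sample entries; three of one kind clear the bar, a single event does not.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"kind":"runaway-guard-hard-pause","status":"open"}
{"kind":"runaway-guard-hard-pause","status":"open"}
{"kind":"runaway-guard-hard-pause","status":"open"}
{"kind":"git-stage-failure","status":"open"}
EOF

# Count occurrences per kind, keep only kinds seen 3+ times.
patterns=$(sed -n 's/.*"kind":"\([^"]*\)".*/\1/p' "$sample" \
  | sort | uniq -c | awk '$1 >= 3 {print $2}')
echo "$patterns"
```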
## 3. File at most three findings
For each pattern:
- One call to `sf_self_report` with `kind` (slug, hyphenated), `severity`
(`low`/`medium`/`high`/`critical` — almost always `medium`), `summary`
(one sentence naming the pattern), `evidence` (concrete file paths +
line numbers + counts), `suggestedFix` (one or two specific edits — not
prose).
- Set `source: "agent"` so triage knows where it came from.
- Cite at least three observed instances in `evidence` so the triage agent
can verify without re-reading every log.
If you find nothing pattern-worthy, file zero. That is a successful run —
silence is the correct output when the system is healthy.
## 4. NEVER auto-apply
Do not edit code, prompts, templates, or docs. The triage flow
(`triage-self-feedback`, `requirement-promoter`) decides what becomes work.
Your job ends at filing. Even if the fix looks one-line obvious — file it
and stop.
## 5. NEVER ship a flood
Three is a hard cap. If you find a fourth, hold it for the next run. The
slow-pace constraint is deliberate — the triage flow is a human-paced
queue, not a firehose intake.
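The cap is mechanical: however many patterns clear the bar, only the first three are filed this run. A sketch with hypothetical pattern names:

```bash
# Four qualifying patterns in, three filed, the fourth held for next run.
qualifying="pattern-a
pattern-b
pattern-c
pattern-d"
filed=$(printf '%s\n' "$qualifying" | head -3)
held=$(printf '%s\n' "$qualifying" | tail -n +4)
```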
</process>
<output>
A short report:
- snapshot numbers (entries scanned, days covered)
- patterns considered + which ones met the 3-occurrence bar
- entry IDs filed (with `sf_self_report`'s returned id), or "none filed"
when the system is healthy
- one sentence on what trend you'd watch next run
</output>
<scheduling_hint>
This template pairs well with `/schedule every morning sf-audit-traces`.
Daily cadence + the 3-entry cap means the triage queue grows by at most 21
entries per week, which a human triage pass can clear in one sitting.
</scheduling_hint>