feat(sf): sf-audit-traces workflow for slow self-improvement loop
A standalone agent prompt that reads SF's observability sources
(self-feedback / journal / activity / judgments / forensics) and
files AT MOST 3 recurring-pattern findings via sf_self_report so
they enter the existing triage flow.
PDD spec:
Purpose: continuous self-improvement loop. SF already has the data
sources (self-feedback.jsonl, journal/, activity/, judgments/, forensics/) and
the consumer pattern (triage-self-feedback → requirement-promoter).
What was missing: a standalone prompt that pulls those sources
together for a scheduled run.
Consumer: agents invoked via '/schedule every morning sf-audit-traces'
(cloud) or '/sf workflow run sf-audit-traces' (manual).
Contract:
1. Snapshot the trace volumes (file counts + line counts) into
evidence so reports are concrete, not prose.
2. Bar = 3+ occurrences. Single events go to operator eyeballs,
not permanent self-feedback entries.
3. Hard cap of 3 entries per run. The whole point is slow
iteration — the triage queue is human-paced, not a firehose.
4. NEVER auto-apply. Even if the fix looks one-line obvious, file
and stop. The triage flow decides what becomes work.
5. Zero findings is a successful run when the system is healthy.
Failure boundary: missing source files → skip silently. Read errors
→ handle gracefully. Never block on absence.
Evidence (verified during scan before writing):
- 181 self-feedback entries (55 open, 126 resolved)
- Top open kinds: runaway-guard-hard-pause (4), git-stage-failure
(2), context-injection-gap (2), orphan-prompt (2)
- Journal: 6-233 events per active day
- Activity logs: per-unit JSONL transcripts present
- All sources accessible via plain file reads — no special tools.
Non-goals:
- ML training on traces
- Cross-project trace aggregation
- Auto-applying fixes (triage flow already does that)
- Fast iteration (deliberately slow — 3/run cap means at most 21
new triage items per week even with daily runs)
Invariants:
- Safety: agent never edits code/prompts/templates/docs.
- Liveness: zero findings is a valid output. The agent doesn't
fabricate patterns to justify a run.
Discovery verified: 28 total workflow templates after this commit
(was 27); plugins.get('sf-audit-traces') returns the plugin from
the bundled source.
Pairs with: triage-self-feedback (reads what this files),
requirement-promoter (auto-promotes recurring kinds to requirements),
self-feedback-drain (session-start drain into repair turns). The
audit is the IN end of that pipeline; the rest of SF was already
the OUT end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent e381e3c8ad
commit 82633b6f5e
2 changed files with 115 additions and 0 deletions
@@ -373,6 +373,16 @@
"artifact_dir": null,
"estimated_complexity": "medium",
"requires_project": false
},
"sf-audit-traces": {
"name": "SF Audit Traces",
"description": "Read SF's observability sources (self-feedback, journal, activity, judgments) and file at most 3 recurring-pattern findings via sf_self_report — designed for /schedule daily cadence",
"file": "sf-audit-traces.md",
"phases": ["audit"],
"triggers": ["sf audit traces", "audit sf", "self improve", "scan sf logs", "improve sf"],
"artifact_dir": null,
"estimated_complexity": "low",
"requires_project": true
}
}
}
@@ -0,0 +1,105 @@
# SF Audit Traces

<template_meta>
name: sf-audit-traces
version: 1
mode: oneshot
requires_project: true
artifact_dir: null
</template_meta>

<purpose>
Read SF's own observability sources, identify non-obvious recurring patterns,
and file them as self-feedback so the existing triage flow can promote them.

Iterate slowly: at most three entries per run. The point isn't to flood the
queue; it is to catch what no single session noticed.
</purpose>

<inputs>
- `.sf/SELF-FEEDBACK.md`: markdown view of filed anomalies
- `.sf/self-feedback.jsonl`: durable source of truth
- `.sf/journal/YYYY-MM-DD.jsonl`: per-day dispatch + iteration events
- `.sf/activity/{seq}-{type}-{id}.jsonl`: per-unit transcript
- `.sf/judgments/*.jsonl`: recorded agent decisions (when present)
- `.sf/forensics/*.json`: saved post-mortems (when present)
</inputs>

<process>

## 1. Snapshot

Run these to anchor the scan in real numbers. File paths and counts go
into the eventual self-feedback evidence:

```bash
wc -l .sf/self-feedback.jsonl 2>/dev/null
ls .sf/journal/ 2>/dev/null | tail -7
ls .sf/activity/ 2>/dev/null | wc -l
```

Read the latest 7 days of journal files plus the last 30 activity files. If
a source is missing, skip silently; never block on absence.
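
A minimal Python sketch of the same snapshot, for cases where the numbers are
needed as structured data instead of shell output. The function names and the
returned dictionary keys are illustrative assumptions, not part of SF. It also
assumes that sorting activity file names lexically reflects their `{seq}` order
(that is, that `{seq}` is zero-padded). Missing sources yield `None` or empty
values, never an error:

```python
from pathlib import Path
from typing import Optional


def count_lines(path: Path) -> Optional[int]:
    """Return the number of lines in `path`, or None if it cannot be read."""
    try:
        with path.open("rb") as fh:
            return sum(1 for _ in fh)
    except OSError:
        # Missing or unreadable source: skip silently, never block.
        return None


def list_jsonl(directory: Path) -> list:
    """Return sorted *.jsonl file names in `directory`, or [] if it is absent."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.jsonl"))


def snapshot(root: Path) -> dict:
    """Collect trace volumes under `root` (hypothetical helper, e.g. Path('.sf'))."""
    journal = list_jsonl(root / "journal")
    activity = list_jsonl(root / "activity")
    return {
        "self_feedback_lines": count_lines(root / "self-feedback.jsonl"),
        "journal_days": journal[-7:],      # latest 7 daily journal files
        "activity_files": len(activity),   # total per-unit transcripts
        "activity_recent": activity[-30:], # last 30 transcripts to read
    }


if __name__ == "__main__":
    print(snapshot(Path(".sf")))
```

Unlike `ls | wc -l`, this sketch counts only `*.jsonl` entries, so the two
numbers can differ if the directory holds other files.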

## 2. Look for recurring patterns

The bar is **3+ occurrences** across the data, not single events. Examples:

- The same `kind` filed by self-feedback 3+ times in a week
- The same dispatch rule firing then immediately being un-applied (paired
  events in the journal)
- The same tool error repeating across activity logs
- The same runaway-guard pause across multiple units
- The same auto-resolved entry kind triaged as `wontfix` repeatedly (a signal
  that the detector is too noisy)
- A judgment that proves wrong over multiple subsequent units

Single events go to the operator's eyeballs, not to a permanent self-feedback
entry. Patterns earn one.
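
The first example above (the same `kind` filed 3+ times in a week) could be
checked with a sketch like the following. It assumes each line of
`self-feedback.jsonl` is a JSON object with a `kind` string and an ISO-8601
timestamp in a field named `ts`. The `ts` field name and the function name are
assumptions for illustration; the real schema may differ. Malformed lines are
skipped instead of raising:

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone


def recurring_kinds(lines, now, days=7, bar=3):
    """Return {kind: count} for kinds seen at least `bar` times in the window.

    `lines` is an iterable of raw JSONL strings; `now` must be timezone-aware.
    """
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for raw in lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unreadable line: handle gracefully
        if not isinstance(entry, dict):
            continue
        kind, ts = entry.get("kind"), entry.get("ts")  # "ts" is an assumed field
        if not isinstance(kind, str) or not isinstance(ts, str):
            continue
        try:
            when = datetime.fromisoformat(ts)
        except ValueError:
            continue
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # assume UTC when unmarked
        if when >= cutoff:
            counts[kind] += 1
    # Single events and pairs stay below the bar and are dropped here.
    return {kind: n for kind, n in counts.items() if n >= bar}
```

The other examples (paired journal events, repeated tool errors) need
source-specific parsing and are not covered by this sketch.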

## 3. File at most three findings

For each pattern:

- One call to `sf_self_report` with `kind` (slug, hyphenated), `severity`
  (`low`/`medium`/`high`/`critical`; almost always `medium`), `summary`
  (one sentence naming the pattern), `evidence` (concrete file paths +
  line numbers + counts), `suggestedFix` (one or two specific edits, not
  prose).
- Set `source: "agent"` so triage knows where it came from.
- Cite at least three observed instances in `evidence` so the triage agent
  can verify without re-reading every log.

If you find nothing pattern-worthy, file zero. That is a successful run:
silence is the correct output when the system is healthy.
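
As a rough illustration of the fields listed above, the sketch below assembles
and checks one report payload before it would be handed to `sf_self_report`.
Only the field names come from this template. The helper name, the choice to
model `evidence` as a list of strings, and the slug pattern are assumptions,
and the sketch does not call `sf_self_report` itself:

```python
import re

ALLOWED_SEVERITIES = ("low", "medium", "high", "critical")
SLUG = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")  # assumed shape of a hyphenated slug


def build_report(kind, summary, evidence, suggested_fix, severity="medium"):
    """Return a payload dict for one finding, or raise ValueError if it is weak."""
    if not SLUG.fullmatch(kind):
        raise ValueError(f"kind must be a hyphenated slug: {kind!r}")
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if len(evidence) < 3:
        # The bar is three observed instances; fewer is a single event.
        raise ValueError("cite at least three observed instances")
    return {
        "kind": kind,
        "severity": severity,
        "summary": summary,
        "evidence": list(evidence),
        "suggestedFix": suggested_fix,
        "source": "agent",
    }
```

Rejecting a payload with fewer than three evidence items keeps the
3-occurrence bar enforceable at the point of filing, not only during the scan.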

## 4. NEVER auto-apply

Do not edit code, prompts, templates, or docs. The triage flow
(`triage-self-feedback`, `requirement-promoter`) decides what becomes work.
Your job ends at filing. Even if the fix looks one-line obvious, file it
and stop.

## 5. NEVER ship a flood

Three is a hard cap. If you find a fourth, hold it for the next run. The
slow-pace constraint is deliberate: the triage flow is a human-paced
queue, not a firehose intake.
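
One way to picture the cap is the sketch below, which splits candidate
patterns into those filed now and those held for the next run, and computes the
resulting weekly upper bound. Ranking by occurrence count is an assumption made
for illustration. This template only requires that anything beyond the third
finding is held back, and it does not say how held findings are remembered:

```python
MAX_PER_RUN = 3  # hard cap from this template


def select_for_run(candidates):
    """Split (kind, count) pairs into (to_file, held_for_next_run).

    Candidates are ranked by count, highest first, with ties broken by kind
    so the result is deterministic.
    """
    ranked = sorted(candidates, key=lambda c: (-c[1], c[0]))
    return ranked[:MAX_PER_RUN], ranked[MAX_PER_RUN:]


def weekly_upper_bound(runs_per_day=1, days=7):
    """Most entries the audit can add to the triage queue in `days` days."""
    return MAX_PER_RUN * runs_per_day * days


if __name__ == "__main__":
    to_file, held = select_for_run(
        [("orphan-prompt", 3), ("git-stage-failure", 5),
         ("context-injection-gap", 4), ("tool-timeout", 3)]
    )
    print(to_file, held, weekly_upper_bound())
```

With one run per day the bound is 3 × 7 = 21 entries per week.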

</process>

<output>
A short report:

- snapshot numbers (entries scanned, days covered)
- patterns considered + which ones met the 3-occurrence bar
- entry IDs filed (with `sf_self_report`'s returned id), or "none filed"
  when the system is healthy
- one sentence on what trend you'd watch next run
</output>

<scheduling_hint>
This template pairs well with `/schedule every morning sf-audit-traces`.
Daily cadence + the 3-entry cap means the triage queue grows by at most 21
entries per week, which a human triage pass can clear in one sitting.
</scheduling_hint>