TODO: triage should escalate Tier 1 items to real milestones

Today's triage run confirmed that the manual `/todo triage` workflow works, but it stops at tier-listing items in BUILD_PLAN.md and does not scaffold .sf/milestones/MNNN/ dirs for the Tier 1 ones. That gap needs closing before the autonomous flow can actually create milestones from raw TODO dumps. This commit also captures the non-fatal phases-helpers.js extension load error that appeared at the top of the triage run output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:15:33 +02:00
commit 3e652a9fd6
parent ca7368e5f1
1 changed file with 44 additions and 94 deletions
--- a/TODO.md
+++ b/TODO.md
@@ -1,109 +1,59 @@
 # TODO

-Dump anything here. `/todo triage` (and eventually the autonomous loop) processes it into proper backlog / harness / evals / docs.
+Dump anything here.

 ---

-## Auto-triage TODO.md on each autonomous cycle
+## Triage should auto-promote Tier 1 items into real milestones

-`commands-todo.js` already implements the triage (`/todo triage` →
-`triageTodoDump`). Today it's manual only. Wire it to the autonomous
-orchestrator so each cycle starts by checking if TODO.md has content
-beyond the empty template, and if so runs `triageTodoDump` before
-picking the next unit.
+Confirmed today: `/todo triage` works — it produces a plan doc and tiers
+items into `BUILD_PLAN.md`. **But it stops at tiering.** Tier 1 items —
+the ones marked blocking / correctness — should additionally trigger
+the milestone-creation path so they land as `.sf/milestones/MNNN/`
+scaffolds, not just tier-list rows that have to be promoted by hand.

-Triage cost is one LLM call (Minimax M2.7 etc per `PREFERRED_TRIAGE_MODEL_PATTERNS`),
-which is cheap relative to a cycle. Skip when TODO.md == `_EMPTY_TODO`
-template so cycles aren't penalised when there's nothing new.
+Concretely: at the end of `triageTodoDump()` in
+`src/resources/extensions/sf/commands-todo.js`, after items are
+inserted into `triage_runs`/`triage_items` and reflected in
+`BUILD_PLAN.md`, iterate items where `tier === "T1"` (or some configured
+threshold) and call into the same milestone-creation flow that
+`/sf new-milestone` uses. Each Tier-1 triage item's context (the
+already-LLM-rationalised "Why" + "Implementation note") becomes the
+spec for one milestone.

-Test data: this file. When the wire lands, this section gets converted
-to a real backlog item automatically and the file resets to the empty
-template.
+Two design decisions to make:

-## When SF needs to ask a question, resolve via probes instead
+1. **Auto-create vs queue.** Safer default is "queue, await human
+   approval before scaffolding files" — so triage emits a small
+   `triage_milestone_candidates` row and a follow-up command
+   (`/sf triage-milestones approve <id>`) does the actual scaffold.
+   More aggressive: skip the human gate when there's high confidence
+   from the triage LLM that the item is well-scoped.

-Today: agent inside `headless new-milestone` calls
-`ask_user_questions`. In headless / autonomous mode that surface stalls
-or returns "tool unavailable", and the milestone never lands.
+2. **What counts as Tier 1.** The triage LLM already decides this
+   based on the spec wording. Document the criteria so it's stable
+   across model swaps (e.g. "correctness / blocking / safety = T1;
+   ergonomics / nice-to-have = T2-3").

-Wanted behaviour: replace blocking user questions with **adversarial-
-collaborative resolution**:
+When this ships, today's seven triage items (recorded in
+`docs/plans/todo-triage-2026-05-11-plan.md`) should retroactively get
+milestones for the two Tier 1 entries (headless unattended fix +
+adversarial-collaborative probes).

- **Combatant probe** — adversarial agent challenges the assumption
-  behind the question. "Why do you need this answer? What if it's the
-  opposite? What evidence would change your mind?"
- **Partner probe** — collaborative agent does targeted research to
-  surface the most likely answer from the codebase / existing context /
-  prior milestones.
- Both run in parallel, with a short budget (e.g. 30 s / 2 tool calls
-  each).
- If they converge → proceed with the resolved answer, note the
-  decision and confidence in the milestone artifact.
- If they diverge AND no human is reachable → make the conservative
-  call (minimal scope) and flag the unresolved question in
-  `OPEN-QUESTIONS.md` for later human review.
- Only fall back to actually asking the human if interactive mode is
-  available and the question is high-stakes.
+## Triage runner had an extension load error

-This makes `headless new-milestone --context …` actually finish
-unattended, which is the gap that's blocking batch backlog ingestion
-right now.
+Today's `/todo triage` run printed at the top of its output:

-## Headless `new-milestone` is broken in unattended mode
+```
+[sf] Extension load error Error: Failed to load extension
+"/home/mhugo/.sf/agent/extensions/sf/index.js":
+The requested module './phases-helpers.js' does not provide an
+export named 'closeoutAndStop'
+```

-Reproduce: `sf headless new-milestone --context-text "…complete spec…"`
-→ agent invokes `ask_user_questions` → "tool unavailable" error → no
-milestone created.
-
-Two paths to fix (either works, both ideal):
-
-1. Prompt-level: instruct the agent that when `--context` or
-   `--context-text` is provided, that's the complete spec, and to
-   proceed without follow-up questions. Cheap (prompt change).
-2. Tool-level: in headless mode without `--supervised`, have
-   `ask_user_questions` resolve through the combatant/partner probe
-   flow described above rather than failing.
-
-## Bulk roadmap import
-
-`sf headless import-roadmap --file BACKLOG.md` — read flat markdown
-with H2 sections and bullet items, emit one milestone per H2, slices
-per item, no LLM. Pure text → SF-structure transform.
-
-Useful for ingesting `BACKLOG.md` from `.sf/wiki/` (or from a human's
-roadmap file) without 16 LLM round-trips.
-
-Schema: H2 = milestone title. Following paragraph = milestone context.
-Each `- ⬜` bullet = one slice (`✅` filters out done items). Optional
-H3 = phase boundary inside the milestone.
-
-## `sf plan list` should have a TTY-free variant
-
-`sf plan list` fails with "Interactive mode requires a terminal" in
-non-TTY. The actual operation (list files in `.sf/milestones/`) needs
-no interaction. `sf plan list --plain` or `sf headless plan list`
-that emits one milestone-id-and-title per line would be enough.
-
-## Hand-authorable milestone scaffold
-
-Today a milestone is a directory tree with `CONTEXT.md`,
-`MILESTONE-SUMMARY.md`, `ROADMAP.md`, `SUMMARY.md`, plus `slices/SNN/`
-and `tasks/TNN/`. Naming uses an ID + 6-char hash that's not documented.
-
-A documented "minimum milestone" — say, just `CONTEXT.md` with
-frontmatter `id: MNNN\ntitle: …` — that SF will accept and auto-fill
-the rest of the tree from on first operation. Lets humans (or other
-tools) hand-author milestones when SF's LLM scaffold is unavailable or
-overkill.
-
-## Discoverable `--answers` schema
-
-`sf headless` has `--answers <path>` for pre-supplying interactive
-answers, but the answer schema for each command isn't discoverable.
-
-`sf headless new-milestone --print-answer-schema` that emits the JSON
-schema of every question the command *might* ask, so a caller can
-pre-supply rather than running interactively first to record them.
-Complements the probe-resolution flow above — if probes converge,
-use that; if they diverge but the caller pre-supplied an answer via
-`--answers`, use that instead of falling back to OPEN-QUESTIONS.md.
+Triage itself still completed successfully, so it's non-fatal — but
+some extension in the agent harness expected `closeoutAndStop` from
+`phases-helpers.js` and didn't find it. Either the symbol was renamed
+and a caller wasn't updated, or `npm run copy-resources` didn't sync
+the right file. A test asserting the phases-helpers exports list, if
+one exists, would likely catch this.
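
Reviewer sketch (not part of the commit): the Tier 1 promotion step proposed in the first new section could look roughly like the following, using the safer "queue, then approve" default. Every name here is an assumption made for illustration: `selectMilestoneCandidates`, `approveCandidate`, the item fields (`id`, `tier`, `title`, `why`, `implementationNote`), and the injected `createMilestone` callback, which stands in for whatever flow `/sf new-milestone` actually uses. None of it is taken from the SF codebase.

```javascript
// Hypothetical sketch; helper and field names are NOT from the SF codebase.

// Pick the triaged items that should become milestone candidates.
// Nothing is scaffolded here: each candidate starts out "pending".
function selectMilestoneCandidates(items, { promoteTiers = ["T1"] } = {}) {
  const tiers = new Set(promoteTiers); // configurable threshold, defaults to T1 only
  return items
    .filter((item) => tiers.has(item.tier))
    .map((item) => ({
      triageItemId: item.id,
      title: item.title,
      // The triage "Why" + "Implementation note" become the milestone spec.
      spec: `${item.why}\n\n${item.implementationNote}`,
      status: "pending",
    }));
}

// Approval is the only step that calls the injected scaffolder.
// `createMilestone` is a stand-in for the real milestone-creation flow.
async function approveCandidate(candidate, createMilestone) {
  if (candidate.status !== "pending") {
    throw new Error(
      `candidate ${candidate.triageItemId} is already ${candidate.status}`
    );
  }
  const milestoneId = await createMilestone({
    title: candidate.title,
    context: candidate.spec,
  });
  return { ...candidate, status: "approved", milestoneId };
}
```

Keeping selection pure and injecting the scaffolder means the more aggressive "auto-create" variant would only need to call `approveCandidate` immediately after selection, without changing how candidates are chosen.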