From 7ef58422b1b5e98467428aca0ea56f634ead3bb2 Mon Sep 17 00:00:00 2001
From: Mikael Hugo
Date: Mon, 11 May 2026 19:09:26 +0200
Subject: [PATCH] TODO: feature requests for batch backlog ingestion + probe-based resolution

Real dogfood for the auto-triage feature: this is the unstructured dump
that the autonomous cycle should pick up and process into proper
backlog items the next time it runs. Until auto-triage is wired up, the
contents serve as a written spec for what's needed.

Two flagship features:

- Auto-triage TODO.md on each autonomous cycle. `commands-todo.js`
  already implements `/todo triage` (manual). Wire it to the autonomous
  orchestrator and skip when TODO.md == _EMPTY_TODO.
- When the LLM would ask a clarifying question, replace it with
  parallel combatant + partner probes (adversarial-challenge +
  collaborative-research) and only fall back to asking a human if
  probes diverge AND interactive mode is available. This unblocks
  unattended `headless new-milestone` (the gap that blocked batch
  backlog ingestion today).

Plus five smaller items (headless milestone stall fix, bulk
import-roadmap, TTY-free plan list, hand-authorable milestone scaffold,
discoverable --answers schema) carried over from the centralcloud-ops
SF-IMPROVEMENTS.md observations.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 TODO.md | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)
 create mode 100644 TODO.md

diff --git a/TODO.md b/TODO.md
new file mode 100644
index 000000000..4bc590508
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,109 @@
+# TODO
+
+Dump anything here. `/todo triage` (and eventually the autonomous loop) processes it into proper backlog / harness / evals / docs.
+
+---
+
+## Auto-triage TODO.md on each autonomous cycle
+
+`commands-todo.js` already implements the triage (`/todo triage` →
+`triageTodoDump`). Today it's manual only. Wire it to the autonomous
+orchestrator so each cycle starts by checking if TODO.md has content
+beyond the empty template, and if so runs `triageTodoDump` before
+picking the next unit.
+
+Triage cost is one LLM call (Minimax M2.7 etc. per `PREFERRED_TRIAGE_MODEL_PATTERNS`),
+which is cheap relative to a cycle. Skip when TODO.md == the `_EMPTY_TODO`
+template so cycles aren't penalised when there's nothing new.
+
+Test data: this file. When the wiring lands, this section gets converted
+to a real backlog item automatically and the file resets to the empty
+template.
+
+## When SF needs to ask a question, resolve via probes instead
+
+Today: agent inside `headless new-milestone` calls
+`ask_user_questions`. In headless / autonomous mode that surface stalls
+or returns "tool unavailable", and the milestone never lands.
+
+Wanted behaviour: replace blocking user questions with
+**adversarial-collaborative resolution**:
+
+- **Combatant probe** — adversarial agent challenges the assumption
+  behind the question. "Why do you need this answer? What if it's the
+  opposite? What evidence would change your mind?"
+- **Partner probe** — collaborative agent does targeted research to
+  surface the most likely answer from the codebase / existing context /
+  prior milestones.
+- Both run in parallel, with a short budget (e.g. 30 s / 2 tool calls
+  each).
+- If they converge → proceed with the resolved answer, note the
+  decision and confidence in the milestone artifact.
+- If they diverge AND no human is reachable → make the conservative
+  call (minimal scope) and flag the unresolved question in
+  `OPEN-QUESTIONS.md` for later human review.
+- Only fall back to actually asking the human if interactive mode is
+  available and the question is high-stakes.
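+
+A minimal sketch of that control flow, assuming a generic budgeted
+sub-agent runner (`runProbe`, `answersAgree`, and the question shape
+are hypothetical here, not existing SF APIs):
+
+```js
+const { appendFile } = require('node:fs/promises');
+
+// Hypothetical stand-in for however SF spawns a budgeted sub-agent.
+// Defaults document the intended budget: 30 s / 2 tool calls per probe.
+async function runProbe(role, prompt, { timeoutMs = 30_000, maxToolCalls = 2 } = {}) {
+  return { answer: null, confidence: 0 }; // stubbed so the sketch is self-contained
+}
+
+// Placeholder convergence check; the real thing would compare semantically.
+const answersAgree = (a, b) => a !== null && a === b;
+
+async function resolveQuestion(question, { interactive = false } = {}) {
+  const [challenge, research] = await Promise.all([
+    runProbe('adversarial-challenge',
+      `Challenge the assumption behind: "${question.text}". ` +
+      `Why is this answer needed? What if the opposite holds?`),
+    runProbe('collaborative-research',
+      `Find the most likely answer to: "${question.text}" from the ` +
+      `codebase, existing context and prior milestones.`),
+  ]);
+
+  if (answersAgree(challenge.answer, research.answer)) {
+    // Converged: proceed; decision + confidence go into the milestone artifact.
+    return { answer: research.answer,
+             confidence: Math.min(challenge.confidence, research.confidence) };
+  }
+  if (interactive && question.highStakes) {
+    // Diverged, human reachable, high-stakes: the only case that still asks.
+    return { escalate: 'ask_user_questions' };
+  }
+  // Diverged and unattended: conservative (minimal-scope) call, and the
+  // unresolved question is logged for later human review.
+  await appendFile('OPEN-QUESTIONS.md',
+    `- ${question.text}\n  combatant: ${challenge.answer}\n  partner: ${research.answer}\n`);
+  return { answer: question.conservativeDefault, confidence: 0 };
+}
+```
+
+The probe budget and the convergence check are the real design choices
+here; the rest is plumbing.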
+
+This makes `headless new-milestone --context …` actually finish
+unattended, which is the gap that's blocking batch backlog ingestion
+right now.
+
+## Headless `new-milestone` is broken in unattended mode
+
+Reproduce: `sf headless new-milestone --context-text "…complete spec…"`
+→ agent invokes `ask_user_questions` → "tool unavailable" error → no
+milestone created.
+
+Two paths to fix (either works, both ideal):
+
+1. Prompt-level: instruct the agent that when `--context` or
+   `--context-text` is provided, that's the complete spec, and to
+   proceed without follow-up questions. Cheap (prompt change).
+2. Tool-level: in headless mode without `--supervised`, have
+   `ask_user_questions` resolve through the combatant/partner probe
+   flow described above rather than failing.
+
+## Bulk roadmap import
+
+`sf headless import-roadmap --file BACKLOG.md` — read flat markdown
+with H2 sections and bullet items, emit one milestone per H2, slices
+per item, no LLM. Pure text → SF-structure transform.
+
+Useful for ingesting `BACKLOG.md` from `.sf/wiki/` (or from a human's
+roadmap file) without 16 LLM round-trips.
+
+Schema: H2 = milestone title. Following paragraph = milestone context.
+Each `- ⬜` bullet = one slice (`✅` filters out done items). Optional
+H3 = phase boundary inside the milestone.
+
+## `sf plan list` should have a TTY-free variant
+
+`sf plan list` fails with "Interactive mode requires a terminal" in
+non-TTY. The actual operation (list files in `.sf/milestones/`) needs
+no interaction. `sf plan list --plain` or `sf headless plan list`,
+emitting one milestone id and title per line, would be enough.
+
+## Hand-authorable milestone scaffold
+
+Today a milestone is a directory tree with `CONTEXT.md`,
+`MILESTONE-SUMMARY.md`, `ROADMAP.md`, `SUMMARY.md`, plus `slices/SNN/`
+and `tasks/TNN/`. Naming uses an ID + 6-char hash that's not documented.
+
+Wanted: a documented "minimum milestone" — say, just `CONTEXT.md` with
+frontmatter `id: MNNN\ntitle: …` — that SF will accept and auto-fill
+the rest of the tree from on first operation. Lets humans (or other
+tools) hand-author milestones when SF's LLM scaffold is unavailable or
+overkill.
+
+## Discoverable `--answers` schema
+
+`sf headless` has an `--answers` option for pre-supplying interactive
+answers, but the answer schema for each command isn't discoverable.
+
+Wanted: `sf headless new-milestone --print-answer-schema`, emitting the
+JSON schema of every question the command *might* ask, so a caller can
+pre-supply answers rather than running interactively first to record
+them. Complements the probe-resolution flow above — if probes converge,
+use that; if they diverge but the caller pre-supplied an answer via
+`--answers`, use that instead of falling back to OPEN-QUESTIONS.md.
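+
+For illustration only, the printed schema could look something like
+this (shape and field names are made up here, not an existing SF
+format):
+
+```json
+{
+  "command": "new-milestone",
+  "questions": [
+    {
+      "id": "scope.include_migrations",
+      "prompt": "Should this milestone include the data-migration work?",
+      "type": "boolean",
+      "default": false,
+      "highStakes": true
+    }
+  ]
+}
+```
+
+A caller would run the command once with `--print-answer-schema`, fill
+in a value per question `id`, and feed the result back through
+`--answers` on the real run.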