Add TODO triage and validation recheck flow

Mikael Hugo 2026-04-30 08:41:49 +02:00
parent ed19fa1864
commit 8487507d1b
6 changed files with 834 additions and 21 deletions

TODO.md Normal file

@@ -0,0 +1,142 @@
# TODO
Dump anything here.
SF agentic engineering / harness / memory / eval context dump:
We want a low-friction dump inbox that turns rough human notes into project
evals, harness work, memory requirements, docs, tests, or implementation tasks.
Root TODO.md is the dump place. AGENTS.md carries the durable instruction:
agents should read TODO.md when present, triage it, and clear processed notes
after converting them into reviewable artifacts.
Important split:
- AGENTS.md = durable startup-visible instructions.
- TODO.md = messy temporary dump inbox.
- Memory = experience store.
- GEPA/DSPy/self-evolution = offline lab.
- Runtime agent = uses approved skills/prompts/tools/memory, not unreviewed
evolved candidates.
Harness.io note:
- Harness Agents are AI workers inside Harness CI/CD pipelines.
- They inherit pipeline context, secrets, RBAC, approvals, logs, and OPA policy.
- Useful SF lesson: run agents inside a governed workflow with permissions,
logs, approvals, artifacts, reusable templates, and reviewable outputs.
- This is different from repo-native test/eval harnesses, but the control-plane
pattern is valuable.
Current SF state:
- Auto-mode safety harness exists and is default-on: evidence collection,
file-change validation, evidence cross-reference, destructive command
warnings, content validation, checkpoints. Auto rollback is off by default.
- gate-evaluate exists but is opt-in via gate_evaluation.enabled.
- Repo-native harness evolution is mostly read-only/proposed today:
/sf harness profile records repo facts in .sf/sf.db, but does not yet enforce
harness/manifest gates or write harness/, gates/, eval suites, or CI files.
Slow conversion of TS into fast agents:
- Do not rewrite the deterministic SF state machine into LLM behavior.
- Keep TypeScript for CLI, TUI, extension API, preferences, state machine, DB
schema, safety gates, prompt rendering, workflow orchestration, and file
ownership rules.
- Convert fuzzy/read-only work into narrow agents: repo profiling
interpretation, TODO triage, eval generation, harness proposal, failure
analysis, review, remediation proposals, memory extraction, drift detection.
- SF remains the orchestrator and ledger. Agents consume typed jobs and return
structured JSON.
Possible AgentJob shape:
type AgentJob =
  | { kind: "repo_profile"; cwd: string }
  | { kind: "todo_triage"; cwd: string; todoPath: string }
  | { kind: "eval_candidate_generation"; cwd: string; sources: string[] }
  | { kind: "failure_analysis"; cwd: string; runId: string }
  | { kind: "harness_proposal"; cwd: string; profileId: string };
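A minimal exhaustive dispatcher over this union could look like the sketch below. The handler bodies are placeholders, not SF's real orchestrator; the point is the compile-time exhaustiveness check:

```typescript
type AgentJob =
  | { kind: "repo_profile"; cwd: string }
  | { kind: "todo_triage"; cwd: string; todoPath: string }
  | { kind: "eval_candidate_generation"; cwd: string; sources: string[] }
  | { kind: "failure_analysis"; cwd: string; runId: string }
  | { kind: "harness_proposal"; cwd: string; profileId: string };

// Exhaustiveness: if a new kind is added to AgentJob and not handled here,
// the `never` assignment in the default branch becomes a compile error.
function describeJob(job: AgentJob): string {
  switch (job.kind) {
    case "repo_profile":
      return `profile repo at ${job.cwd}`;
    case "todo_triage":
      return `triage ${job.todoPath}`;
    case "eval_candidate_generation":
      return `generate evals from ${job.sources.length} source(s)`;
    case "failure_analysis":
      return `analyze failed run ${job.runId}`;
    case "harness_proposal":
      return `propose harness for profile ${job.profileId}`;
    default: {
      const exhaustive: never = job;
      return exhaustive;
    }
  }
}

console.log(describeJob({ kind: "todo_triage", cwd: ".", todoPath: "TODO.md" }));
// → triage TODO.md
```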
First useful agents:
- TODO triage agent: reads TODO.md, creates eval candidates, implementation
tasks, memory facts, docs/harness suggestions, then clears processed notes.
- Eval candidate agent: converts notes/session failures into JSONL with
task_input, expected_behavior, failure_mode, evidence, source.
- Repo profile interpretation agent: uses deterministic TS repo-profiler output
and identifies missing gates/evals/docs.
- Harness proposal agent: produces dry-run proposals only; no tracked file
writes except reviewed artifacts later.
- Remediation agent: later, after evals are stable, takes failing evals and
proposes code/test patches.
Speed strategy:
- Deterministic TS: scan files, parse manifests, read git state, write DB rows.
- Cheap/local model agents: classify dump notes, summarize failures, label risk.
- Strong model agents: propose harnesses, generate eval rubrics, repair complex
failures.
Desired pipeline:
TODO.md dump -> triage agent -> eval candidate JSONL / backlog / docs / tests
-> reviewed project artifact -> eval suite / harness gate -> self-evolution
can consume later.
Potential eval candidate JSONL shape:
{
  "id": "sf.todo-triage.001",
  "task_input": "...",
  "expected_behavior": "...",
  "failure_mode": "...",
  "evidence": "...",
  "source": "TODO.md"
}
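A deterministic reader for such lines might look like the sketch below (field requirements taken from the shape above; task_input and expected_behavior treated as required, everything else optional):

```typescript
interface EvalCandidate {
  id?: string;
  task_input: string;
  expected_behavior: string;
  failure_mode?: string;
  evidence?: string;
  source?: string;
}

// Parse one JSONL line; return null instead of throwing on malformed input,
// so one bad line is skipped rather than aborting the whole file.
function parseEvalCandidateLine(line: string): EvalCandidate | null {
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const rec = raw as Record<string, unknown>;
  if (typeof rec.task_input !== "string" || !rec.task_input.trim()) return null;
  if (typeof rec.expected_behavior !== "string" || !rec.expected_behavior.trim()) {
    return null;
  }
  return rec as unknown as EvalCandidate;
}
```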
Self-evolution principle:
- Repeated failure -> add eval first, then fix behavior.
- Raw memory/dump notes are evidence, not approved behavior.
- GEPA/DSPy output must become reviewable diffs against skills/prompts/tool
descriptions and pass held-out evals plus deterministic gates.
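The acceptance rule can be stated as a tiny predicate (a sketch with assumed names, not SF's actual gate code):

```typescript
interface EvalOutcome {
  id: string;
  passed: boolean;
}

// An evolved candidate is accepted only when every held-out eval passes AND
// the deterministic gates pass; anything else remains a reviewable proposal.
function candidateAccepted(
  heldOut: EvalOutcome[],
  deterministicGatesPass: boolean,
): boolean {
  return (
    heldOut.length > 0 &&
    heldOut.every((o) => o.passed) &&
    deterministicGatesPass
  );
}
```

Everything failing this predicate should surface as a reviewable diff plus report, never as an auto-applied change.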
GEPA/DSPy placement across SF vs memory/brain:
- GEPA/DSPy should not run inside normal SF runtime turns and should not live
as direct mutable memory behavior.
- SF owns the project workflow control plane: TODO triage, backlog handoff,
eval artifacts, harness proposals, deterministic gates, reviewed diffs, and
dispatch rules.
- Memory/brain owns durable experience: session traces, user corrections,
repeated failures, successful patterns, evidence IDs, source sessions, and
recall/export APIs.
- Memory/brain should expose dataset export surfaces for SF/self-evolution:
"give me candidate eval cases for this repo/risk/skill/tool from past
evidence".
- GEPA/DSPy consumes approved eval datasets and memory-exported candidates
offline, proposes prompt/skill/tool-description diffs, and hands those diffs
back to SF as reviewable implementation work.
- Accepted GEPA outputs become tracked repo artifacts or versioned SF resources,
not raw memory entries.
- Future home should be an offline evolution runner, either a separate repo
such as `singularity-evolution` or a clearly isolated SF package/command such
as `packages/evolution` plus `/sf evolve ...`. It should read
`.sf/triage/evals/*.evals.jsonl`, approved harness evals, and memory-exported
eval candidates; run DSPy/GEPA; then write candidate diffs/reports under
`.sf/evolution/` or a review branch. It must not mutate live prompts,
skills, memory, or tool descriptions directly.
Proper info flow:
- Raw human dump: root TODO.md.
- Raw agent self-report: .sf/BACKLOG.md and ~/.sf/agent/upstream-feedback.jsonl.
- Raw session-derived evidence: Singularity Memory / brain.
- First normalizer: /sf todo triage for TODO.md now; future /sf inbox triage
should normalize TODO.md + self-feedback + memory exports through the same
schema.
- Normalized pending items live in .sf/triage/inbox/*.jsonl with source, kind,
evidence, status, and created_at.
- Human-readable triage reports live in .sf/triage/reports/*.md.
- Eval-ready cases live in .sf/triage/evals/*.evals.jsonl.
- Human/planner-visible implementation tasks may be copied into .sf/BACKLOG.md
with /sf todo triage --backlog, but auto-mode must not execute backlog
directly. Planning/reassessment proposes promotion; user or explicit command
approves promotion into roadmap/slice/task artifacts.
- Memory-worthy notes are retained by memory/brain only after triage attaches
evidence/source; raw TODO notes are not memory.
- Preferred triage model tier: MiniMax M2.7 highspeed when available, then
MiniMax M2.5 highspeed, then other cheap/fast classification models. Triage
is structuring/classification, not final code editing.
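Under that flow, one normalized inbox line in .sf/triage/inbox/*.jsonl might look like this (values illustrative; field set matches the source/kind/evidence/status/created_at list above):

```json
{
  "id": "triage.001",
  "source": "todo.md",
  "kind": "eval_candidate",
  "content": "Auto mode must not execute backlog items directly.\nExpected: promotion requires explicit approval.",
  "evidence": "TODO.md dump note",
  "status": "pending",
  "created_at": "2026-04-30T06:41:49.000Z"
}
```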


@@ -278,6 +278,52 @@ function validationAttentionMarkerPath(basePath: string, mid: string): string {
);
}
function parseValidationRemediationRound(content: string): number | null {
const match = content.match(/^remediation_round:\s*(\d+)\s*$/m);
if (!match) return null;
const round = Number.parseInt(match[1]!, 10);
return Number.isFinite(round) ? round : null;
}
interface ValidationAttentionMarker {
milestoneId?: string;
createdAt?: string;
source?: string;
remediationRound?: number | null;
revalidationRound?: number;
revalidationRequestedAt?: string;
}
function readValidationAttentionMarker(
basePath: string,
mid: string,
): ValidationAttentionMarker | null {
const markerPath = validationAttentionMarkerPath(basePath, mid);
if (!existsSync(markerPath)) return null;
try {
const parsed = JSON.parse(readFileSync(markerPath, "utf-8")) as unknown;
if (!parsed || typeof parsed !== "object") return null;
return parsed as ValidationAttentionMarker;
} catch {
return null;
}
}
function writeValidationAttentionMarker(
basePath: string,
mid: string,
marker: ValidationAttentionMarker,
): void {
mkdirSync(join(sfRoot(basePath), "runtime", "validation-attention"), {
recursive: true,
});
writeFileSync(
validationAttentionMarkerPath(basePath, mid),
JSON.stringify(marker, null, 2) + "\n",
"utf-8",
);
}
function validationAttentionRuntimePath(basePath: string, mid: string): string {
return join(
sfRoot(basePath),
@@ -301,6 +347,31 @@ function hasActiveValidationAttentionMarker(
return false;
}
function shouldDispatchValidationAttentionRevalidation(
basePath: string,
mid: string,
validationContent: string,
): boolean {
if (!hasActiveValidationAttentionMarker(basePath, mid)) return false;
const marker = readValidationAttentionMarker(basePath, mid);
if (marker?.milestoneId && marker.milestoneId !== mid) return false;
const currentRound = parseValidationRemediationRound(validationContent);
if (currentRound === null) return false;
const originalRound =
typeof marker?.remediationRound === "number" ? marker.remediationRound : -1;
if (currentRound <= originalRound) return false;
if (marker?.revalidationRound === currentRound) return false;
writeValidationAttentionMarker(basePath, mid, {
...marker,
milestoneId: mid,
revalidationRound: currentRound,
revalidationRequestedAt: new Date().toISOString(),
});
return true;
}
function buildValidationAttentionRemediationPrompt(
mid: string,
midTitle: string,
@@ -1156,27 +1227,14 @@ export const DISPATCH_RULES: DispatchRule[] = [
attentionPlan &&
!hasActiveValidationAttentionMarker(basePath, mid)
) {
const markerPath = validationAttentionMarkerPath(basePath, mid);
try {
mkdirSync(
join(sfRoot(basePath), "runtime", "validation-attention"),
{
recursive: true,
},
);
writeFileSync(
markerPath,
JSON.stringify(
{
milestoneId: mid,
createdAt: new Date().toISOString(),
source: validationFile,
},
null,
2,
) + "\n",
"utf-8",
);
writeValidationAttentionMarker(basePath, mid, {
milestoneId: mid,
createdAt: new Date().toISOString(),
source: validationFile,
remediationRound:
parseValidationRemediationRound(validationContent),
});
} catch (err) {
logWarning(
"dispatch",
@@ -1196,6 +1254,24 @@ export const DISPATCH_RULES: DispatchRule[] = [
),
};
}
if (
shouldDispatchValidationAttentionRevalidation(
basePath,
mid,
validationContent,
)
) {
return {
action: "dispatch",
unitType: "validate-milestone",
unitId: mid,
prompt: await buildValidateMilestonePrompt(
mid,
midTitle,
basePath,
),
};
}
}
return {
action: "stop",


@@ -0,0 +1,499 @@
/**
* commands-todo.ts - triage the repo-root TODO.md dump inbox.
*
* Purpose: turn low-friction human dumps into reviewable eval, harness, memory,
* docs, test, and implementation artifacts without treating raw notes as
* approved runtime behavior.
*
* Consumer: `/sf todo triage` command.
*/
import {
existsSync,
mkdirSync,
readFileSync,
writeFileSync,
} from "node:fs";
import { dirname, join } from "node:path";
import type {
ExtensionAPI,
ExtensionCommandContext,
} from "@singularity-forge/pi-coding-agent";
import type { Api, AssistantMessage, Model } from "@singularity-forge/pi-ai";
import { type LLMCallFn } from "./memory-extractor.js";
import { projectRoot } from "./commands/context.js";
import { sfRoot } from "./paths.js";
const EMPTY_TODO = "# TODO\n\nDump anything here.\n";
const MAX_DUMP_CHARS = 48_000;
const PREFERRED_TRIAGE_MODEL_PATTERNS = [
/minimax.*m2\.7.*highspeed/i,
/minimax.*m2\.5.*highspeed/i,
/minimax.*m2\.7/i,
/minimax.*m2\.5/i,
/haiku/i,
];
export interface TodoEvalCandidate {
id?: string;
task_input: string;
expected_behavior: string;
failure_mode?: string;
evidence?: string;
source?: string;
suggested_location?: string;
}
export interface TodoTriageResult {
summary: string;
eval_candidates: TodoEvalCandidate[];
implementation_tasks: string[];
memory_requirements: string[];
harness_suggestions: string[];
docs_or_tests: string[];
unclear_notes: string[];
}
interface NormalizedTriageItem {
id: string;
source: "todo.md";
kind:
| "eval_candidate"
| "implementation_task"
| "memory_requirement"
| "harness_suggestion"
| "docs_or_tests"
| "unclear_note";
content: string;
evidence?: string;
status: "pending";
created_at: string;
}
function timestampId(date = new Date()): string {
const pad = (n: number) => String(n).padStart(2, "0");
return [
date.getFullYear(),
pad(date.getMonth() + 1),
pad(date.getDate()),
"-",
pad(date.getHours()),
pad(date.getMinutes()),
pad(date.getSeconds()),
].join("");
}
function extractJsonObject(text: string): string {
const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/i);
if (fenced?.[1]?.trim()) return fenced[1].trim();
const first = text.indexOf("{");
const last = text.lastIndexOf("}");
if (first !== -1 && last > first) return text.slice(first, last + 1);
return text;
}
function stringArray(value: unknown): string[] {
if (!Array.isArray(value)) return [];
return value
.filter((item): item is string => typeof item === "string")
.map((item) => item.trim())
.filter(Boolean);
}
function evalCandidates(value: unknown): TodoEvalCandidate[] {
if (!Array.isArray(value)) return [];
return value
.filter((item): item is Record<string, unknown> => {
return (
typeof item === "object" &&
item !== null &&
typeof item.task_input === "string" &&
typeof item.expected_behavior === "string"
);
})
.map((item, idx) => ({
id:
typeof item.id === "string" && item.id.trim()
? item.id.trim()
: `todo.eval.${String(idx + 1).padStart(3, "0")}`,
task_input:
typeof item.task_input === "string" ? item.task_input.trim() : "",
expected_behavior:
typeof item.expected_behavior === "string"
? item.expected_behavior.trim()
: "",
failure_mode:
typeof item.failure_mode === "string"
? item.failure_mode.trim()
: undefined,
evidence:
typeof item.evidence === "string" ? item.evidence.trim() : undefined,
source: typeof item.source === "string" ? item.source.trim() : "TODO.md",
suggested_location:
typeof item.suggested_location === "string"
? item.suggested_location.trim()
: undefined,
}))
.filter((item) => item.task_input && item.expected_behavior);
}
export function parseTodoTriageResponse(response: string): TodoTriageResult {
const parsed = JSON.parse(extractJsonObject(response)) as Record<string, unknown>;
return {
summary:
typeof parsed.summary === "string" && parsed.summary.trim()
? parsed.summary.trim()
: "TODO dump triaged.",
eval_candidates: evalCandidates(parsed.eval_candidates),
implementation_tasks: stringArray(parsed.implementation_tasks),
memory_requirements: stringArray(parsed.memory_requirements),
harness_suggestions: stringArray(parsed.harness_suggestions),
docs_or_tests: stringArray(parsed.docs_or_tests),
unclear_notes: stringArray(parsed.unclear_notes),
};
}
export function extractTodoDump(rawTodo: string): string {
const lines = rawTodo.replace(/\r\n/g, "\n").split("\n");
const body = lines
.filter((line, idx) => {
if (idx === 0 && line.trim().toLowerCase() === "# todo") return false;
if (line.trim() === "Dump anything here.") return false;
return true;
})
.join("\n")
.trim();
return body;
}
function section(title: string, items: string[]): string {
if (items.length === 0) return `## ${title}\n\nNone.\n`;
return `## ${title}\n\n${items.map((item) => `- ${item}`).join("\n")}\n`;
}
export function renderTriageMarkdown(
result: TodoTriageResult,
sourcePath: string,
): string {
const evals =
result.eval_candidates.length === 0
? "None.\n"
: result.eval_candidates
.map((item) => {
const lines = [
`- ${item.id ?? "todo.eval"}`,
` - Trigger/input: ${item.task_input}`,
` - Expected behavior: ${item.expected_behavior}`,
];
if (item.failure_mode)
lines.push(` - Failure mode observed: ${item.failure_mode}`);
if (item.evidence) lines.push(` - Evidence/source: ${item.evidence}`);
if (item.suggested_location)
lines.push(` - Suggested location: ${item.suggested_location}`);
return lines.join("\n");
})
.join("\n\n") + "\n";
return [
"# TODO Triage",
"",
`Source: ${sourcePath}`,
`Generated: ${new Date().toISOString()}`,
"",
"## Summary",
"",
result.summary,
"",
"## Eval Candidates",
"",
evals,
section("Implementation Tasks", result.implementation_tasks),
section("Memory Requirements", result.memory_requirements),
section("Harness Suggestions", result.harness_suggestions),
section("Docs Or Tests", result.docs_or_tests),
section("Unclear Notes", result.unclear_notes),
].join("\n");
}
function renderEvalJsonl(result: TodoTriageResult): string {
return (
result.eval_candidates
.map((item) => JSON.stringify({ ...item, source: item.source ?? "TODO.md" }))
.join("\n") + (result.eval_candidates.length > 0 ? "\n" : "")
);
}
function backlogPath(basePath: string): string {
return join(sfRoot(basePath), "BACKLOG.md");
}
function nextBacklogId(content: string): string {
let maxNum = 0;
for (const match of content.matchAll(/^- \[[ x]\] 999\.(\d+) — /gm)) {
const num = Number.parseInt(match[1]!, 10);
if (Number.isFinite(num) && num > maxNum) maxNum = num;
}
return `999.${maxNum + 1}`;
}
function appendBacklogItems(basePath: string, titles: string[]): number {
const cleanTitles = titles.map((title) => title.trim()).filter(Boolean);
if (cleanTitles.length === 0) return 0;
const filePath = backlogPath(basePath);
mkdirSync(dirname(filePath), { recursive: true });
let content = existsSync(filePath)
? readFileSync(filePath, "utf-8")
: "# Backlog\n\n";
if (!content.endsWith("\n")) content += "\n";
const date = new Date().toISOString().slice(0, 10);
for (const title of cleanTitles) {
const id = nextBacklogId(content);
content += `- [ ] ${id} — ${title.replace(/^['"]|['"]$/g, "")} (triaged ${date})\n`;
}
writeFileSync(filePath, content, "utf-8");
return cleanTitles.length;
}
function normalizedItems(result: TodoTriageResult, createdAt: string): NormalizedTriageItem[] {
const items: NormalizedTriageItem[] = [];
let seq = 1;
const push = (
kind: NormalizedTriageItem["kind"],
content: string,
evidence?: string,
) => {
items.push({
id: `triage.${String(seq++).padStart(3, "0")}`,
source: "todo.md",
kind,
content,
evidence,
status: "pending",
created_at: createdAt,
});
};
for (const item of result.eval_candidates) {
push(
"eval_candidate",
`${item.task_input}\nExpected: ${item.expected_behavior}`,
item.evidence ?? item.failure_mode,
);
}
for (const item of result.implementation_tasks) push("implementation_task", item);
for (const item of result.memory_requirements) push("memory_requirement", item);
for (const item of result.harness_suggestions) push("harness_suggestion", item);
for (const item of result.docs_or_tests) push("docs_or_tests", item);
for (const item of result.unclear_notes) push("unclear_note", item);
return items;
}
function renderNormalizedJsonl(result: TodoTriageResult, createdAt: string): string {
const items = normalizedItems(result, createdAt);
return items.map((item) => JSON.stringify(item)).join("\n") + (items.length ? "\n" : "");
}
function buildTriagePrompt(dump: string): { system: string; user: string } {
return {
system: `You are a triage agent for a software engineering repository.
Convert a messy TODO.md dump into structured, reviewable project work.
Return ONLY valid JSON with this shape:
{
"summary": "short summary",
"eval_candidates": [
{
"id": "short stable id if obvious",
"task_input": "user/task input that should be evaluated",
"expected_behavior": "specific expected behavior",
"failure_mode": "observed failure or risk",
"evidence": "quote or short source note",
"source": "TODO.md",
"suggested_location": "suggested eval/test/harness path"
}
],
"implementation_tasks": ["concrete implementation task"],
"memory_requirements": ["memory extraction or retention requirement"],
"harness_suggestions": ["gate/eval/harness suggestion"],
"docs_or_tests": ["doc or test artifact to add/update"],
"unclear_notes": ["notes that need clarification"]
}
Rules:
- Preserve concrete details from the dump.
- Do not invent completed work.
- Raw dump notes are evidence, not approved runtime behavior.
- Repeated failures should become eval candidates before behavior changes.
- Prefer deterministic tests/gates when possible; use model judges only as advisory unless calibrated.`,
user: `Triage this repo-root TODO.md dump:\n\n<TODO_DUMP>\n${dump}\n</TODO_DUMP>`,
};
}
async function triageWithModel(
dump: string,
llmCall: LLMCallFn,
): Promise<TodoTriageResult> {
const prompt = buildTriagePrompt(dump.slice(0, MAX_DUMP_CHARS));
const response = await llmCall(prompt.system, prompt.user);
return parseTodoTriageResponse(response);
}
function chooseTodoTriageModel(ctx: ExtensionCommandContext): Model<Api> | null {
try {
const available = ctx.modelRegistry?.getAvailable?.() ?? [];
for (const pattern of PREFERRED_TRIAGE_MODEL_PATTERNS) {
const match = available.find((model: Model<Api>) => {
return (
pattern.test(`${model.provider}/${model.id}`) ||
pattern.test(model.name ?? "")
);
});
if (match) return match as Model<Api>;
}
return (ctx.model as Model<Api> | undefined) ?? (available[0] as Model<Api> | undefined) ?? null;
} catch {
return (ctx.model as Model<Api> | undefined) ?? null;
}
}
function buildTodoTriageLLMCall(ctx: ExtensionCommandContext): LLMCallFn | null {
const model = chooseTodoTriageModel(ctx);
if (!model) return null;
const resolvedKeyPromise = ctx.modelRegistry
?.getApiKey?.(model)
.catch(() => undefined);
return async (system: string, user: string): Promise<string> => {
const { completeSimple } = await import("@singularity-forge/pi-ai");
const resolvedApiKey = await resolvedKeyPromise;
const result: AssistantMessage = await completeSimple(
model,
{
systemPrompt: system,
messages: [
{
role: "user",
content: [{ type: "text", text: user }],
timestamp: Date.now(),
},
],
},
{
maxTokens: 4096,
temperature: 0,
...(resolvedApiKey ? { apiKey: resolvedApiKey } : {}),
},
);
return result.content
.filter((part): part is { type: "text"; text: string } => part.type === "text")
.map((part) => part.text)
.join("");
};
}
export async function triageTodoDump(
basePath: string,
llmCall: LLMCallFn,
options: { clear?: boolean; date?: Date; backlog?: boolean } = {},
): Promise<{
markdownPath: string;
evalJsonlPath: string;
normalizedJsonlPath: string;
backlogItemsAdded: number;
result: TodoTriageResult;
}> {
const todoPath = join(basePath, "TODO.md");
if (!existsSync(todoPath)) {
throw new Error("No root TODO.md found.");
}
const raw = readFileSync(todoPath, "utf-8");
const dump = extractTodoDump(raw);
if (!dump) {
throw new Error("TODO.md has no dump content to triage.");
}
const result = await triageWithModel(dump, llmCall);
const id = timestampId(options.date);
const createdAt = (options.date ?? new Date()).toISOString();
const triageRoot = join(basePath, ".sf", "triage");
const reportsDir = join(triageRoot, "reports");
const evalsDir = join(triageRoot, "evals");
const inboxDir = join(triageRoot, "inbox");
mkdirSync(reportsDir, { recursive: true });
mkdirSync(evalsDir, { recursive: true });
mkdirSync(inboxDir, { recursive: true });
const markdownPath = join(reportsDir, `${id}.md`);
const evalJsonlPath = join(evalsDir, `${id}.evals.jsonl`);
const normalizedJsonlPath = join(inboxDir, `${id}.jsonl`);
writeFileSync(markdownPath, renderTriageMarkdown(result, "TODO.md"));
writeFileSync(evalJsonlPath, renderEvalJsonl(result));
writeFileSync(normalizedJsonlPath, renderNormalizedJsonl(result, createdAt));
const backlogItemsAdded =
options.backlog === true
? appendBacklogItems(basePath, result.implementation_tasks)
: 0;
if (options.clear !== false) {
writeFileSync(todoPath, EMPTY_TODO);
}
return {
markdownPath,
evalJsonlPath,
normalizedJsonlPath,
backlogItemsAdded,
result,
};
}
export async function handleTodo(
args: string,
ctx: ExtensionCommandContext,
_pi: ExtensionAPI,
): Promise<void> {
const parts = args.trim().split(/\s+/).filter(Boolean);
const subcommand = parts[0] || "triage";
const clear = !parts.includes("--no-clear");
const backlog = parts.includes("--backlog");
if (subcommand !== "triage") {
ctx.ui.notify(
"Usage: /sf todo triage [--no-clear] [--backlog]\nReads root TODO.md, writes .sf/triage artifacts, and clears processed dump notes by default.",
"warning",
);
return;
}
const llmCall = buildTodoTriageLLMCall(ctx);
if (!llmCall) {
ctx.ui.notify("No model available for TODO triage.", "warning");
return;
}
try {
const output = await triageTodoDump(projectRoot(), llmCall, { clear, backlog });
ctx.ui.notify(
[
"TODO triage complete.",
`Report: ${output.markdownPath}`,
`Normalized inbox: ${output.normalizedJsonlPath}`,
`Eval candidates: ${output.evalJsonlPath}`,
`Eval candidate count: ${output.result.eval_candidates.length}`,
`Backlog items added: ${output.backlogItemsAdded}`,
clear ? "TODO.md was reset to the empty dump inbox." : "TODO.md was left unchanged.",
].join("\n"),
"info",
);
} catch (err) {
ctx.ui.notify(
`TODO triage failed: ${err instanceof Error ? err.message : String(err)}`,
"warning",
);
}
}


@@ -15,7 +15,7 @@ export interface GsdCommandDefinition {
type CompletionMap = Record<string, readonly GsdCommandDefinition[]>;
export const SF_COMMAND_DESCRIPTION =
"SF — Singularity Forge: /sf help|start|templates|next|auto|stop|pause|status|widget|visualize|queue|quick|discuss|capture|triage|dispatch|history|undo|undo-task|reset-slice|rate|skip|export|cleanup|model|mode|prefs|config|keys|hooks|run-hook|skill-health|doctor|logs|forensics|changelog|migrate|remote|steer|knowledge|harness|new-milestone|parallel|cmux|park|unpark|init|setup|inspect|extensions|update|fast|mcp|rethink|codebase|notifications|ship|do|session-report|backlog|pr-branch|add-tests|scan";
"SF — Singularity Forge: /sf help|start|templates|next|auto|stop|pause|status|widget|visualize|queue|quick|discuss|capture|triage|todo|dispatch|history|undo|undo-task|reset-slice|rate|skip|export|cleanup|model|mode|prefs|config|keys|hooks|run-hook|skill-health|doctor|logs|forensics|changelog|migrate|remote|steer|knowledge|harness|new-milestone|parallel|cmux|park|unpark|init|setup|inspect|extensions|update|fast|mcp|rethink|codebase|notifications|ship|do|session-report|backlog|pr-branch|add-tests|scan";
export const TOP_LEVEL_SUBCOMMANDS: readonly GsdCommandDefinition[] = [
{ cmd: "help", desc: "Categorized command reference with descriptions" },
@@ -41,6 +41,7 @@ export const TOP_LEVEL_SUBCOMMANDS: readonly GsdCommandDefinition[] = [
{ cmd: "capture", desc: "Fire-and-forget thought capture" },
{ cmd: "changelog", desc: "Show categorized release notes" },
{ cmd: "triage", desc: "Manually trigger triage of pending captures" },
{ cmd: "todo", desc: "Triage root TODO.md dump into eval/backlog artifacts" },
{ cmd: "dispatch", desc: "Dispatch a specific phase directly" },
{ cmd: "history", desc: "View execution history" },
{ cmd: "undo", desc: "Revert last completed unit" },
@@ -373,6 +374,11 @@ const NESTED_COMPLETIONS: CompletionMap = {
{ cmd: "promote", desc: "Promote backlog item to active slice" },
{ cmd: "remove", desc: "Remove backlog item" },
],
todo: [
{ cmd: "triage", desc: "Triage root TODO.md into .sf/triage artifacts" },
{ cmd: "triage --no-clear", desc: "Triage TODO.md without resetting it" },
{ cmd: "triage --backlog", desc: "Also add implementation tasks to .sf/BACKLOG.md" },
],
"pr-branch": [
{ cmd: "--dry-run", desc: "Preview what would be filtered" },
{ cmd: "--name", desc: "Custom branch name" },


@@ -179,6 +179,11 @@ export async function handleOpsCommand(
await handleTriage(ctx, pi, process.cwd());
return true;
}
if (trimmed === "todo" || trimmed.startsWith("todo ")) {
const { handleTodo } = await import("../../commands-todo.js");
await handleTodo(trimmed.replace(/^todo\s*/, "").trim(), ctx, pi);
return true;
}
if (trimmed === "config") {
await handleConfig(ctx);
return true;


@@ -12,6 +12,7 @@ import {
existsSync,
mkdirSync,
mkdtempSync,
readFileSync,
rmSync,
writeFileSync,
} from "node:fs";
@@ -221,6 +222,90 @@ test("completing-milestone redispatches validation-attention when marker is stal
}
});
test("completing-milestone revalidates after validation-attention remediation advances round", async () => {
const base = mkdtempSync(join(tmpdir(), "sf-remediation-"));
mkdirSync(join(base, ".sf", "milestones", "M001"), { recursive: true });
mkdirSync(join(base, ".sf", "runtime", "validation-attention"), {
recursive: true,
});
mkdirSync(join(base, ".sf", "runtime", "units"), { recursive: true });
try {
writeFileSync(
join(base, ".sf", "milestones", "M001", "M001-VALIDATION.md"),
[
"---",
"verdict: needs-attention",
"remediation_round: 1",
"---",
"",
"# Validation Report",
"",
"Tracking cleanup was applied. Revalidate before completion.",
].join("\n"),
);
writeFileSync(
join(base, ".sf", "runtime", "validation-attention", "M001.json"),
JSON.stringify({
milestoneId: "M001",
createdAt: new Date().toISOString(),
remediationRound: 0,
}),
);
writeFileSync(
join(
base,
".sf",
"runtime",
"units",
"rewrite-docs-M001-validation-attention.json",
),
JSON.stringify({
version: 1,
unitType: "rewrite-docs",
unitId: "M001/validation-attention",
phase: "dispatched",
}),
);
const ctx = {
mid: "M001",
midTitle: "Test Milestone",
basePath: base,
state: { phase: "completing-milestone" } as any,
prefs: {} as any,
session: undefined,
};
const result = await completingRule!.match(ctx);
assert.ok(result !== null, "rule should match");
assert.equal(result!.action, "dispatch");
if (result!.action === "dispatch") {
assert.equal(result!.unitType, "validate-milestone");
assert.equal(result!.unitId, "M001");
}
const marker = JSON.parse(
readFileSync(
join(base, ".sf", "runtime", "validation-attention", "M001.json"),
"utf-8",
),
);
assert.equal(marker.revalidationRound, 1);
const second = await completingRule!.match(ctx);
assert.ok(second !== null, "second rule should match");
assert.equal(
second!.action,
"stop",
"second pass should not revalidate-loop",
);
} finally {
rmSync(base, { recursive: true, force: true });
}
});
test("completing-milestone proceeds normally when VALIDATION verdict is pass (#2675 guard)", async () => {
const base = mkdtempSync(join(tmpdir(), "sf-remediation-"));
mkdirSync(join(base, ".sf", "milestones", "M001"), { recursive: true });