Mikael Hugo 617608347d fix(sf): align auto-mode prompts to canonical sf_task_complete / sf_slice_complete
Auto-mode prompts called legacy aliases (sf_complete_task, sf_complete_slice)
while guided used canonical (sf_task_complete, sf_slice_complete). The
divergence was locked in by the test 'auto execute-task requires legacy
completion alias until prompt contract is aligned' — explicit tech debt
marker.

Migrated:
- workflow-mcp.ts getRequiredWorkflowToolsForAutoUnit: returns canonical
- prompts/execute-task.md: 4 callsites
- prompts/complete-slice.md: 3 callsites
- prompts/reactive-execute.md: any (none on this file)
- workflow-mcp.test.ts: assertion + transport-error fixtures
- Test rename: 'requires legacy completion alias' → 'requires canonical'

The aliases stay registered (sf_complete_task → sf_task_complete) so
external callers and old session resumes don't break. Tool-naming.test.ts
still asserts both names route to the same handler.

Resolves: sf-moohqbza-yyq8sd.
Tests: workflow-mcp + tool-naming 29/29 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 17:25:53 +02:00


Raw Dump Inbox

Eval Candidates

  1. Test note for CI mode verification

SF Hardening Backlog From Claude Code Scan

Goal

Make it impossible for SF auto-mode to loop because of broken docs, missing artifacts, stale runtime state, or ambiguous background-unit completion. ROADMAP.md must become a rendered, human-facing artifact, not executable dispatch state.

Track 1 - Canonical State And No-Doc-Loop Dispatch

  • Add getCanonicalMilestonePlan(basePath, milestoneId) as the only dispatch-facing milestone plan accessor.
  • Prefer DB slice rows when the DB is available and populated.
  • Add .sf/milestones/Mxxx/Mxxx-ROADMAP.json as the structured fallback projection.
  • Treat ROADMAP.md as rendered display only.
  • Keep Markdown parsing only for import, migration, doctor repair, and parser tests.
  • Move parallel research, single-slice research, prior-slice guard, UAT/validation dispatch, and prompt slice enumeration to canonical plan state.
  • Add generated marker/hash metadata to ROADMAP.md.
  • Stop dispatch when the DB, the JSON projection, and ROADMAP.md do not all agree.
  • Add /sf doctor --fix support to re-render generated roadmap artifacts from canonical state.
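The sketch below shows the Track 1 dispatch rule in TypeScript. It assumes the plan readers are injected rather than derived from the `basePath` argument named above. `PlanSources`, `DispatchStopError`, `planHash`, `renderRoadmapMarkdown`, `roadmapDrift`, the slice fields and the `<!-- sf:generated hash=... -->` marker format are all hypothetical. The accessor works in this order:

- It prefers DB rows.
- It falls back to the JSON projection.
- It stops when the DB and the projection disagree, or when neither exists.
- It never parses ROADMAP.md for dispatch.

```typescript
import { createHash } from "node:crypto";

// Hypothetical slice shape; the real SF schema likely carries more fields.
interface SlicePlan { id: string; title: string; status: "pending" | "done" }
interface MilestonePlan { milestoneId: string; source: "db" | "projection"; slices: SlicePlan[] }

// Hypothetical readers; a real (basePath, milestoneId) accessor would build these
// from the DB handle and .sf/milestones/<id>/<id>-ROADMAP.json.
interface PlanSources {
  dbSlices(milestoneId: string): SlicePlan[] | null; // null: DB unavailable
  readProjection(milestoneId: string): SlicePlan[] | null; // null: no JSON projection
}

class DispatchStopError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DispatchStopError";
  }
}

// Hash only the relevant fields in a fixed order, so object key order never matters.
function planHash(slices: SlicePlan[]): string {
  const canonical = JSON.stringify(slices.map((s) => [s.id, s.title, s.status]));
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

function getCanonicalMilestonePlan(sources: PlanSources, milestoneId: string): MilestonePlan {
  const db = sources.dbSlices(milestoneId);
  const projection = sources.readProjection(milestoneId);
  if (db && db.length > 0) {
    // The DB wins, but a disagreeing projection stops dispatch instead of guessing.
    if (projection && planHash(projection) !== planHash(db)) {
      throw new DispatchStopError(`${milestoneId}: DB and ROADMAP.json disagree; run /sf doctor`);
    }
    return { milestoneId, source: "db", slices: db };
  }
  if (projection) return { milestoneId, source: "projection", slices: projection };
  // ROADMAP.md is display-only, so there is deliberately no Markdown fallback.
  throw new DispatchStopError(`${milestoneId}: no DB rows or ROADMAP.json; run /sf doctor to migrate`);
}

const MARKER = /^<!-- sf:generated hash=([0-9a-f]{16}) -->/;

function renderRoadmapMarkdown(plan: MilestonePlan): string {
  const rows = plan.slices.map((s) => `- [${s.status === "done" ? "x" : " "}] ${s.id}: ${s.title}`);
  const header = `<!-- sf:generated hash=${planHash(plan.slices)} -->`;
  return [header, `# ${plan.milestoneId} Roadmap`, "", ...rows, ""].join("\n");
}

// Returns why ROADMAP.md no longer matches canonical state, or null if it does.
function roadmapDrift(markdown: string, plan: MilestonePlan): "missing-marker" | "stale-render" | "hand-edited" | null {
  const match = MARKER.exec(markdown);
  if (!match) return "missing-marker";
  if (match[1] !== planHash(plan.slices)) return "stale-render";
  if (markdown !== renderRoadmapMarkdown(plan)) return "hand-edited";
  return null;
}
```

Under this sketch, `/sf doctor --fix` would overwrite ROADMAP.md with `renderRoadmapMarkdown(plan)` whenever `roadmapDrift` returns a non-null reason.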

Track 2 - Unit Runtime FSM

  • Introduce durable unit runtime state under .sf/runtime/units/*.json.
  • Model unit states: queued, claimed, running, progress, completed, failed, blocked, cancelled, stale, runaway-recovered, notified.
  • Persist retryCount, maxRetries, lastHeartbeatAt, lastProgressAt, lastOutputAt, outputPath, watchdogReason, and notifiedAt.
  • Prevent redispatch of terminal units that already have notifiedAt set.
  • Allow retry only when status is retryable and retry budget remains.
  • Require explicit reset to rerun failed synthetic units like parallel-research.
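The redispatch rules above could be written as a pure decision function over the persisted unit record. This is a sketch under several assumptions:

- `retryable`, `resetRequested`, `SYNTHETIC_UNIT_TYPES`, `recordFailure` and `decideRedispatch` are hypothetical names.
- The heartbeat and output timestamps are omitted.
- `retryCount` counts failures, so a unit that exceeds `maxRetries` becomes terminal.

```typescript
type UnitStatus =
  | "queued" | "claimed" | "running" | "progress" | "completed" | "failed"
  | "blocked" | "cancelled" | "stale" | "runaway-recovered" | "notified";

// Subset of a hypothetical .sf/runtime/units/<id>.json record.
interface UnitRuntime {
  unitId: string;
  unitType: string;
  status: UnitStatus;
  retryable: boolean; // hypothetical: set by whatever classifies the failure
  retryCount: number;
  maxRetries: number;
  notifiedAt?: string;
  resetRequested?: boolean; // hypothetical explicit-reset flag
}

// Hypothetical registry of synthetic units that never auto-retry.
const SYNTHETIC_UNIT_TYPES = new Set(["parallel-research"]);

interface Decision { dispatch: boolean; reason: string }

// Records a failed or stale attempt; exceeding maxRetries makes it terminal.
function recordFailure(unit: UnitRuntime, retryable: boolean, status: "failed" | "stale" = "failed"): UnitRuntime {
  const retryCount = unit.retryCount + 1;
  return { ...unit, status, retryCount, retryable: retryable && retryCount <= unit.maxRetries };
}

function decideRedispatch(unit: UnitRuntime): Decision {
  // A notified unit has already reported its terminal outcome (a reset is assumed to clear notifiedAt).
  if (unit.notifiedAt) return { dispatch: false, reason: "terminal outcome already notified" };
  switch (unit.status) {
    case "queued":
      return { dispatch: true, reason: "queued" };
    case "failed":
    case "stale":
    case "runaway-recovered":
      if (unit.resetRequested) return { dispatch: true, reason: "explicit reset" };
      if (SYNTHETIC_UNIT_TYPES.has(unit.unitType)) {
        return { dispatch: false, reason: "synthetic unit requires explicit reset" };
      }
      if (unit.retryable && unit.retryCount <= unit.maxRetries) {
        return { dispatch: true, reason: `retry ${unit.retryCount}/${unit.maxRetries}` };
      }
      return { dispatch: false, reason: "not retryable or retry budget exhausted" };
    default:
      // claimed/running/progress are already active; completed/blocked/cancelled/notified need no dispatch.
      return { dispatch: false, reason: `status is ${unit.status}` };
  }
}
```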

Track 3 - Progress And Liveness

  • Separate heartbeat, progress, and output growth.
  • Treat a silent-but-running unit as valid only while its heartbeat is fresh.
  • Add watchdog classifiers: dead PID, expired lease, no heartbeat, no output growth, permission prompt, interactive prompt, runaway recovery.
  • Extend sf headless --output-format json query with active unit, status, elapsed time, retry count, watchdog reason, last progress time, and output path.
  • Later: render TUI/footer status rows from the same runtime model.
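One way to keep heartbeat, progress and output separate is a classifier that treats silence as a problem in only two cases: the heartbeat is also stale, or an output-stall limit much longer than the heartbeat window has passed. The snapshot fields, thresholds and reason strings below are hypothetical. The runaway-recovery classifier is left out because its trigger is not specified here.

```typescript
type WatchdogReason =
  | "dead-pid" | "expired-lease" | "permission-prompt" | "interactive-prompt"
  | "no-heartbeat" | "no-output-growth";

// Hypothetical liveness snapshot; all timestamps are epoch milliseconds.
interface LivenessSnapshot {
  now: number;
  startedAt: number;
  pidAlive: boolean;
  leaseExpiresAt: number;
  lastHeartbeatAt?: number;
  lastOutputAt?: number;
  awaitingPermission: boolean;
  awaitingInteractiveInput: boolean;
}

interface LivenessThresholds {
  heartbeatStaleMs: number;
  // Deliberately much longer than heartbeatStaleMs: silence alone is not a failure.
  outputStallMs: number;
}

// Returns the first matching watchdog reason, or null when the unit looks healthy.
function classifyLiveness(s: LivenessSnapshot, t: LivenessThresholds): WatchdogReason | null {
  if (!s.pidAlive) return "dead-pid";
  if (s.now > s.leaseExpiresAt) return "expired-lease";
  if (s.awaitingPermission) return "permission-prompt";
  if (s.awaitingInteractiveInput) return "interactive-prompt";
  const heartbeatFresh =
    s.lastHeartbeatAt !== undefined && s.now - s.lastHeartbeatAt <= t.heartbeatStaleMs;
  if (!heartbeatFresh) return "no-heartbeat";
  // Fresh heartbeat: silent-but-running stays healthy until the long output-stall limit.
  if (s.now - (s.lastOutputAt ?? s.startedAt) > t.outputStallMs) return "no-output-growth";
  return null;
}
```

The same reason string could feed the `watchdog reason` field of the headless JSON query and, later, the TUI status rows.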

Track 4 - Event And Interrupt Policy

  • Add explicit event origins: user-message, system-steer, task-notification, memory-event, background-completion, permission-request.
  • Add interrupt behaviors: interrupt, queue, block, drop-if-stale.
  • Default user-origin messages to interrupt active work.
  • Default system/task/memory events to non-interrupting unless explicitly marked.
  • Scope queues so the main loop does not consume subagent-directed events and subagents do not consume the main loop's user prompts.
  • Ensure each background completion is enqueued exactly once and sets notifiedAt.
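The sketch below covers the default interrupt policy and a scoped queue. It assumes each event carries an explicit `scope`, either `"main"` or a subagent id. `SfEvent`, `ScopedEventQueue`, `drainFor` and `enqueueCompletionOnce` are hypothetical names. Two defaults are assumptions, because the list above only fixes the user, system, task and memory defaults:

- `permission-request` defaults to `block`.
- `background-completion` defaults to `queue`.

```typescript
type EventOrigin =
  | "user-message" | "system-steer" | "task-notification"
  | "memory-event" | "background-completion" | "permission-request";
type InterruptBehavior = "interrupt" | "queue" | "block" | "drop-if-stale";
type Scope = "main" | { subagentId: string };

interface SfEvent {
  origin: EventOrigin;
  scope: Scope;
  behavior?: InterruptBehavior; // explicit per-event override
  createdAt: number; // epoch ms
  payload: string;
}

function defaultBehavior(origin: EventOrigin): InterruptBehavior {
  if (origin === "user-message") return "interrupt";
  if (origin === "permission-request") return "block"; // assumption: waits for an answer
  return "queue"; // system, task, memory, and background events do not interrupt
}

function shouldInterrupt(e: SfEvent): boolean {
  return (e.behavior ?? defaultBehavior(e.origin)) === "interrupt";
}

function sameScope(a: Scope, b: Scope): boolean {
  if (a === "main") return b === "main";
  if (b === "main") return false;
  return a.subagentId === b.subagentId;
}

class ScopedEventQueue {
  private items: SfEvent[] = [];

  enqueue(event: SfEvent): void {
    this.items.push(event);
  }

  // Returns only events addressed to `consumer`; events for other scopes stay queued.
  drainFor(consumer: Scope, now: number, staleAfterMs: number): SfEvent[] {
    const mine: SfEvent[] = [];
    const rest: SfEvent[] = [];
    for (const e of this.items) {
      if (!sameScope(e.scope, consumer)) {
        rest.push(e);
        continue;
      }
      const behavior = e.behavior ?? defaultBehavior(e.origin);
      const stale = behavior === "drop-if-stale" && now - e.createdAt > staleAfterMs;
      if (!stale) mine.push(e); // stale drop-if-stale events are discarded
    }
    this.items = rest;
    return mine;
  }

  get size(): number {
    return this.items.length;
  }
}

// Enqueues a background completion at most once and records notifiedAt on the unit.
function enqueueCompletionOnce(
  queue: ScopedEventQueue,
  unit: { notifiedAt?: string },
  event: SfEvent,
  nowIso: string,
): boolean {
  if (unit.notifiedAt) return false;
  queue.enqueue(event);
  unit.notifiedAt = nowIso;
  return true;
}
```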

Track 5 - Tool, Plugin, And Permission Boundaries

  • Add explicit tool contracts for read/write behavior, concurrency safety, permission requirements, and interrupt behavior.
  • Treat each background worker as a durable task with taskId, parentUnitId, status, output path, retry budget, and notification marker.
  • Add doctor checks for suspicious hook/tool/plugin config.
  • Ensure runtime permission errors name the denied tool/action and the relevant policy.
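This is a sketch of a declarative tool contract plus a permission check whose error names the tool, the attempted action and the policy. Every name here is hypothetical: `ToolContract`, `PermissionPolicy`, `PermissionDeniedError`, `assertPermitted` and `mayRunConcurrently`. The concurrency rule is also hypothetical: two calls may overlap only if both tools are concurrency-safe and at most one of them writes.

```typescript
// Hypothetical contract shape; field names are illustrative.
interface ToolContract {
  name: string;
  access: "read" | "write";
  concurrencySafe: boolean;
  requiresPermission: boolean;
  onInterrupt: "cancel" | "finish-then-yield";
}

interface PermissionPolicy {
  id: string; // e.g. a policy name or settings file path
  allowedTools: ReadonlySet<string>;
}

class PermissionDeniedError extends Error {
  readonly tool: string;
  readonly action: string;
  readonly policyId: string;

  constructor(tool: string, action: string, policyId: string) {
    super(`Permission denied: tool "${tool}" may not ${action} (policy: ${policyId})`);
    this.name = "PermissionDeniedError";
    this.tool = tool;
    this.action = action;
    this.policyId = policyId;
  }
}

// Throws an error naming the tool, the action, and the policy that denied it.
function assertPermitted(contract: ToolContract, action: string, policy: PermissionPolicy): void {
  if (!contract.requiresPermission || policy.allowedTools.has(contract.name)) return;
  throw new PermissionDeniedError(contract.name, action, policy.id);
}

// Assumption: overlap requires both tools to be concurrency-safe and at most one writer.
function mayRunConcurrently(a: ToolContract, b: ToolContract): boolean {
  return a.concurrencySafe && b.concurrencySafe && (a.access === "read" || b.access === "read");
}
```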

Track 6 - Release And Privacy Hygiene

  • Add packed-artifact scanner before release.
  • Fail release artifacts containing inline source maps, sourcesContent, .ts/.tsx source, local absolute paths, secrets, or debug-only strings.
  • Use npm pack --dry-run --json plus unpack inspection for npm artifacts.
  • Add telemetry wrappers that allow numeric/boolean metadata by default and require reviewed wrappers for string metadata.
  • Add tests for no-telemetry mode and no-nonessential-network mode.
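The content checks of a packed-artifact scanner can be sketched as a pure function over already-unpacked files. Producing that file list is left out; it could come from `npm pack --dry-run --json` plus an unpacked tarball. The rule names, the `SF_DEBUG_ONLY` marker and the regular expressions are illustrative assumptions, not a vetted secret-detection list. The sketch also assumes `.d.ts` declaration files are allowed in the package.

```typescript
interface PackedFile { path: string; content: string }
interface Finding { path: string; rule: string }

// Illustrative rules only; a real scanner needs a reviewed secret-pattern list.
const CONTENT_RULES: Array<[string, RegExp]> = [
  ["inline-source-map", /sourceMappingURL=data:/],
  ["sources-content", /"sourcesContent"\s*:/],
  ["local-absolute-path", /\/Users\/[^\/\s"']+\/|\/home\/[^\/\s"']+\/|[A-Za-z]:\\Users\\/],
  ["private-key", /-----BEGIN [A-Z ]*PRIVATE KEY-----/],
  ["aws-access-key-id", /\bAKIA[0-9A-Z]{16}\b/],
  ["debug-only-marker", /\bSF_DEBUG_ONLY\b/], // hypothetical marker string
];

function scanPackedFiles(files: PackedFile[]): Finding[] {
  const findings: Finding[] = [];
  for (const file of files) {
    // .d.ts declarations are expected in packages; other TypeScript sources are not.
    if (/\.tsx?$/.test(file.path) && !file.path.endsWith(".d.ts")) {
      findings.push({ path: file.path, rule: "typescript-source" });
    }
    for (const [rule, pattern] of CONTENT_RULES) {
      if (pattern.test(file.content)) findings.push({ path: file.path, rule });
    }
  }
  return findings;
}
```

A release step would fail whenever `scanPackedFiles` returns a non-empty list, which matches the "Release scanner" eval candidate.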

Eval Candidates

  • Bad roadmap: DB has 2 slices, ROADMAP.md has 6 stale rows. Expected: dispatch uses DB/projection or stops; never dispatches stale rows.
  • Projection fallback: DB unavailable, ROADMAP.json exists. Expected: dispatch succeeds from projection.
  • Legacy unsafe fallback: DB unavailable, only ROADMAP.md exists. Expected: dispatch stops with doctor/migration instruction.
  • Drift detection: ROADMAP.md marker hash mismatches projection. Expected: /sf doctor reports drift.
  • Drift repair: /sf doctor --fix re-renders Markdown and clears drift.
  • Synthetic unit failure: parallel-research is runaway-recovered. Expected: cannot redispatch unless explicitly reset.
  • Notification idempotency: terminal unit with notifiedAt does not enqueue another completion notification.
  • Retry budget: retryable failure increments retryCount; exceeding maxRetries becomes terminal.
  • Stale heartbeat: a unit whose heartbeat goes missing is marked stale instead of being redispatched indefinitely.
  • Interrupt policy: user steering interrupts active work; memory/system/task notifications do not interrupt by default.
  • Queue scoping: subagent-scoped notifications are not consumed by the main loop.
  • Release scanner: fixture artifact containing sourcesContent fails; clean packed artifact passes.

Verification Commands

  • npx vitest run src/resources/extensions/sf/tests/parallel-research-dispatch.test.ts --config vitest.config.ts --reporter=verbose
  • Focused tests for canonical plan, doctor drift, runtime FSM, interrupt policy, and release scanner.
  • npm run typecheck:extensions
  • sf headless --output-format json query against bad-roadmap and failed-runtime fixtures.

Implementation Order

  1. Canonical plan accessor and structured roadmap projection.
  2. Dispatch migration away from Markdown parsing.
  3. Doctor drift detection and repair.
  4. Unit runtime FSM and redispatch policy.
  5. Headless query liveness fields.
  6. Event interrupt policy.
  7. Tool/plugin permission contracts.
  8. Release artifact scanner and privacy wrappers.