Auto-mode prompts called legacy aliases (sf_complete_task, sf_complete_slice) while guided used canonical (sf_task_complete, sf_slice_complete). The divergence was locked in by the test 'auto execute-task requires legacy completion alias until prompt contract is aligned', an explicit tech debt marker.

Migrated:
- workflow-mcp.ts getRequiredWorkflowToolsForAutoUnit: returns canonical
- prompts/execute-task.md: 4 callsites
- prompts/complete-slice.md: 3 callsites
- prompts/reactive-execute.md: any (none on this file)
- workflow-mcp.test.ts: assertion + transport-error fixtures
- Test rename: 'requires legacy completion alias' → 'requires canonical'

The aliases stay registered (sf_complete_task → sf_task_complete) so external callers and old session resumes don't break. Tool-naming.test.ts still asserts both names route to the same handler.

Resolves: sf-moohqbza-yyq8sd.
Tests: workflow-mcp + tool-naming 29/29 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Raw Dump Inbox
Eval Candidates
- Test note for CI mode verification
SF Hardening Backlog From Claude Code Scan
Goal
Make SF auto-mode impossible to loop from broken docs, missing artifacts, stale runtime state, or ambiguous background-unit completion. ROADMAP.md must become a rendered human artifact, not executable dispatch state.
Track 1 - Canonical State And No-Doc-Loop Dispatch
- Add `getCanonicalMilestonePlan(basePath, milestoneId)` as the only dispatch-facing milestone plan accessor.
- Prefer DB slice rows when DB is available and populated.
- Add `.sf/milestones/Mxxx/Mxxx-ROADMAP.json` as the structured fallback projection.
- Treat `ROADMAP.md` as rendered display only.
- Keep Markdown parsing only for import, migration, doctor repair, and parser tests.
- Move parallel research, single-slice research, prior-slice guard, UAT/validation dispatch, and prompt slice enumeration to canonical plan state.
- Add generated marker/hash metadata to `ROADMAP.md`.
- Stop dispatch when DB/projection/Markdown disagree.
- Add `/sf doctor --fix` support to re-render generated roadmap artifacts from canonical state.
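The resolution order in this track can be sketched as a pure function. This is a minimal illustration, not the actual `getCanonicalMilestonePlan` implementation: the `SlicePlan`, `PlanSources`, and `PlanResult` shapes and the `resolveCanonicalPlan` name are hypothetical, and real file/DB access is replaced by pre-loaded inputs.

```typescript
// Hypothetical shapes; the real SF types are not shown in this backlog.
interface SlicePlan { id: string; title: string }

interface PlanSources {
  dbSlices: SlicePlan[] | null;         // null when the DB is unavailable
  projectionSlices: SlicePlan[] | null; // parsed Mxxx-ROADMAP.json, or null
  markdownExists: boolean;              // whether ROADMAP.md is on disk
}

type PlanResult =
  | { ok: true; source: "db" | "projection"; slices: SlicePlan[] }
  | { ok: false; reason: string };

// Compare two plans by slice id and order only.
function sameIds(a: SlicePlan[], b: SlicePlan[]): boolean {
  return a.length === b.length && a.every((s, i) => s.id === b[i]?.id);
}

function resolveCanonicalPlan(src: PlanSources): PlanResult {
  // DB rows win only when the DB is available AND populated.
  const db = src.dbSlices && src.dbSlices.length > 0 ? src.dbSlices : null;
  const proj = src.projectionSlices;

  // Disagreement between structured sources stops dispatch.
  if (db && proj && !sameIds(db, proj)) {
    return { ok: false, reason: "DB and projection disagree; run /sf doctor" };
  }
  if (db) return { ok: true, source: "db", slices: db };
  if (proj) return { ok: true, source: "projection", slices: proj };

  // Markdown is display-only: never parse it for dispatch.
  if (src.markdownExists) {
    return { ok: false, reason: "only ROADMAP.md exists; run /sf doctor or migrate" };
  }
  return { ok: false, reason: "no milestone plan found" };
}
```

Returning a tagged result rather than throwing keeps the "stop dispatch" cases explicit for the caller, which can surface the reason string as the doctor/migration instruction.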
Track 2 - Unit Runtime FSM
- Introduce durable unit runtime state under `.sf/runtime/units/*.json`.
- Model unit states: `queued`, `claimed`, `running`, `progress`, `completed`, `failed`, `blocked`, `cancelled`, `stale`, `runaway-recovered`, `notified`.
- Persist `retryCount`, `maxRetries`, `lastHeartbeatAt`, `lastProgressAt`, `lastOutputAt`, `outputPath`, `watchdogReason`, and `notifiedAt`.
- Prevent redispatch for terminal units with `notifiedAt`.
- Allow retry only when status is retryable and retry budget remains.
- Require explicit reset to rerun failed synthetic units like `parallel-research`.
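The redispatch rules above could look roughly like the following guard. It is a sketch under stated assumptions: the status names and the `retryCount`, `maxRetries`, and `notifiedAt` fields come from this backlog, but the `synthetic` flag, the `canRetry` name, and the exact membership of the terminal and retryable sets are guesses made for illustration.

```typescript
type UnitStatus =
  | "queued" | "claimed" | "running" | "progress" | "completed" | "failed"
  | "blocked" | "cancelled" | "stale" | "runaway-recovered" | "notified";

interface UnitRuntime {
  unitId: string;
  status: UnitStatus;
  retryCount: number;
  maxRetries: number;
  notifiedAt?: string;  // ISO timestamp once the completion was delivered
  synthetic?: boolean;  // hypothetical flag for units like parallel-research
}

// Assumption: which statuses count as terminal or retryable is not
// specified in the backlog; these sets are illustrative only.
const TERMINAL = new Set<UnitStatus>([
  "completed", "failed", "cancelled", "runaway-recovered", "notified",
]);
const RETRYABLE = new Set<UnitStatus>(["failed", "stale"]);

// Decides whether a unit that already ran may be dispatched again.
function canRetry(u: UnitRuntime, opts: { explicitReset?: boolean } = {}): boolean {
  // An explicit reset is the only way around the guards below.
  if (opts.explicitReset) return true;
  // Terminal and already notified: never redispatch.
  if (TERMINAL.has(u.status) && u.notifiedAt) return false;
  // Failed synthetic units need an explicit reset.
  if (u.synthetic && (u.status === "failed" || u.status === "runaway-recovered")) return false;
  // Otherwise retry only retryable statuses with budget left.
  if (!RETRYABLE.has(u.status)) return false;
  return u.retryCount < u.maxRetries;
}
```

Keeping the decision in one pure function means the same rule can back both the dispatcher and the eval fixtures for retry budget and notification idempotency.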
Track 3 - Progress And Liveness
- Separate heartbeat, progress, and output growth.
- Treat silent-but-running as valid only when heartbeat is fresh.
- Add watchdog classifiers: dead PID, expired lease, no heartbeat, no output growth, permission prompt, interactive prompt, runaway recovery.
- Extend `sf headless --output-format json query` with active unit, status, elapsed time, retry count, watchdog reason, last progress time, and output path.
- Later: render TUI/footer status rows from the same runtime model.
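A watchdog classifier along these lines might start as a small ordered check. The sketch below covers only three of the listed classifiers (dead PID, expired lease, no heartbeat); the `Liveness` shape, the reason strings, and the 30-second heartbeat threshold are assumptions, not values taken from SF.

```typescript
// Hypothetical liveness snapshot; all timestamps are epoch milliseconds.
interface Liveness {
  pidAlive: boolean;
  leaseExpiresAt: number;
  lastHeartbeatAt: number;
}

type WatchdogReason = "dead-pid" | "expired-lease" | "no-heartbeat";

// Returns the first matching watchdog reason, or null when the unit
// still looks alive. The default TTL is an illustrative assumption.
function classifyLiveness(
  l: Liveness,
  now: number,
  heartbeatTtlMs: number = 30000,
): WatchdogReason | null {
  if (!l.pidAlive) return "dead-pid";
  if (now > l.leaseExpiresAt) return "expired-lease";
  if (now - l.lastHeartbeatAt > heartbeatTtlMs) return "no-heartbeat";
  // Fresh heartbeat: silent-but-running is treated as valid, so missing
  // output growth alone does not flag the unit here.
  return null;
}
```

Passing `now` in explicitly, instead of calling `Date.now()` inside, makes the classifier deterministic under test fixtures.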
Track 4 - Event And Interrupt Policy
- Add explicit event origins: `user-message`, `system-steer`, `task-notification`, `memory-event`, `background-completion`, `permission-request`.
- Add interrupt behaviors: `interrupt`, `queue`, `block`, `drop-if-stale`.
- Default user-origin messages to interrupt active work.
- Default system/task/memory events to non-interrupting unless explicitly marked.
- Scope queues so main loop does not consume subagent-directed events and subagents do not consume main user prompts.
- Ensure background completions enqueue once and set `notifiedAt`.
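The default policy and the enqueue-once rule above can be sketched as follows. The origin and behavior names are taken from this track; the `RuntimeEvent` shape, both function names, and the choice of `queue` as the default for every non-user origin (including `background-completion` and `permission-request`, which the backlog does not specify) are assumptions.

```typescript
type EventOrigin =
  | "user-message" | "system-steer" | "task-notification"
  | "memory-event" | "background-completion" | "permission-request";

type InterruptBehavior = "interrupt" | "queue" | "block" | "drop-if-stale";

// Hypothetical event shape; `behavior` is the explicit override.
interface RuntimeEvent {
  origin: EventOrigin;
  behavior?: InterruptBehavior;
}

// An explicit marking wins; otherwise only user messages interrupt.
// Assumption: every other origin defaults to "queue".
function resolveBehavior(e: RuntimeEvent): InterruptBehavior {
  if (e.behavior) return e.behavior;
  return e.origin === "user-message" ? "interrupt" : "queue";
}

// Enqueue a background completion at most once by stamping notifiedAt.
function enqueueCompletion(
  queue: string[],
  unit: { unitId: string; notifiedAt?: string },
  now: Date,
): boolean {
  if (unit.notifiedAt) return false; // already delivered
  queue.push(unit.unitId);
  unit.notifiedAt = now.toISOString();
  return true;
}
```

Calling `enqueueCompletion` twice for the same unit leaves a single queue entry, which is the behavior the notification-idempotency eval is meant to check.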
Track 5 - Tool, Plugin, And Permission Boundaries
- Add explicit tool contracts for read/write behavior, concurrency safety, permission requirements, and interrupt behavior.
- Treat each background worker as a durable task with `taskId`, `parentUnitId`, status, output path, retry budget, and notification marker.
- Add doctor checks for suspicious hook/tool/plugin config.
- Ensure runtime permission errors name the denied tool/action and the relevant policy.
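One possible shape for such a tool contract, together with a permission error that names the tool, action, and policy, is sketched below. Every field name, the `ToolContract` and `PermissionDeniedError` types, and the example policy string are hypothetical; the backlog only lists the properties a contract should cover.

```typescript
// Hypothetical contract covering the four properties listed above.
interface ToolContract {
  name: string;
  access: "read" | "write";
  concurrencySafe: boolean;
  requiresPermission: boolean;
  interruptBehavior: "interrupt" | "queue" | "block" | "drop-if-stale";
}

// Error that always names the denied tool/action and the relevant policy.
class PermissionDeniedError extends Error {
  constructor(
    public readonly tool: string,
    public readonly action: string,
    public readonly policy: string,
  ) {
    super(`Permission denied: tool "${tool}" action "${action}" blocked by policy "${policy}"`);
    this.name = "PermissionDeniedError";
  }
}

// Throws when a permission-gated tool is used without a grant.
function assertAllowed(c: ToolContract, action: string, granted: boolean, policy: string): void {
  if (c.requiresPermission && !granted) {
    throw new PermissionDeniedError(c.name, action, policy);
  }
}
```

Carrying `tool`, `action`, and `policy` as fields as well as in the message lets a headless JSON consumer report them without parsing the string.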
Track 6 - Release And Privacy Hygiene
- Add packed-artifact scanner before release.
- Fail release artifacts containing inline source maps, `sourcesContent`, `.ts`/`.tsx` source, local absolute paths, secrets, or debug-only strings.
- Use `npm pack --dry-run --json` plus unpack inspection for npm artifacts.
- Add telemetry wrappers that allow numeric/boolean metadata by default and require reviewed wrappers for string metadata.
- Add tests for no-telemetry mode and no-nonessential-network mode.
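A packed-artifact scanner of this kind could begin as a pure function over already-unpacked files. This is a rough sketch: the `PackedFile` shape and `scanPackedFiles` name are hypothetical, the `.d.ts` exemption and the `/Users/` and `/home/` path patterns are assumptions, and secret and debug-string detection are left out.

```typescript
// Hypothetical in-memory view of one file from the unpacked tarball.
interface PackedFile { path: string; content: string }

// Returns human-readable findings; an empty array means the artifact passes.
function scanPackedFiles(files: PackedFile[]): string[] {
  const findings: string[] = [];
  for (const f of files) {
    // Assumption: .d.ts declaration files are allowed to ship.
    if (/\.tsx?$/.test(f.path) && !f.path.endsWith(".d.ts")) {
      findings.push(`${f.path}: TypeScript source in artifact`);
    }
    if (f.content.includes("sourceMappingURL=data:")) {
      findings.push(`${f.path}: inline source map`);
    }
    if (f.content.includes('"sourcesContent"')) {
      findings.push(`${f.path}: sourcesContent present`);
    }
    // Assumption: only macOS/Linux-style home directories are checked.
    if (/\/(Users|home)\/[^/\s"']+\//.test(f.content)) {
      findings.push(`${f.path}: local absolute path`);
    }
  }
  return findings;
}
```

A release script would fail the build whenever the returned array is non-empty, and the same function can run against the fixture artifacts named in the eval list.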
Eval Candidates
- Bad roadmap: DB has 2 slices, `ROADMAP.md` has 6 stale rows. Expected: dispatch uses DB/projection or stops; never dispatches stale rows.
- Projection fallback: DB unavailable, `ROADMAP.json` exists. Expected: dispatch succeeds from projection.
- Legacy unsafe fallback: DB unavailable, only `ROADMAP.md` exists. Expected: dispatch stops with doctor/migration instruction.
- Drift detection: `ROADMAP.md` marker hash mismatches projection. Expected: `/sf doctor` reports drift.
- Drift repair: `/sf doctor --fix` re-renders Markdown and clears drift.
- Synthetic unit failure: `parallel-research` is `runaway-recovered`. Expected: cannot redispatch unless explicitly reset.
- Notification idempotency: terminal unit with `notifiedAt` does not enqueue another completion notification.
- Retry budget: retryable failure increments `retryCount`; exceeding `maxRetries` becomes terminal.
- Stale heartbeat: missing heartbeat becomes `stale`, not infinite redispatch.
- Interrupt policy: user steering interrupts active work; memory/system/task notifications do not interrupt by default.
- Queue scoping: subagent-scoped notifications are not consumed by the main loop.
- Release scanner: fixture artifact containing `sourcesContent` fails; clean packed artifact passes.
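The drift-detection and drift-repair candidates above assume a hash marker embedded in the rendered Markdown. One way that could work is sketched here; the `<!-- sf:generated hash=... -->` marker format, the `Slice` shape, and all three function names are hypothetical, and the hash is computed over the structured plan rather than over the Markdown body.

```typescript
import { createHash } from "node:crypto";

interface Slice { id: string; title: string }

// SHA-256 over the structured plan (the canonical state).
function planHash(slices: Slice[]): string {
  return createHash("sha256").update(JSON.stringify(slices)).digest("hex");
}

// Render display-only Markdown with a generated marker on the first line.
// The marker format is an assumption made for this sketch.
function renderRoadmap(slices: Slice[]): string {
  const rows = slices.map((s) => `- ${s.id}: ${s.title}`).join("\n");
  return `<!-- sf:generated hash=${planHash(slices)} -->\n${rows}\n`;
}

// Drift: the marker is missing or its hash no longer matches the plan.
function hasDrift(markdown: string, slices: Slice[]): boolean {
  const m = /<!-- sf:generated hash=([0-9a-f]{64}) -->/.exec(markdown);
  return m === null || m[1] !== planHash(slices);
}
```

In this sketch, repair amounts to overwriting the file with `renderRoadmap(slices)`. Note that hashing only the plan does not catch hand edits to the Markdown body while the marker is left intact; detecting those would need a second hash over the rendered rows.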
Verification Commands
- `npx vitest run src/resources/extensions/sf/tests/parallel-research-dispatch.test.ts --config vitest.config.ts --reporter=verbose`
- Focused tests for canonical plan, doctor drift, runtime FSM, interrupt policy, and release scanner.
- `npm run typecheck:extensions`
- `sf headless --output-format json query` against bad-roadmap and failed-runtime fixtures.
Implementation Order
- Canonical plan accessor and structured roadmap projection.
- Dispatch migration away from Markdown parsing.
- Doctor drift detection and repair.
- Unit runtime FSM and redispatch policy.
- Headless query liveness fields.
- Event interrupt policy.
- Tool/plugin permission contracts.
- Release artifact scanner and privacy wrappers.