Commit graph

3920 commits

Author SHA1 Message Date
Mikael Hugo
0f0aee5bf0 feat(sf): port 3 gsd-2 DB helpers + improve /sf escalate list
Three small DB helpers from gsd-2 that SF was missing, plus a UX
improvement to /sf escalate list that uses one of them.

PDD spec:

setSliceSketchFlag(milestoneId, sliceId, isSketch) — generalized
  sketch-flag setter. Replaces my narrower clearSliceSketch (which
  remains as a thin wrapper for callers that only zero). Use this
  when a re-plan flow wants to revert a slice back to sketch state.

autoHealSketchFlags(milestoneId, hasPlanFile) — safety net for
  progressive planning. Predicate-based: the caller passes a function
  that reports whether a PLAN file exists for a slice, and the helper
  flips is_sketch=0 for any slice that has both is_sketch=1 AND a
  plan file. Catches DB/FS drift after crashes or manual edits.
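A minimal sketch of the predicate-based heal. It assumes a hypothetical in-memory `SliceRow` in place of the real SQL UPDATE, and the milestone/slice ids are illustrative:

```typescript
// Hypothetical in-memory stand-in for rows of the slices table.
interface SliceRow {
  milestoneId: string;
  sliceId: string;
  isSketch: 0 | 1;
}

// Clears the sketch flag on every slice of `milestoneId` that is still
// marked is_sketch=1 but already has a PLAN file; returns the healed ids.
function autoHealSketchFlags(
  slices: SliceRow[],
  milestoneId: string,
  hasPlanFile: (sliceId: string) => boolean,
): string[] {
  const healed: string[] = [];
  for (const slice of slices) {
    if (slice.milestoneId !== milestoneId) continue;
    // Only rows where the DB and the filesystem disagree are touched.
    if (slice.isSketch === 1 && hasPlanFile(slice.sliceId)) {
      slice.isSketch = 0;
      healed.push(slice.sliceId);
    }
  }
  return healed;
}

// S01 has a plan on disk but is still flagged as a sketch (drift).
const rows: SliceRow[] = [
  { milestoneId: "M001", sliceId: "S01", isSketch: 1 },
  { milestoneId: "M001", sliceId: "S02", isSketch: 1 },
  { milestoneId: "M001", sliceId: "S03", isSketch: 0 },
];
const plansOnDisk = new Set(["S01", "S03"]);
const healed = autoHealSketchFlags(rows, "M001", (id) => plansOnDisk.has(id));
console.log(healed); // only S01 is healed
```

Passing the filesystem check in as a predicate keeps the DB helper free of path logic and makes it trivial to test.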

listEscalationArtifacts(milestoneId, includeResolved=false) —
  cross-slice DB-side filter for /sf escalate list. Replaces my
  hand-rolled inner-loop over getMilestoneSlices() + getSliceTasks()
  + filter — single SQL query, sorted by sequence, faster.

UX improvement to commands-escalate.ts:
  - /sf escalate list: now uses listEscalationArtifacts; shows
    PENDING / awaiting-review / resolved status badges per entry.
  - /sf escalate list --all: includes resolved entries (audit trail).
  - Better hint message when none active: 'Use --all to include
    resolved'.
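The list filter and badges can be sketched as follows. `EscalationEntry`, `formatEscalationList`, and the exact output strings are hypothetical; only the three badge names and the `--all` hint come from the commit. The real helper filters and sorts in SQL rather than in memory:

```typescript
// Hypothetical shape of one row from listEscalationArtifacts().
interface EscalationEntry {
  sliceId: string;
  taskId: string;
  sequence: number;
  pending: 0 | 1; // tasks.escalation_pending
  awaitingReview: 0 | 1; // tasks.escalation_awaiting_review
  respondedAt?: string; // set once the user resolves it
}

type Badge = "PENDING" | "awaiting-review" | "resolved";

function badgeFor(e: EscalationEntry): Badge {
  if (e.respondedAt) return "resolved";
  // Unresolved and not pausing the loop means it is waiting for review.
  return e.pending === 1 ? "PENDING" : "awaiting-review";
}

// Drops resolved entries unless --all was passed, then orders by sequence.
function formatEscalationList(entries: EscalationEntry[], includeResolved = false): string[] {
  const visible = entries
    .filter((e) => includeResolved || badgeFor(e) !== "resolved")
    .sort((a, b) => a.sequence - b.sequence);
  if (visible.length === 0) {
    return ["No active escalations. Use --all to include resolved."];
  }
  return visible.map((e) => `[${badgeFor(e)}] ${e.sliceId}/${e.taskId}`);
}

const sample: EscalationEntry[] = [
  { sliceId: "S02", taskId: "T03", sequence: 7, pending: 1, awaitingReview: 0 },
  { sliceId: "S01", taskId: "T01", sequence: 2, pending: 0, awaitingReview: 0, respondedAt: "2026-05-02T19:00:00Z" },
  { sliceId: "S01", taskId: "T02", sequence: 3, pending: 0, awaitingReview: 1 },
];
console.log(formatEscalationList(sample)); // /sf escalate list
console.log(formatEscalationList(sample, true)); // /sf escalate list --all
```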

Verified:
  - typecheck clean, apart from one unrelated error in
    self-feedback-drain.ts: a parallel session's change imports
    utils/error.ts, which lands with that session's commit.
  - escalation-feature.test.ts (21 tests) + sf-db.test.ts (16
    tests) still pass — no regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:22:02 +02:00
Mikael Hugo
82633b6f5e feat(sf): sf-audit-traces workflow for slow self-improvement loop
A standalone agent prompt that reads SF's observability sources
(self-feedback / journal / activity / judgments / forensics) and
files AT MOST 3 recurring-pattern findings via sf_self_report so
they enter the existing triage flow.

PDD spec:

Purpose: continuous self-improvement loop. SF already has the data
  sources (self-feedback.jsonl, journal/, activity/, judgments/) and
  the consumer pattern (triage-self-feedback → requirement-promoter).
  What was missing: a standalone prompt that pulls those sources
  together for a scheduled run.
Consumer: agents invoked via '/schedule every morning sf-audit-traces'
  (cloud) or '/sf workflow run sf-audit-traces' (manual).
Contract:
  1. Snapshot the trace volumes (file counts + line counts) into
     evidence so reports are concrete, not prose.
  2. Bar = 3+ occurrences. Single events go to operator eyeballs,
     not permanent self-feedback entries.
  3. Hard cap of 3 entries per run. The whole point is slow
     iteration — the triage queue is human-paced, not a firehose.
  4. NEVER auto-apply. Even if the fix looks one-line obvious, file
     and stop. The triage flow decides what becomes work.
  5. Zero findings is a successful run when the system is healthy.
Failure boundary: missing source files → skip silently. Read errors
  → handle gracefully. Never block on absence.
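The workflow itself is an agent prompt, not code, but its two selection rules (a 3+ occurrence bar and a hard cap of 3 findings) can be sketched as a hypothetical helper. The usage feeds it the open-kind counts from the evidence list below, where only runaway-guard-hard-pause clears the bar:

```typescript
// Hypothetical minimal event shape; real entries carry much more context.
interface TraceEvent {
  kind: string;
}

interface Finding {
  kind: string;
  occurrences: number;
}

const MIN_OCCURRENCES = 3; // contract rule 2: single events are not findings
const MAX_FINDINGS = 3; // contract rule 3: hard cap per run

function selectFindings(events: TraceEvent[]): Finding[] {
  const counts = new Map<string, number>();
  for (const e of events) counts.set(e.kind, (counts.get(e.kind) ?? 0) + 1);
  return [...counts.entries()]
    .filter(([, n]) => n >= MIN_OCCURRENCES)
    .map(([kind, occurrences]) => ({ kind, occurrences }))
    // Most frequent first; ties broken by name so output is stable.
    .sort((a, b) => b.occurrences - a.occurrences || a.kind.localeCompare(b.kind))
    .slice(0, MAX_FINDINGS);
}

// Open-kind counts from the evidence list; an empty result is a valid run.
const openKinds: Record<string, number> = {
  "runaway-guard-hard-pause": 4,
  "git-stage-failure": 2,
  "context-injection-gap": 2,
  "orphan-prompt": 2,
};
const events: TraceEvent[] = Object.entries(openKinds).flatMap(([kind, n]) =>
  Array.from({ length: n }, () => ({ kind })),
);
console.log(selectFindings(events)); // only runaway-guard-hard-pause qualifies
```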
Evidence (verified during scan before writing):
  - 181 self-feedback entries (55 open, 126 resolved)
  - Top open kinds: runaway-guard-hard-pause (4), git-stage-failure
    (2), context-injection-gap (2), orphan-prompt (2)
  - Journal: 6-233 events per active day
  - Activity logs: per-unit JSONL transcripts present
  - All sources accessible via plain file reads — no special tools.
Non-goals:
  - ML training on traces
  - Cross-project trace aggregation
  - Auto-applying fixes (triage flow already does that)
  - Fast iteration (deliberately slow — 3/run cap means at most 21
    new triage items per week even with daily runs)
Invariants:
  - Safety: agent never edits code/prompts/templates/docs.
  - Liveness: zero findings is a valid output. The agent doesn't
    fabricate patterns to justify a run.

Discovery verified: 28 total workflow templates after this commit
(was 27); plugins.get('sf-audit-traces') returns the plugin from
the bundled source.

Pairs with: triage-self-feedback (reads what this files),
requirement-promoter (auto-promotes recurring kinds to requirements),
self-feedback-drain (session-start drain into repair turns). The
audit is the IN end of that pipeline; the rest of SF was already
the OUT end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:15:13 +02:00
Mikael Hugo
e381e3c8ad fix(sf): bump SCHEMA_VERSION to 25 + update sf-db.test.ts assertion
The migrate gate `if (currentVersion >= SCHEMA_VERSION) return;` was
short-circuiting at 23, so the v24 (escalation_awaiting_review) and
v25 (escalation_override_applied) migrations never ran on fresh
databases. The 'fresh DB schema init (memory)' test caught it: it
asserted MAX(version)=23, and after I bumped that expectation to 25 it
still got 23, because the migrate function bailed out before reaching
the new ensureColumn calls.

Two-file fix:
- sf-db.ts:133  SCHEMA_VERSION 23 → 25
- sf-db.test.ts:88 + :222  expected version 23 → 25

Now fresh DBs run all migrations through v25 and end at the latest
version. Existing databases with version 24 still get v25 applied
because currentVersion < SCHEMA_VERSION (24 < 25).
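A simplified sketch of the gate. It assumes a hypothetical `MIGRATIONS` table and an overridable `schemaVersion` parameter so the stale-constant bug can be reproduced next to the fix:

```typescript
const SCHEMA_VERSION = 25;

// Hypothetical per-version steps; in sf-db.ts these are ensureColumn calls.
const MIGRATIONS: Partial<Record<number, (log: string[]) => void>> = {
  24: (log) => log.push("add tasks.escalation_awaiting_review"),
  25: (log) => log.push("add tasks.escalation_override_applied"),
};

// Runs every step above currentVersion and returns the resulting version.
function migrate(currentVersion: number, log: string[], schemaVersion = SCHEMA_VERSION): number {
  // The gate from the commit: with schemaVersion stuck at 23, a DB at 23
  // returned here and never reached the v24/v25 steps.
  if (currentVersion >= schemaVersion) return currentVersion;
  for (let v = currentVersion + 1; v <= schemaVersion; v++) {
    MIGRATIONS[v]?.(log);
  }
  return schemaVersion;
}

const stale: string[] = [];
const staleVersion = migrate(23, stale, 23); // old constant: nothing runs
const fromV23: string[] = [];
const v23Result = migrate(23, fromV23); // runs v24 and v25
const fromV24: string[] = [];
const v24Result = migrate(24, fromV24); // runs only v25
console.log({ staleVersion, v23Result, v24Result, fromV24 });
```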

37/37 tests pass (sf-db + escalation-feature suites). No regression
in the broader 127-test smoke suite that ran before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:05:06 +02:00
Mikael Hugo
aa67c1453c test(sf): full lifecycle coverage for ADR-011 P2 escalation feature
21 vitest tests covering the entire escalation chain shipped this
session. Each contract claim from prior PDD specs gets at least one
verifying test:

buildEscalationArtifact validation (4)
  - option count outside [2,4] → throws
  - duplicate option ids → throws
  - recommendation referencing unknown id → throws
  - happy path → version=1, taskId set, ISO createdAt

writeEscalationArtifact + DB flag flips (3)
  - continueWithDefault=false → escalation_pending=1
  - continueWithDefault=true → escalation_awaiting_review=1
  - two writes flip the pair atomically (mutually exclusive)

detectPendingEscalation (4)
  - empty slice → null
  - paused task → returns task id
  - awaiting_review tasks DO NOT pause
  - resolved (respondedAt set) tasks DO NOT pause

resolveEscalation (5)
  - 'accept' selects recommendation
  - explicit option id resolves with userRationale persisted
  - invalid choice → status=invalid-choice with valid list
  - re-resolve → already-resolved
  - unknown task → not-found

claimOverrideForInjection carry-forward (5)
  - no escalation → null
  - pending (unresolved) → null
  - resolved → returns block + sourceTaskId + sets DB flag=1
  - second claim → null (race-safe idempotent)
  - clearTaskEscalationFlags preserves artifact path (audit trail)

Provides regression protection for the full producer→consumer→
resolution→carry-forward path. All 21 pass against current head.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:56:12 +02:00
Mikael Hugo
125496ce36 docs(sf): surface ADR-011 toggles in PREFERENCES.md template
Three new options got wired this session but the bundled template
didn't mention them, so users had no discoverable way to know they
existed. Adds them as commented hint fields:

- phases.progressive_planning — sketch→refine slice planning
- phases.mid_execution_escalation — task agents can pause for user
  decision via sf_task_complete escalation payload + /sf escalate
- planning_depth (top-level) — 'deep' enables project-level
  discussion gate before any milestone work

All three default off (commented out / unset) so existing users see
zero behavior change from this template update; enabling any of them
is a single uncomment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:53:40 +02:00
Mikael Hugo
4b6eb86b84 feat(sf): carry-forward injection — final piece of escalation feature (PDD)
Replaces the claimOverrideForInjection stub with a real race-safe
implementation. With this commit, the full escalation loop is wired:
agent escalates → user pauses → user resolves → next executor in the
slice sees the user's choice as a hard constraint in its prompt.

The buildExecuteTaskPrompt call site at auto-prompts.ts:2452-2469
already invoked claimOverrideForInjection (gated on
phases.mid_execution_escalation). Before this commit it was a no-op
because the function returned null unconditionally. Now it actually
delivers the override block.

PDD spec for this change:

Purpose: complete the loop. Without carry-forward, the loop 'continues'
  but the next executor re-encounters the same ambiguity that
  triggered the escalation.
Consumer: buildExecuteTaskPrompt in auto-prompts.ts (already wired).
Contract:
  1. No resolved-but-unapplied override in this slice → returns null.
     Existing behavior preserved when no escalation pending. Verified.
  2. Pending escalation (no respondedAt) → returns null. Caller's
     pause-detection layer handles those. Verified.
  3. Resolved escalation (respondedAt + userChoice set) →
     atomically marks escalation_override_applied=1 (race-safe via
     UPDATE … WHERE applied=0) and returns formatted markdown block
     with sourceTaskId. Verified.
  4. Second claim on the same override → null (race loser or
     already-applied). Verified.
  5. Missing/malformed artifact → logWarning + null without claiming
     (so the row isn't silently swallowed by an applied=1 flip).
Failure boundary:
  - claimEscalationOverride is the atomic boundary. Either you claim
    it and it's yours forever, or someone else did and you skip.
  - Validation BEFORE claim — bad artifact never marks the row applied.
  - DB unavailable in claimEscalationOverride → returns false → caller
    treats as race-loser → null. Safe.
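A single-process sketch of the validate-then-claim order. `OverrideRow` and the markdown block format are hypothetical, and the in-memory compare-and-set only mimics the SQL `UPDATE … WHERE applied=0` guard: in JavaScript the check is trivially atomic, while the real race-safety comes from SQLite:

```typescript
// Hypothetical in-memory row; SF keeps the flag in
// tasks.escalation_override_applied and the choice in the JSON artifact.
interface OverrideRow {
  taskId: string;
  sequence: number;
  overrideApplied: 0 | 1;
  artifact: { respondedAt?: string; userChoice?: string } | null;
}

// Compare-and-set stand-in for UPDATE ... WHERE applied = 0 (changes > 0).
function claimEscalationOverride(row: OverrideRow): boolean {
  if (row.overrideApplied !== 0) return false;
  row.overrideApplied = 1;
  return true;
}

function claimOverrideForInjection(
  tasks: OverrideRow[],
): { block: string; sourceTaskId: string } | null {
  // Lowest sequence first, like findUnappliedEscalationOverride's ORDER BY.
  const candidate = [...tasks]
    .sort((a, b) => a.sequence - b.sequence)
    .find((t) => t.overrideApplied === 0 && t.artifact?.respondedAt);
  if (!candidate) return null; // nothing resolved-but-unapplied (or only pending)
  // Validate BEFORE claiming so a bad artifact is never marked applied.
  const choice = candidate.artifact?.userChoice;
  if (!choice) {
    console.warn(`malformed escalation artifact for ${candidate.taskId}`);
    return null;
  }
  if (!claimEscalationOverride(candidate)) return null; // race loser
  return {
    block: `## User decision (from ${candidate.taskId})\nChosen option: ${choice}`,
    sourceTaskId: candidate.taskId,
  };
}

const sliceTasks: OverrideRow[] = [
  { taskId: "T03", sequence: 3, overrideApplied: 0, artifact: { respondedAt: "2026-05-02T18:40:00Z", userChoice: "fail" } },
  { taskId: "T01", sequence: 1, overrideApplied: 0, artifact: {} }, // still pending
];
const first = claimOverrideForInjection(sliceTasks); // claims T03's override
const second = claimOverrideForInjection(sliceTasks); // null: already applied
console.log(first?.sourceTaskId, second);
```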
Evidence:
  - Smoke test exercises 4 contract conditions:
    no-override → null
    pending-only → null
    resolved-then-claim → returns block + sets DB flag
    second-claim → null (idempotent)
  - Typecheck clean.
  - All 62 existing preferences tests still pass (no regression in
    the related plumbing).
Non-goals:
  - reject-blocker carry-forward (gsd-2 has it; needs blocker_source
    DB column SF doesn't have).
  - Cross-slice override carry-forward (current scope: per-slice).
  - Override-applied audit event (gsd-2 emits one; can add later).
Invariants:
  - Safety: applied flag is set BEFORE the prompt is built — so a
    crash mid-build never re-injects on retry.
  - Liveness: any task in the slice with a resolved override gets
    surfaced in sequence order (lowest sequence first via
    findUnappliedEscalationOverride's ORDER BY).
  - Race-safety: SQL UPDATE … WHERE applied=0 returns changes>0 only
    for the winner. Tested with sequential claims; both winners and
    losers behave correctly.
DB schema: tasks.escalation_override_applied (INTEGER NOT NULL
DEFAULT 0), migration v25.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:51:56 +02:00
Mikael Hugo
2c044f340f feat(sf): auto-fill empty model fallbacks from benchmark picker (PDD)
Closes the gap that left the user's session paused on a quota error
with no fallback to switch to. Before this commit:
  - User pins models.execution: { model: gemini-3-flash-preview }
  - No fallbacks array → resolveModelWithFallbacksForUnit returns
    { primary, fallbacks: [] }
  - agent-end-recovery.ts line 348 checks fallbacks.length > 0 → false
  - Loop pauses on the first rate-limit, even though the user has
    other API-keyed providers available.

After: an empty/missing fallbacks array auto-fills from
resolveAutoBenchmarkPickForUnit (which picks API-keyed candidates
ranked by benchmark scores), excluding the user's pinned primary so
we never get a no-op switch to the same model.

PDD spec:

Purpose: out-of-the-box auto-switch to fallback models when a user
  pins only a primary. Matches user expectation that 'the system
  selects models automatically' when keys are available.
Consumer: agent-end-recovery.ts model-fallback flow on rate-limit.
Contract:
  1. models.<unit>: '<id>' (string, no fallbacks) → primary unchanged,
     plus auto-filled fallbacks that exclude the primary.
  2. models.<unit>: { model: '<id>', fallbacks: ['a', 'b'] } (explicit
     non-empty) → unchanged. User intent respected.
  3. models.<unit>: { model: '<id>' } (object, no fallbacks) → auto-
     fill from benchmark picker.
  4. models.<unit>: { model: '<id>', fallbacks: [] } (explicit empty)
     → auto-fill (treat empty same as missing).
  5. No models config at all → unchanged behavior — full auto-pick.
Failure boundary:
  - resolveAutoBenchmarkPickForUnit returns undefined when no
    API-keyed providers exist → fallbacks stays empty (no candidates
    to switch to anyway).
  - autoBenchmark option still honored — set to false to opt out.
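A sketch of the fill rule for contract cases 1 to 4. It assumes `benchmarkPick` stands in for resolveAutoBenchmarkPickForUnit and returns ranked model ids; case 5 (no models config) stays on the existing auto-pick path and is not shown. `model-x` and `model-y` are placeholder ids:

```typescript
// Simplified stand-in for one models.<unit> entry in PREFERENCES.md.
type UnitModelConfig = string | { model: string; fallbacks?: string[] };

interface ResolvedModels {
  primary: string;
  fallbacks: string[];
}

// `benchmarkPick` returns ranked, API-keyed candidates, or undefined when
// no provider key is configured.
function resolveWithAutoFallbacks(
  config: UnitModelConfig,
  benchmarkPick: () => string[] | undefined,
  autoBenchmark = true,
): ResolvedModels {
  const primary = typeof config === "string" ? config : config.model;
  const userFallbacks = typeof config === "string" ? [] : (config.fallbacks ?? []);
  // Explicit non-empty fallbacks win; autoBenchmark=false opts out entirely.
  if (userFallbacks.length > 0 || !autoBenchmark) {
    return { primary, fallbacks: userFallbacks };
  }
  // String, missing and empty-array configs all auto-fill; the pinned
  // primary is excluded so a switch is never a no-op.
  const ranked = benchmarkPick() ?? [];
  return { primary, fallbacks: ranked.filter((id) => id !== primary) };
}

const pick = () => ["minimax/MiniMax-M2.7", "gemini-3-flash-preview", "model-x"];
console.log(resolveWithAutoFallbacks("gemini-3-flash-preview", pick));
console.log(resolveWithAutoFallbacks({ model: "gemini-3-flash-preview", fallbacks: ["model-y"] }, pick));
```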
Evidence:
  - Smoke test: pinned 'gemini-3-flash-preview' with empty fallbacks +
    OPENROUTER_API_KEY + GEMINI_API_KEY in env → returns 4 fallbacks
    starting with minimax/MiniMax-M2.7. Primary not in fallbacks.
  - Existing 62 preferences tests + 5 rate-limit-model-fallback tests
    still pass — no regression.
Non-goals:
  - Cross-phase inheritance (planning falls back to execution config).
  - Persisting auto-filled fallbacks to PREFERENCES.md.
  - Mid-tool-call rate-limit recovery (different code path through
    pi-coding-agent's RetryHandler).
Invariants:
  - Safety: explicit non-empty user fallbacks are NEVER overwritten;
    the userFallbacks.length > 0 check short-circuits before auto-fill.
  - Liveness: empty arrays trigger auto-fill, so callers get a chain
    if any keys are configured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:43:28 +02:00
Mikael Hugo
e4a86ddf6f fix(sf): classify 'exhausted your capacity / quota will reset after Ns' as rate-limit
Real failure caught from a user session: provider returned
'Error: You have exhausted your capacity on this model. Your quota
will reset after 51s.' SF's classifier didn't match it (no 'rate
limit', no '429', no 'limit resets'), so it fell through to 'unknown':
no auto-resume, and the loop stayed paused until a manual /sf
autonomous restart.

PDD spec:

Purpose: every legitimately transient quota error should auto-resume
  after the named cooldown, not pause indefinitely.
Consumer: classifyError() callers, ultimately the auto-loop.
Contract:
  - 'exhausted (your|the) (quota|capacity|usage)' → rate-limit
  - 'quota will reset' → rate-limit (paired with the above)
  - 'will reset after Ns' / 'will reset in Ns' → retryAfterMs = N*1000
Failure boundary: parse failure → 60s default (preserved).
Evidence: smoke test with 6 inputs:
   'exhausted your capacity ... will reset after 51s' → rate-limit/51000
   'rate limit exceeded' → rate-limit/60000 (unchanged)
   'Internal server error' → server/30000 (unchanged)
   '429 too many requests' → rate-limit/60000 (unchanged)
   'Invalid API key' → permanent (unchanged — still manual)
   'exhausted the usage. Will reset in 30s.' → rate-limit/30000
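An illustrative classifier that reproduces the six results above. The regexes are assumptions rather than SF's actual classifyError() source, which matches more patterns; note the case-insensitive flag, needed for the capitalized 'Will reset in 30s':

```typescript
type ErrorKind = "rate-limit" | "server" | "permanent" | "unknown";

interface Classification {
  kind: ErrorKind;
  retryAfterMs?: number;
}

const DEFAULT_RATE_LIMIT_MS = 60_000; // used when no reset time can be parsed

// Illustrative subset of patterns.
const RATE_LIMIT = [
  /rate limit/i,
  /\b429\b/,
  /exhausted (your|the) (quota|capacity|usage)/i,
  /quota will reset/i,
];
const PERMANENT = [/invalid api key/i];
const SERVER = [/internal server error/i];

function classifyError(message: string): Classification {
  // Rate-limit patterns are checked first; permanent only wins without one.
  if (RATE_LIMIT.some((re) => re.test(message))) {
    // 'will reset after 51s' / 'will reset in 30s' → seconds * 1000.
    const m = /will reset (?:after|in) (\d+)\s*s/i.exec(message);
    return { kind: "rate-limit", retryAfterMs: m ? Number(m[1]) * 1000 : DEFAULT_RATE_LIMIT_MS };
  }
  if (PERMANENT.some((re) => re.test(message))) return { kind: "permanent" };
  if (SERVER.some((re) => re.test(message))) return { kind: "server", retryAfterMs: 30_000 };
  return { kind: "unknown" };
}

const samples = [
  "Error: You have exhausted your capacity on this model. Your quota will reset after 51s.",
  "rate limit exceeded",
  "Internal server error",
  "429 too many requests",
  "Invalid API key",
  "exhausted the usage. Will reset in 30s.",
];
for (const s of samples) console.log(classifyError(s), "<-", s);
```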
Non-goals: model-fallback-on-rate-limit (separate change — the
  provider-error-pause module currently waits and retries the same
  model; switching to the configured fallback model after the first
  rate-limit hit is a richer policy change).
Invariants:
  - Permanent classification still wins when no rate-limit pattern is
    present (auth/billing/invalid-key untouched).
  - Default 60s delay preserved when reset-time can't be parsed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:35:55 +02:00
Mikael Hugo
f757a18417 feat(sf): /sf escalate user command + resolveEscalation (PDD)
Closes the user-facing loop for ADR-011 P2. The full escalation
end-to-end now works: agent files → loop pauses → user resolves
via /sf escalate → loop continues.

PDD spec for this change:

Purpose: let the user resolve a paused task escalation. Without this,
  escalation_pending=1 has no exit ramp other than manual SQL.
Consumer: users at the prompt — '/sf escalate list', '/sf escalate
  show <slice>/<task>', '/sf escalate resolve <slice>/<task> <choice>
  [-- <rationale>]'.
Contract:
  1. /sf escalate list → enumerate pending escalations in the active
     milestone, showing slice/task, question, options, recommendation.
  2. /sf escalate show <slice>/<task> → print the artifact's question
     + options with tradeoffs + recommendation + resolution status
     (resolved or unresolved).
  3. /sf escalate resolve <slice>/<task> <option-id> [-- <rationale>]
     → resolveEscalation in escalation.ts:
       - 'accept' selects the recommended option
       - any option id from the artifact is also valid
       - invalid choice → returns 'invalid-choice' with valid list
       - already resolved → 'already-resolved' with prior timestamp
       - not found → 'not-found' with the task path
     On success: artifact gains respondedAt/userChoice/userRationale,
     DB flags cleared, UOK audit event 'escalation-user-responded'
     emitted.
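A sketch of the resolver's decision order. It assumes simplified artifact shapes (`EscArtifact`, `EscOption`, not SF's real types) and an in-memory map keyed by `<slice>/<task>` in place of the on-disk artifacts and DB; the question and option ids are illustrative:

```typescript
interface EscOption {
  id: string;
  label: string;
}

interface EscArtifact {
  question: string;
  options: EscOption[];
  recommendation: string;
  respondedAt?: string;
  userChoice?: string;
  userRationale?: string;
}

type ResolveResult =
  | { status: "resolved"; choice: string }
  | { status: "invalid-choice"; valid: string[] }
  | { status: "already-resolved"; respondedAt: string }
  | { status: "not-found"; taskPath: string };

function resolveEscalation(
  artifacts: Map<string, EscArtifact>, // keyed by "<slice>/<task>"
  taskPath: string,
  choice: string,
  rationale?: string,
): ResolveResult {
  const artifact = artifacts.get(taskPath);
  if (!artifact) return { status: "not-found", taskPath };
  if (artifact.respondedAt) {
    return { status: "already-resolved", respondedAt: artifact.respondedAt };
  }
  const chosen = choice === "accept" ? artifact.recommendation : choice;
  const ids = artifact.options.map((o) => o.id);
  // Return before any write, so an invalid choice leaves no half-state.
  if (!ids.includes(chosen)) return { status: "invalid-choice", valid: ["accept", ...ids] };
  artifact.respondedAt = new Date().toISOString();
  artifact.userChoice = chosen;
  if (rationale) artifact.userRationale = rationale;
  // The real resolver also clears the DB flags and emits the
  // 'escalation-user-responded' UOK audit event at this point.
  return { status: "resolved", choice: chosen };
}

const store = new Map<string, EscArtifact>([
  ["S01/T02", {
    question: "Config file already exists. Overwrite or fail?",
    options: [{ id: "overwrite", label: "Overwrite" }, { id: "fail", label: "Fail" }],
    recommendation: "fail",
  }],
]);
const invalid = resolveEscalation(store, "S01/T02", "maybe");
const accepted = resolveEscalation(store, "S01/T02", "accept", "keep user data safe");
const rerun = resolveEscalation(store, "S01/T02", "overwrite");
const missing = resolveEscalation(store, "S09/T01", "accept");
console.log(invalid.status, accepted.status, rerun.status, missing.status);
// invalid-choice resolved already-resolved not-found
```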
Failure boundary:
  - DB unavailable → 'SF database is not available. Run /sf doctor.'
  - Active milestone missing → 'No active milestone — nothing to list.'
  - Malformed artifact path → readEscalationArtifact returns null →
    handler returns 'not-found'.
  - clearTaskEscalationFlags called inside the resolver — never
    leaves the row in a half-resolved state.
Evidence: smoke test exercises 4 contract conditions end-to-end:
  invalid-choice, accept→resolved (chosen option = recommendation),
  already-resolved on re-run, not-found for unknown task. Typecheck
  clean.
Non-goals:
  - reject-blocker choice (gsd-2 has it; needs a blocker_source DB
    column SF doesn't have)
  - Carry-forward injection (claimEscalationOverride —
    findUnappliedEscalationOverride flow). The override is logged in
    the artifact for the user; agent context injection lands when
    the executor's prompt builder is wired to read it.
  - Cross-milestone listing (current implementation: active milestone
    only — matches /sf escalate list's most useful default behavior).
Invariants:
  - Safety: invalid-choice and not-found return without writing —
    no half-state.
  - Safety: clearTaskEscalationFlags zeros pending+awaiting in one
    UPDATE — reader can never see half-cleared state.
  - Liveness: after resolve, next state derivation cycle sees
    escalation_pending=0 → phase != 'escalating-task' → dispatch
    routes normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:31:45 +02:00
Mikael Hugo
2bf6c51fde feat(sf): expose escalation via sf_task_complete (PDD)
Closes the agent surface for ADR-011 P2. Task agents can now include
an optional 'escalation' payload on sf_task_complete, gated by
phases.mid_execution_escalation. When the preference is on and the
field is present, the executor builds and writes the artifact, which
flips tasks.escalation_pending or escalation_awaiting_review based
on continueWithDefault. The producer chain from 14efcd773 is now
agent-callable.

PDD spec for this change:

Purpose: give task agents a way to file a mid-execution escalation
  through the same tool they already call to record completion. No
  new tool surface — escalation rides as an optional field on
  sf_task_complete (matches gsd-2's design intent).
Consumer: task agents (execute-task) when they hit ambiguity that
  requires user judgment.
Contract:
  1. phases.mid_execution_escalation !== true → escalation field
     silently ignored, current behavior preserved. Verified.
  2. preference on + escalation field → buildEscalationArtifact
     validates, writeEscalationArtifact persists, DB flag set,
     result text + details report path + status. Verified.
  3. continueWithDefault=false → status='pending' (loop pauses).
     continueWithDefault=true → status='awaiting-review' (no pause).
  4. Escalation write failures are caught — task completion never
     blocks on an escalation error (logged via logError).
Failure boundary:
  - Validation errors from buildEscalationArtifact are caught by the
    executor's try/catch and logged; the task still completes.
  - Preference loader fails → behaves as if preference is off.
  - DB write failures fall through; the task is already recorded.
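A hypothetical sketch of the preference gate and the catch-all around the escalation write. The names `handleTaskComplete`, `loadPreference`, and `writeEscalation` are illustrative, and the task is assumed to be recorded as complete before this code runs:

```typescript
interface EscalationPayload {
  question: string;
  continueWithDefault: boolean;
}

interface CompleteTaskParams {
  taskId: string;
  escalation?: EscalationPayload; // optional: existing callers are unaffected
}

interface CompleteTaskResult {
  text: string;
  escalationStatus?: "pending" | "awaiting-review";
}

function handleTaskComplete(
  params: CompleteTaskParams,
  loadPreference: () => boolean, // phases.mid_execution_escalation
  writeEscalation: (taskId: string, payload: EscalationPayload) => void, // build + write + DB flag
): CompleteTaskResult {
  const result: CompleteTaskResult = { text: `Task ${params.taskId} complete.` };
  let enabled = false;
  try {
    enabled = loadPreference() === true;
  } catch {
    enabled = false; // a failing loader behaves as if the preference is off
  }
  if (!enabled || !params.escalation) return result; // field silently ignored
  try {
    writeEscalation(params.taskId, params.escalation);
    result.escalationStatus = params.escalation.continueWithDefault ? "awaiting-review" : "pending";
  } catch (err) {
    // Escalation errors are logged, never allowed to block completion.
    console.error(`escalation write failed for ${params.taskId}:`, err);
  }
  return result;
}

const written: string[] = [];
const writer = (taskId: string) => {
  written.push(taskId);
};
const ask = (continueWithDefault: boolean): EscalationPayload => ({ question: "Overwrite or fail?", continueWithDefault });
const off = handleTaskComplete({ taskId: "T01", escalation: ask(false) }, () => false, writer);
const paused = handleTaskComplete({ taskId: "T02", escalation: ask(false) }, () => true, writer);
const review = handleTaskComplete({ taskId: "T03", escalation: ask(true) }, () => true, writer);
console.log(off, paused, review, written);
```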
Evidence: smoke test exercises both preference states (on writes
  artifact + sets flag; off silently ignores). Typecheck clean.
  Existing sf_task_complete callers without an escalation field
  see zero change in result shape or behavior.
Non-goals:
  - resolveEscalation (apply user's choice → carry forward as
    override) — bigger flow, later fire.
  - listActionableEscalations / listAllEscalations — for /sf
    escalate list, later fire.
  - /sf escalate user command (later fire).
Invariants:
  - Safety: escalation field is Optional in the schema; no caller
    is forced to migrate.
  - Liveness: build+write happen synchronously after handleCompleteTask
    returns; on success, the next state-derivation cycle picks up
    pending=1 and pauses.
Schema additions to preferences-validation.ts:
  - mid_execution_escalation, progressive_planning recognized as
    valid phases keys (previously typed in PhaseSkipPreferences but
    silently stripped by the validator).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:24:04 +02:00
Mikael Hugo
e82e878eaa fix(uok): write parity exit heartbeat on SIGTERM/SIGINT before process.exit
The signal handler in auto-supervisor.ts called process.exit(0) directly,
bypassing the finally block in runAutoLoopWithUok() that writes the UOK
parity exit heartbeat. This caused 55+ missing exit events in the parity
log (78 enters vs 22 exits), making the enter/exit mismatch report
meaningless.

Changes:
- auto-supervisor.ts: add optional onSignal callback to registerSigtermHandler,
  invoked before process.exit(0) with best-effort error swallowing
- auto.ts: wrapper now passes a callback that writes the UOK parity exit
  heartbeat + refreshes the parity report before the hard exit
- auto-start.ts: update BootstrapDeps interface to accept optional onSignal
- tests: add 2 tests verifying callback invocation and error swallowing
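A sketch of the callback hook. It assumes an injectable `exit` parameter (not part of the commit) so the handler can be exercised without terminating the process; the `dispose` helper is likewise hypothetical:

```typescript
type ExitFn = (code: number) => void;

// registerSigtermHandler with the optional onSignal callback. Production
// code would keep the default exit, which calls process.exit.
function registerSigtermHandler(
  onSignal?: () => void,
  exit: ExitFn = (code) => process.exit(code),
): { handler: () => void; dispose: () => void } {
  const handler = () => {
    try {
      onSignal?.(); // e.g. write the UOK parity exit heartbeat
    } catch {
      // Best effort: a failing callback must never block shutdown.
    }
    exit(0);
  };
  process.on("SIGTERM", handler);
  process.on("SIGINT", handler);
  const dispose = () => {
    process.off("SIGTERM", handler);
    process.off("SIGINT", handler);
  };
  return { handler, dispose };
}

// Simulate a SIGTERM by calling the handler directly.
const events: string[] = [];
const { handler, dispose } = registerSigtermHandler(
  () => events.push("parity-exit-heartbeat"),
  (code) => events.push(`exit(${code})`),
);
handler();
dispose();
console.log(events); // heartbeat is written before the exit call
```

Running the callback inside the handler, rather than relying on a `finally` block elsewhere, is what guarantees the heartbeat is written even though `process.exit` skips pending `finally` clauses.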

Fixes the UOK parity critical mismatch reported in uok-parity-report.json.
2026-05-02 20:20:40 +02:00
Mikael Hugo
14efcd7734 feat(sf): producer side of mid-execution escalation (PDD)
Closes the producer half of ADR-011 P2. With this commit a task agent
can call buildEscalationArtifact + writeEscalationArtifact and the
escalation goes end-to-end: artifact persisted to disk, DB flag set,
state derivation picks it up, dispatch returns 'stop'.

PDD spec for this change:

Purpose: let a task agent file an escalation when it hits a decision
  the user must make (overwrite vs fail, model A vs model B, etc.)
  rather than continue past undocumented ambiguity.
Consumer: future sf_task_escalate tool, and direct callers of
  escalation.ts (e.g., resolve-time DB tools).
Contract:
  1. buildEscalationArtifact validates options (2-4 entries, unique
     ids, recommendation must reference a real option id) and throws
     a descriptive Error before any IO. Verified via smoke test:
     unknown recommendation id → "is not one of the option ids: …"
  2. writeEscalationArtifact atomically writes the JSON to
     .sf/milestones/{M}/slices/{S}/tasks/{T}-ESCALATION.json,
     auto-creating the tasks/ subdirectory.
  3. continueWithDefault=false → setTaskEscalationPending → loop
     pauses on next dispatch (verified end-to-end).
  4. continueWithDefault=true → setTaskEscalationAwaitingReview →
     loop continues; artifact recorded for human review later
     (verified — detectPendingEscalation returns null for awaiting).
  5. clearTaskEscalationFlags zeros both pending+awaiting but
     preserves escalation_artifact_path so the audit trail survives.
  6. Emits a UOK audit event 'escalation-manual-attention-created'
     with traceId 'escalation:{M}:{S}:{T}' for cross-system trace.
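The validation in contract item 1 can be sketched as a pure function. The type names and example options are hypothetical; only the error fragment 'is not one of the option ids' and the version/taskId/createdAt fields come from this log:

```typescript
interface EscOption {
  id: string;
  label: string;
  tradeoffs?: string;
}

interface ArtifactInput {
  taskId: string;
  question: string;
  options: EscOption[];
  recommendation: string;
  continueWithDefault: boolean;
}

interface BuiltArtifact extends ArtifactInput {
  version: 1;
  createdAt: string;
}

// Throws before any IO; returns a plain object for the writer to persist.
function buildEscalationArtifact(input: ArtifactInput): BuiltArtifact {
  const { options, recommendation } = input;
  if (options.length < 2 || options.length > 4) {
    throw new Error(`escalation needs 2-4 options, got ${options.length}`);
  }
  const ids = options.map((o) => o.id);
  const dup = ids.find((id, i) => ids.indexOf(id) !== i);
  if (dup !== undefined) throw new Error(`duplicate option id: ${dup}`);
  if (!ids.includes(recommendation)) {
    throw new Error(`recommendation "${recommendation}" is not one of the option ids: ${ids.join(", ")}`);
  }
  return { ...input, version: 1, createdAt: new Date().toISOString() };
}

const base: ArtifactInput = {
  taskId: "T02",
  question: "Existing output file found. Overwrite or fail?",
  options: [
    { id: "overwrite", label: "Overwrite", tradeoffs: "loses the old file" },
    { id: "fail", label: "Fail", tradeoffs: "user must clean up first" },
  ],
  recommendation: "fail",
  continueWithDefault: false,
};
const artifact = buildEscalationArtifact(base);
console.log(artifact.version, artifact.createdAt);
```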
Failure boundary:
  - Validation throws BEFORE any DB or FS write — partial state
    impossible.
  - resolveSlicePath returns null when the slice doesn't exist;
    writeEscalationArtifact throws with a clear /sf doctor hint.
  - atomicWriteSync is the same temp+rename pattern used by every
    other SF artifact write.
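A minimal temp+rename writer using Node's fs module, mirroring the path layout in contract item 2. `writeEscalationArtifact` here is a hypothetical stand-in (the real one resolves the slice via resolveSlicePath), and the atomic-replace guarantee assumes a POSIX filesystem:

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Write to a temp file in the same directory, then rename over the target.
// On POSIX filesystems rename() replaces the target atomically, so readers
// see either the old content or the new, never a partial write.
function atomicWriteSync(filePath: string, contents: string): void {
  const tmp = `${filePath}.${process.pid}.${Date.now()}.tmp`;
  writeFileSync(tmp, contents, "utf8");
  renameSync(tmp, filePath);
}

// Requires the slice directory to exist; auto-creates tasks/.
function writeEscalationArtifact(sliceDir: string, taskId: string, artifact: unknown): string {
  if (!existsSync(sliceDir)) {
    throw new Error(`slice directory not found: ${sliceDir} (run /sf doctor)`);
  }
  const tasksDir = join(sliceDir, "tasks");
  mkdirSync(tasksDir, { recursive: true });
  const target = join(tasksDir, `${taskId}-ESCALATION.json`);
  atomicWriteSync(target, JSON.stringify(artifact, null, 2));
  return target;
}

// Usage against a throwaway .sf/milestones/{M}/slices/{S} tree.
const root = mkdtempSync(join(tmpdir(), "sf-escalation-"));
const sliceDir = join(root, ".sf", "milestones", "M001", "slices", "S01");
mkdirSync(sliceDir, { recursive: true });
const written = writeEscalationArtifact(sliceDir, "T02", { version: 1 });
const files = readdirSync(dirname(written)); // no leftover .tmp file
const roundTrip = JSON.parse(readFileSync(written, "utf8"));
rmSync(root, { recursive: true, force: true });
console.log(files, roundTrip);
```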
Evidence:
  - typecheck clean
  - smoke test exercises all 7 contract conditions end-to-end
    (build, write, pending detection, awaiting-review skip,
    clear, validation rejection, audit trail traceId)
Non-goals:
  - sf_task_escalate MCP tool registration (separate fire — small,
    just exposing buildEscalationArtifact+writeEscalationArtifact
    via the tool surface).
  - resolveEscalation (apply user's choice → clear flags → carry
    forward as override) — bigger; later fire.
  - listActionableEscalations / listAllEscalations helpers — for
    /sf escalate list, later fire.
  - /sf escalate user command itself.
Invariants:
  - Safety: builder validates BEFORE writer commits anything. The
    two phases never partially succeed.
  - Liveness: the two flags are mutually exclusive (set helpers
    flip both atomically in one UPDATE), so both are never 1 at once.
DB schema gains escalation_awaiting_review column (v24 migration).
The two helpers setTaskEscalationPending and
setTaskEscalationAwaitingReview write the mutually-exclusive flag
pair in one UPDATE so a reader can never observe both = 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:16:15 +02:00
Mikael Hugo
a558ff6c64 feat(sf): dispatch pause-for-escalation rule (PDD)
Closes the basic escalation loop. With this commit, end-to-end:
- Task agent writes escalation_pending=1 + escalation_artifact_path
  to the tasks DB row (DB schema from 62dacb627).
- State derivation detects the pause and emits phase='escalating-task'
  with /sf escalate hint in nextAction (ea8819906).
- Auto-dispatch sees phase='escalating-task' FIRST in the rule order
  and returns 'stop' with the nextAction message — no other rule runs.

PDD spec:

Purpose: never let the loop continue past a pending escalation.
Consumer: auto-mode dispatcher (DISPATCH_RULES first entry).
Contract:
  1. state.phase !== 'escalating-task' → return null (fall through).
  2. state.phase === 'escalating-task' → return action='stop' with
     the state's nextAction (the /sf escalate hint state.ts produced).
  3. Rule sits at index 0 of DISPATCH_RULES so phase-agnostic rules
     below (rewrite-docs, UAT, reassess) cannot bypass it.
Failure boundary: pure phase check, no fs/db access — nothing to fail.
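A sketch of why index 0 matters: rules are evaluated in order and the first non-null result wins, so the escalation check runs before any phase-agnostic rule. The `rewrite-docs` entry, the 'executing' phase, and the hint text are hypothetical placeholders:

```typescript
interface DerivedState {
  phase: string;
  nextAction: string;
}

type DispatchAction =
  | { action: "stop"; reason: string }
  | { action: "dispatch"; unit: string };

interface DispatchRule {
  name: string;
  match: (state: DerivedState) => DispatchAction | null;
}

const DISPATCH_RULES: DispatchRule[] = [
  {
    // Index 0: a pending escalation stops the loop before any rule below runs.
    name: "pause-for-escalation",
    match: (s) =>
      s.phase === "escalating-task" ? { action: "stop", reason: s.nextAction } : null,
  },
  {
    // Phase-agnostic rule (stands in for rewrite-docs, UAT, reassess).
    name: "rewrite-docs",
    match: () => ({ action: "dispatch", unit: "rewrite-docs" }),
  },
];

function dispatch(state: DerivedState): DispatchAction | null {
  for (const rule of DISPATCH_RULES) {
    const result = rule.match(state);
    if (result) return result; // first non-null result wins
  }
  return null;
}

const paused = dispatch({ phase: "escalating-task", nextAction: "Run /sf escalate list" });
const normal = dispatch({ phase: "executing", nextAction: "" });
console.log(paused, normal);
```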
Evidence: typecheck clean. State derivation already smoke-tested in
  ea8819906 — once that returns phase='escalating-task', this rule
  emits the stop. End-to-end happy path is just two function calls.
Non-goals:
  - Tools to write escalation_pending (the producer side — task
    agents need a tool for this; later fire)
  - /sf escalate user command (later fire)
  - Resolution flow (escalation.ts has the schema; resolveEscalation
    helper from gsd-2 is not yet ported — later fire)
Invariants:
  - Safety: phase !== 'escalating-task' → 1 condition check, return
    null. Zero overhead in the common case.
  - Liveness: when paused, dispatch returns immediately — never
    runs another rule that could mutate slice state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:07:56 +02:00
Mikael Hugo
ea8819906d feat(sf): wire escalation detection into state derivation (PDD)
State derivation now emits phase='escalating-task' when a task in the
active slice is paused waiting for a user decision. Builds on the
type+DDL foundation in 62dacb627. Together they get the loop to STOP
when there's a pending escalation rather than carrying past an
undocumented decision.

PDD spec for this change:

Purpose: pause auto-mode at the state-derivation layer when any task
  in the active slice has escalation_pending=1 with an unresolved
  escalation artifact. The dispatcher (next fire) sees phase=
  'escalating-task' and returns 'stop' rather than dispatching new
  work over a pending decision.
Consumer: state.ts deriveStateFromDb() callers — the auto-loop, the
  /sf status dashboard, the future /sf escalate command.
Contract:
  1. Empty tasks list → null (no pause). Verified.
  2. Task without escalation_pending → null. Verified.
  3. escalation_pending=1 but no artifact path → null (treats as
     not actionable). Verified.
  4. escalation_pending=1 + valid artifact + no respondedAt → returns
     task id; state.phase = 'escalating-task' with task id in
     blockers and a /sf escalate hint in nextAction. Verified.
  5. respondedAt set → null (already resolved, fall through).
     Verified.
Failure boundary: any read/parse failure on the artifact returns null
  from detectPendingEscalation — state derivation falls through to
  existing behavior. Strict schema validation in readEscalationArtifact
  treats malformed artifacts as 'no actionable escalation here.'
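A sketch of the detection loop against the five contract cases. It assumes a hypothetical `readArtifact` callback that returns null on any read, parse, or schema failure; this version skips an unreadable task and keeps scanning, which is an assumption about slices with several paused tasks:

```typescript
// Hypothetical subset of SF's TaskRow and escalation artifact.
interface TaskRowLite {
  id: string;
  escalation_pending?: number;
  escalation_artifact_path?: string | null;
}

interface ArtifactLite {
  respondedAt?: string;
}

function detectPendingEscalation(
  tasks: TaskRowLite[],
  readArtifact: (path: string) => ArtifactLite | null,
): string | null {
  for (const task of tasks) {
    // Common case: no escalation, so no filesystem read happens at all.
    if (task.escalation_pending !== 1) continue;
    // Flag without an artifact path: not actionable.
    if (!task.escalation_artifact_path) continue;
    const artifact = readArtifact(task.escalation_artifact_path);
    // Unreadable or malformed artifact, or already resolved: fall through.
    if (!artifact || artifact.respondedAt) continue;
    return task.id; // state derivation turns this into phase 'escalating-task'
  }
  return null;
}

let reads = 0;
const disk: Record<string, ArtifactLite> = {
  "T02-ESCALATION.json": {},
  "T03-ESCALATION.json": { respondedAt: "2026-05-02T18:00:00Z" },
};
const read = (p: string): ArtifactLite | null => {
  reads++;
  return disk[p] ?? null;
};
const noPause = detectPendingEscalation([{ id: "T01" }], read);
const readsAfterNoPause = reads; // the common case never touches disk
const pausedTask = detectPendingEscalation(
  [{ id: "T02", escalation_pending: 1, escalation_artifact_path: "T02-ESCALATION.json" }],
  read,
);
console.log(noPause, readsAfterNoPause, pausedTask);
```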
Evidence: smoke test exercises all 5 contract conditions end-to-end
  with real filesystem artifacts. Typecheck clean. Existing state
  derivation paths unchanged when no task is paused (early continue
  on escalation_pending !== 1 in detectPendingEscalation's loop).
Non-goals:
  - Dispatch rule that returns 'stop' on phase='escalating-task'
    (next fire — needs no DB changes, just an auto-dispatch.ts edit)
  - Escalation artifact creation tools (gsd-2 has
    writeEscalationArtifact + buildEscalationArtifact +
    setTaskEscalationPending; those land when a task agent needs to
    file an escalation)
  - /sf escalate user command (later fire)
Invariants:
  - Safety: no escalation pending → 0 file system reads (the loop
    early-continues), zero behavior change vs current.
  - Liveness: if a task IS paused, state.phase becomes
    'escalating-task' immediately; no race with dispatch ordering.
Assumptions verified:
  - SF's EscalationArtifact + EscalationOption types match gsd-2's
    schema (verified earlier this session).
  - TaskRow has escalation_pending and escalation_artifact_path
    fields (added in 62dacb627).
  - getSliceTasks() returns DB rows that include those fields after
    the v23 migration ran.
  - state.ts has the slice-level scope I need (activeMilestone +
    activeSlice + registry + requirements + progress all visible at
    the insertion point).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:06:29 +02:00
Mikael Hugo
d3574f3c4d fix(sf): guard escalation index migration
2026-05-02 20:05:12 +02:00
Mikael Hugo
62dacb6270 feat(sf): foundation for mid-execution escalation (ADR-011 P2)
Type-level + DB scaffolding for the escalation feature gsd-2 has but
SF lacks. Pure additive — no behavior change yet. Mirrors the same
incremental pattern that worked for progressive planning (types +
DDL first, state derivation + dispatch + module port in subsequent
fires).

PDD spec:

Purpose: lay the foundation so a task agent can write
  tasks.escalation_pending=1 + escalation_artifact_path=<file> when
  it hits a decision the user must make. Future fires will: (1) add
  detectPendingEscalation() to state.ts, (2) add a dispatch rule that
  returns 'stop' on phase='escalating-task', (3) port the escalation
  helper module from gsd-2.
Consumer: task agents (execute-task) when they hit ambiguity that
  shouldn't be silently resolved. Operators running future
  /sf escalate list/resolve commands.
Contract:
  - types.ts:23 Phase union now includes 'escalating-task'.
  - sf-db.ts:370-371 fresh CREATE TABLE for tasks gains
    escalation_pending + escalation_artifact_path.
  - sf-db.ts:1430+ schema_version 23 migration adds the columns +
    an opportunistic index for fast pending-escalation lookups.
  - TaskRow type gains escalation_pending?: number and
    escalation_artifact_path?: string | null. rowToTask returns
    them with safe defaults (0 and null).
Failure boundary: index creation is wrapped in try/catch — backends
  without index support fall through silently. Pre-migration installs
  treat the column as 0 default (no escalation pending) on first
  read, matching post-migration default.
Evidence: typecheck passes; smoke test deferred to next fire when the
  state derivation rule lands and we have something observable to
  test.
Non-goals:
  - state.ts emission of phase='escalating-task' (next fire)
  - auto-dispatch.ts pause rule (next fire)
  - escalation.ts helper module port (next fire — 367 LOC in gsd-2)
  - /sf escalate user command (later fire)
  - Escalation artifact format/validation (later fire)
Invariants:
  - Safety: ALTER TABLE adds nullable/defaulted columns; existing
    rows behave identically (escalation_pending defaults to 0).
  - Liveness: the migration runs in the same atomic transaction block as
    the other version 23 work, so it is never half-applied.
Assumptions verified:
  - SF already has EscalationOption + EscalationArtifact types
    (types.ts:692-704) — they were stubs with no producers; this
    commit is the producer-side scaffolding.
  - schema_version 22 already exists and is the current latest;
    23 is the next available.

ADR-011 reference: gsd-2's docs/dev/ADR-011-progressive-planning-
escalation.md covers both progressive planning (already ported in
this session) and mid-execution escalation (in progress). SF's own
ADR-011 file (docs/dev/ADR-011-swarm-chat-and-debate-mode.md) is
unrelated to gsd-2's ADR-011 — same number, different topic.
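
A minimal sketch of the read-side defaults and the best-effort index
step, assuming simplified row shapes. The index name and the `exec`
callback are hypothetical stand-ins, not sf-db.ts code.

```typescript
// Hypothetical, simplified shapes; the real TaskRow has many more fields.
interface RawTaskRow {
  id: string;
  escalation_pending?: number | null;
  escalation_artifact_path?: string | null;
}

interface TaskRow {
  id: string;
  escalation_pending: number;
  escalation_artifact_path: string | null;
}

// Pre-migration rows lack the columns entirely, so both fields fall back to
// the same defaults the v23 DDL uses: 0 (not pending) and null (no artifact).
function rowToTask(raw: RawTaskRow): TaskRow {
  return {
    id: raw.id,
    escalation_pending: raw.escalation_pending ?? 0,
    escalation_artifact_path: raw.escalation_artifact_path ?? null,
  };
}

// Index creation is best-effort: a backend that rejects CREATE INDEX must not
// abort the migration. `exec` and the index name are hypothetical.
function tryCreateEscalationIndex(exec: (sql: string) => void): boolean {
  try {
    exec(
      "CREATE INDEX IF NOT EXISTS idx_tasks_escalation_pending ON tasks(escalation_pending)",
    );
    return true;
  } catch {
    return false; // fall through silently, per the failure boundary
  }
}
```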

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:00:16 +02:00
Mikael Hugo
99965091d4 fix: inline-fix for high/critical self-feedback entries
- sf-mooe4m5k-6fm7z9: Add orphan next-server process reaper to web-mode.ts
  - reapOrphanedNextServerProcesses() detects and kills orphaned next-server
    processes with cwd under dist/web/standalone and parent PID 1
  - Wired into launchWebMode (before port reservation) and stopWebMode --all
  - Tests verify export and safe execution on non-Linux platforms

- sf-moocr4rv-au7r3l: Add harness promotion path from .sf to tracked docs
  - handleHarnessPromote() writes reviewable artifacts to docs/exec-plans/active/
  - handleHarness now accepts 'promote <finding-id>' subcommand
  - Promoted artifacts include observed state, review checklist, and notes

- sf-moocz9so-4ffov2: Add basic flow auditor via /sf doctor flow
  - runFlowAudit() inspects auto.lock, runtime units, notifications, child processes
  - Reports active unit age, warnings, recommendations, child process classification
  - Wired into handleDoctor as 'flow' subcommand
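
A sketch of the orphan test for the reaper, assuming a pre-collected
process list. The `ProcInfo` shape and the `next-server` command match
are assumptions; the entry above only states the cwd and parent-PID
conditions.

```typescript
// Hypothetical process snapshot; on Linux this could be read from /proc.
interface ProcInfo {
  pid: number;
  ppid: number;
  cwd: string;
  command: string;
}

// Orphaned = re-parented to PID 1 (the launcher died) while still running
// out of the standalone web build directory.
function isOrphanedNextServer(p: ProcInfo, standaloneDir: string): boolean {
  const prefix = standaloneDir.endsWith("/") ? standaloneDir : standaloneDir + "/";
  const underStandalone = p.cwd === standaloneDir || p.cwd.startsWith(prefix);
  return p.ppid === 1 && underStandalone && p.command.includes("next-server");
}

// Returns the PIDs a reaper would signal.
function findOrphans(procs: ProcInfo[], standaloneDir: string): number[] {
  return procs.filter((p) => isOrphanedNextServer(p, standaloneDir)).map((p) => p.pid);
}
```

The prefix check uses a trailing slash so a sibling such as
`standalone-old` is not mistaken for a subdirectory.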
2026-05-02 19:57:41 +02:00
Mikael Hugo
fead8c1eca feat(sf): restore /sf debug session feature from gsd-2 (PDD)
Reverses commit 1891ccbdc, which deleted commands-debug.ts and
debug-session-store.ts as orphan code. They were not orphans: gsd-2
has the full feature wired (commands/handlers/ops.ts:46-49). The 2
prompts that the dispatch references existed in gsd-2 but had never
been ported to SF, which is why my deletion looked correct in
isolation.

PDD spec for this restoration:

Purpose: bring back /sf debug — a structured debug-session workflow
  where the user runs '/sf debug <issue>' to start a session, and
  SF's auto-mode dispatches debug-session-manager (find_and_fix) or
  debug-diagnose (find_root_cause_only) prompts to the LLM.
Consumer: users at the prompt typing /sf debug.
Contract:
  - /sf debug              → usage text
  - /sf debug <issue>      → create session, dispatch find_and_fix
  - /sf debug list         → enumerate sessions
  - /sf debug status <slug>→ show session details
  - /sf debug continue <slug> → resume
  - /sf debug --diagnose <issue|slug> → diagnose-only path
Failure boundary: dispatch failures are caught — the session record
  is still persisted to .sf/debug/sessions/, the user can retry
  with /sf debug continue <slug>.
Evidence:
  - typecheck: clean
  - prompt-load: both debug-diagnose and debug-session-manager render
    against the var sets the dispatch passes
  - tests: 37/37 pass under the vitest harness (the file uses the
    node:test runner; vitest reports 'tests 37 pass 37 fail 0' but still
    tags the file 'failed' because of the reporter mismatch)
Non-goals:
  - Not redesigning the feature, just restoring it
  - Not adding new dispatch paths, just the user-facing /sf debug
Invariants:
  - Safety: when not invoked, debug-session-store.ts has zero
    side-effects (lazy file system access only on session create)
  - Liveness: session creation writes to .sf/debug/sessions/
    immediately so a crash mid-flow leaves a recoverable record
Assumptions verified:
  - All 7 files (2 ts + 2 prompts + ops.ts edit + catalog edit + 1
    test) port cleanly with gsd→sf identifier rewrites
  - The customType strings in commands-debug.ts and the test match
    ('sf-debug-start', 'sf-debug-continue', 'sf-debug-diagnose')

What stays better than gsd-2: all SF improvements over gsd-2
(gap-audit, judgment-log, plan-quality, etc.) are untouched; the
deletion this commit reverses was the only regression.
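
The /sf debug subcommand contract maps onto a small parser. This is a
sketch assuming whitespace-separated arguments; `DebugCommand` and
`parseDebugArgs` are hypothetical names, not the code in
commands-debug.ts.

```typescript
// Hypothetical parse result mirroring the contract's six forms.
type DebugCommand =
  | { kind: "usage" }
  | { kind: "list" }
  | { kind: "status"; slug: string }
  | { kind: "continue"; slug: string }
  | { kind: "diagnose"; target: string }
  | { kind: "start"; issue: string };

function parseDebugArgs(args: string): DebugCommand {
  const trimmed = args.trim();
  if (trimmed === "") return { kind: "usage" };
  const [head, ...rest] = trimmed.split(/\s+/);
  const tail = rest.join(" ");
  switch (head) {
    case "list":
      return { kind: "list" };
    case "status":
      return tail ? { kind: "status", slug: tail } : { kind: "usage" };
    case "continue":
      return tail ? { kind: "continue", slug: tail } : { kind: "usage" };
    case "--diagnose":
      return tail ? { kind: "diagnose", target: tail } : { kind: "usage" };
    default:
      // Anything else is a free-text issue description (find_and_fix path).
      return { kind: "start", issue: trimmed };
  }
}
```

One consequence of this shape: an issue description that starts with a
reserved word such as `list` is parsed as that subcommand.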

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:49:34 +02:00
Mikael Hugo
0c7c4eca5b fix(sf): harden auto loops and skill sandbox
2026-05-02 19:46:36 +02:00
Mikael Hugo
d742602454 feat(sf): wire deep planning mode dispatch (PDD)
Closes the deep-mode rollout. With this commit, planning_depth: 'deep'
in PREFERENCES.md produces a 4-stage project-level discussion BEFORE
any milestone work — workflow-preferences → discuss-project →
discuss-requirements → research-project (research-decision is auto-
resolved to skip-default by SF's resolver, simpler than gsd-2's
explicit user-decision gate).

PDD spec for this change:

Purpose: route auto-mode through project-level setup before milestones
  when planning_depth='deep'. When absent or 'light', existing dispatch
  is preserved 1:1.
Consumer: auto-mode dispatcher (DISPATCH_RULES). One new rule sits at
  the top of the pre-planning ladder; existing rules unchanged.
Contract:
  1. planning_depth absent or 'light' → rule returns null → existing
     dispatch unchanged. Verified: returns 'not-applicable'.
  2. planning_depth='deep' + empty project → dispatches workflow-
     preferences then progresses through stages as artifacts land.
     Verified: returns 'pending'/'workflow-preferences'.
  3. status='blocked' → returns dispatch action 'stop' with the gate's
     reason — never silently bypasses a blocker.
  4. status='complete' → returns null → milestone-level rules below
     take over.
Failure boundary: if resolveDeepProjectSetupState() throws, return
  null and fall through to legacy rules. Never blocks the user on a
  helper crash.
Evidence: typecheck passes; gate-resolver smoke test verifies all
  three contract conditions; existing dispatch tests unchanged
  (light-mode regression-protected).
Non-goals:
  - In-flight idempotency markers for research-project (gsd-2 has
    these; SF's resolver auto-completes the stage when files land
    so the simple guard is sufficient — can add markers later if
    parallel orchestrator races emerge).
  - Plumbing structuredQuestionsAvailable through DispatchContext
    (defaulted to 'false' in builders for now; UI capability
    detection can be threaded later).
Invariants:
  - Safety: light-mode + absent-prefs paths return null at the FIRST
    check, before any DB or filesystem access. No regression possible.
  - Liveness: the resolver enforces forward progress — once a stage's
    artifact lands, the next gate fires next dispatch cycle.
Assumptions verified:
  - resolveDeepProjectSetupState exists in SF (deep-project-setup-policy.ts).
  - planning_depth: 'light' | 'deep' typed in preferences-types.ts:425.
  - All 4 dispatched unit types have builders in auto-prompts.ts (added
    in 5e8bdefbe).
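
The four contract cases reduce to a small decision function. A sketch
with hypothetical types, where `resolve` stands in for
resolveDeepProjectSetupState():

```typescript
// Hypothetical, simplified shapes; only the control flow mirrors the contract.
type SetupState =
  | { status: "not-applicable" }
  | { status: "pending"; stage: string }
  | { status: "blocked"; reason: string }
  | { status: "complete" };

type DispatchAction =
  | { action: "dispatch"; unitType: string }
  | { action: "stop"; reason: string };

interface Prefs {
  planning_depth?: "light" | "deep";
}

function toAction(state: SetupState): DispatchAction | null {
  switch (state.status) {
    case "pending":
      return { action: "dispatch", unitType: state.stage };
    case "blocked":
      return { action: "stop", reason: state.reason }; // never bypass a blocker
    default:
      return null; // complete / not-applicable: milestone rules take over
  }
}

function deepSetupRule(prefs: Prefs | undefined, resolve: () => SetupState): DispatchAction | null {
  // Contract 1: light or absent prefs exit before any DB or filesystem access.
  if (prefs?.planning_depth !== "deep") return null;
  try {
    return toAction(resolve());
  } catch {
    return null; // failure boundary: fall through to legacy rules
  }
}
```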

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:42:41 +02:00
Mikael Hugo
5e8bdefbea feat(sf): add 5 deep-planning-mode prompt builders (PDD)
Companion to b771dd0b3 (deep-mode prompt templates). Adds the five
auto-prompts.ts builders that load those templates with the
correct vars.

PDD spec for this change:

Purpose: complete the load path for deep-mode planning so dispatch
  rules can call buildDiscussProjectPrompt(), etc., without crashing.
Consumer: auto-dispatch.ts deep-mode rules (next commit).
Contract: each builder returns a populated prompt string for its
  unit type given (basePath, structuredQuestionsAvailable). All 5
  load successfully against their respective .md templates with no
  missing-var errors.
Failure boundary: loadPrompt throws SF_PARSE_ERROR if a template
  variable is missing — surfaces a clear error rather than silently
  rendering a half-substituted prompt.
Evidence: typecheck passes; loadPrompt verification in last fire's
  log shows all 5 prompts render to non-empty strings (2.6k–7.7k
  chars each).
Non-goals: dispatch wiring (separate commit, requires the
  deep-project-setup-policy resolver SF already has).
Invariants:
  - Safety: existing builders unchanged — no regression.
  - Liveness: each builder returns within one prompt-load round-trip.
Assumptions verified:
  - inlineTemplate('project'/'requirements') already exists in
    prompt-loader.ts.
  - sf_requirement_save and sf_summary_save tools exist in
    db-tools.ts (referenced by the prompts they load).
  - phases.planning_depth: 'light' | 'deep' already typed in
    preferences-types.ts (line 425).
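
The missing-variable guard can be pictured as follows. A minimal sketch
assuming `{{name}}` placeholders; `renderTemplate` and
`PromptParseError` are hypothetical, and the real loadPrompt also
handles reading the template file.

```typescript
// Hypothetical error type carrying the SF_PARSE_ERROR code named above.
class PromptParseError extends Error {
  readonly code = "SF_PARSE_ERROR";
}

function renderTemplate(template: string, vars: Record<string, string>): string {
  const missing = new Set<string>();
  const out = template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    // Own-property check so names like "toString" are not treated as present.
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      missing.add(name);
      return "";
    }
    return vars[name];
  });
  // Refuse to return a half-substituted prompt.
  if (missing.size > 0) {
    throw new PromptParseError(`missing template vars: ${Array.from(missing).join(", ")}`);
  }
  return out;
}
```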

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:36:50 +02:00
Mikael Hugo
b771dd0b31 feat(sf): port 5 deep-planning-mode prompts from gsd-2
Adds the prompt templates that gsd-2 uses for its 'deep' planning_depth
mode — a multi-stage discussion flow (project → requirements → research
decision → parallel research) that runs BEFORE any milestone-level
discussion. SF only had milestone-level discuss flow; this fills the
project-level and requirements-level gaps.

Ported files:
- guided-discuss-project.md     — project-wide vision/users/anti-goals
- guided-discuss-requirements.md — structured R### requirements interview
- guided-research-decision.md    — yes/no gate for parallel research
- guided-research-project.md     — 4-way parallel research orchestrator
- guided-workflow-preferences.md — workflow + planning prefs collection

gsd→sf adaptations: GSD/gsd → SF/sf, .gsd/ → .sf/, gsd_*_save tool
names → sf_*_save, GSD Skill Preferences → SF Skill Preferences.
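
One way to express that mapping as ordered rewrites, sketched under the
assumption that the port was a plain text substitution (the commit does
not say how it was done). `adaptPrompt` is a hypothetical name.

```typescript
// Hypothetical rewrite table. The first two rules mirror the mapping listed
// above; the two catch-all rules would also cover them. The negative
// lookahead keeps references to the upstream "gsd-2" project intact, since a
// blind rewrite would produce the stale "sf-2" name.
const RENAMES: Array<[RegExp, string]> = [
  [/\.gsd\//g, ".sf/"],
  [/\bgsd_(\w+)_save\b/g, "sf_$1_save"],
  [/GSD(?!-2)/g, "SF"],
  [/gsd(?!-2)/g, "sf"],
];

function adaptPrompt(text: string): string {
  return RENAMES.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}
```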

All 5 verified to load via loadPrompt with their required template
variables. The two sf_* tools they reference (sf_requirement_save and
sf_summary_save) already exist in db-tools.ts.

This is the first half of the deep-mode port. Remaining work for full
end-to-end:
- Port 5 builders to auto-prompts.ts (buildDiscussProjectPrompt, etc.)
- Port dispatch rules to auto-dispatch.ts (each gates on
  prefs.planning_depth === 'deep')
- Port resolveDeepProjectSetupState helper for the research-decision
  marker file
- Add planning_depth: 'deep' | 'light' to PhaseSkipPreferences

Default behavior preserved: without planning_depth set, current SF
'light' behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:33:19 +02:00
Mikael Hugo
a5c3d75344 feat(sf): sf_plan_slice auto-clears is_sketch when refining a sketch slice
Closes the last gap in the ADR-011 progressive planning chain. When
refine-slice runs and persists its full plan via sf_plan_slice, the
tool now zeros is_sketch atomically with the plan upsert (only when
the slice was actually a sketch — idempotent no-op otherwise).

This means the dispatch rule from 0c78b0038 will route to refine-slice
on the FIRST visit to a sketch slice, then route to plan-slice on any
subsequent visit because the flag is gone. No infinite refine loops.

sketch_scope is preserved on clear (clearSliceSketch only touches the
is_sketch column) so the original scope hint stays as an audit trail.
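
The clear-once behavior, modeled in memory. A sketch only:
`SliceState`, `persistPlanAndClearSketch` and `nextUnit` are
hypothetical, it assumes phases.progressive_planning is on, and it does
not model the DB transaction that makes the real clear atomic.

```typescript
// In-memory stand-in for one slices row.
interface SliceState {
  is_sketch: number;
  sketch_scope: string;
  plan?: string;
}

// Returns true when the flag was actually cleared, false on the no-op path.
function persistPlanAndClearSketch(slice: SliceState, plan: string): boolean {
  slice.plan = plan;
  if (slice.is_sketch !== 1) return false; // not a sketch: idempotent no-op
  slice.is_sketch = 0; // only this column changes; sketch_scope is kept
  return true;
}

// Dispatch sees the flag once: refine-slice first, plan-slice afterwards.
function nextUnit(slice: SliceState): "refine-slice" | "plan-slice" {
  return slice.is_sketch === 1 ? "refine-slice" : "plan-slice";
}
```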

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:28:50 +02:00
Mikael Hugo
d4be9afe15 feat(sf): producer side of progressive planning — plan-milestone emits sketches, insertSlice persists is_sketch+sketch_scope
Closes the producer half of the ADR-011 rollout. With this commit, the
end-to-end progressive planning path is complete and runnable:
plan-milestone → insertSlice writes is_sketch=1 → dispatch reads it →
refine-slice expands → clearSliceSketch zeros the flag.

Changes:

sf-db.ts insertSlice: extends the typed payload with isSketch and
sketchScope (3-valued: true/false/undefined). The INSERT INTO and ON
CONFLICT clauses gain is_sketch + sketch_scope columns with the same
NULL-sentinel pattern (raw_is_sketch / raw_sketch_scope) used by every
other field — so a re-plan that omits these flags preserves any
existing sketch state rather than blanking it.
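
The NULL-sentinel pattern, sketched for the two new columns. The table
key `(milestone_id, id)` and the helper names are assumptions; the
in-memory `mergeSketch` only mirrors what the COALESCE clauses do.

```typescript
// Hypothetical upsert fragment: a bound NULL means "caller did not say",
// so COALESCE keeps the existing column value instead of overwriting it.
const UPSERT_SKETCH_SQL = `
INSERT INTO slices (milestone_id, id, is_sketch, sketch_scope)
VALUES (:milestone_id, :id, COALESCE(:raw_is_sketch, 0), COALESCE(:raw_sketch_scope, ''))
ON CONFLICT (milestone_id, id) DO UPDATE SET
  is_sketch    = COALESCE(:raw_is_sketch, slices.is_sketch),
  sketch_scope = COALESCE(:raw_sketch_scope, slices.sketch_scope)`;

// Maps the 3-valued TypeScript input onto the sentinel parameters.
function sketchParams(isSketch?: boolean, sketchScope?: string) {
  return {
    raw_is_sketch: isSketch === undefined ? null : isSketch ? 1 : 0,
    raw_sketch_scope: sketchScope === undefined ? null : sketchScope,
  };
}

// The same semantics evaluated in memory, for illustration.
function mergeSketch(
  existing: { is_sketch: number; sketch_scope: string } | undefined,
  p: ReturnType<typeof sketchParams>,
): { is_sketch: number; sketch_scope: string } {
  return {
    is_sketch: p.raw_is_sketch ?? existing?.is_sketch ?? 0,
    sketch_scope: p.raw_sketch_scope ?? existing?.sketch_scope ?? "",
  };
}
```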

sf-db.ts clearSliceSketch: new exported helper for refine-slice to
call after persisting the full plan. Idempotent.

tools/plan-milestone.ts validateSlices: handles 3-valued isSketch
semantics. When isSketch=true, sketchScope is required (non-empty)
and the heavyweight planning fields (successCriteria, proofLevel,
integrationClosure, observabilityImpact) are optional. Non-sketches
keep current strict validation (no regression for existing callers).
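
The 3-valued validation rule as a sketch, assuming a trimmed,
hypothetical `SliceInput` shape with only the fields named above:

```typescript
// Hypothetical, trimmed slice input.
interface SliceInput {
  id: string;
  isSketch?: boolean;
  sketchScope?: string;
  successCriteria?: string;
  proofLevel?: string;
  integrationClosure?: string;
  observabilityImpact?: string;
}

const HEAVY_FIELDS = [
  "successCriteria",
  "proofLevel",
  "integrationClosure",
  "observabilityImpact",
] as const;

function validateSlice(s: SliceInput): string[] {
  const errors: string[] = [];
  if (s.isSketch === true) {
    // Sketches need a non-empty scope hint; heavyweight fields are optional.
    if (!s.sketchScope || s.sketchScope.trim() === "") {
      errors.push(`${s.id}: sketchScope is required when isSketch=true`);
    }
    return errors;
  }
  // isSketch false or omitted: the existing strict validation applies.
  for (const field of HEAVY_FIELDS) {
    const value = s[field];
    if (!value || value.trim() === "") errors.push(`${s.id}: ${field} is required`);
  }
  return errors;
}
```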

tools/plan-milestone.ts persist loop: passes isSketch/sketchScope
through to insertSlice; skips upsertSlicePlanning entirely when
isSketch=true (the planning fields belong to refine-slice's output).

End-to-end DB test verified all five behaviors:
- isSketch=true + sketchScope writes is_sketch=1 + scope text
- Explicit isSketch=false writes is_sketch=0
- Omitted isSketch defaults to 0 on insert
- clearSliceSketch zeros the flag while preserving sketch_scope
- ON CONFLICT with omitted isSketch preserves existing row state

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:26:08 +02:00
Mikael Hugo
c11595cf22 feat(sf): DB migration v22 adds is_sketch + sketch_scope columns (ADR-011)
Mirrors gsd-2's slices schema for progressive planning. Three changes
to sf-db.ts:

1. Fresh-install CREATE TABLE for slices (line 312) gains:
   - is_sketch INTEGER NOT NULL DEFAULT 0  -- 1 = awaiting refine
   - sketch_scope TEXT NOT NULL DEFAULT '' -- 2-3 sentence scope hint

2. Schema version 22 migration: ensureColumn for both fields so
   existing installs upgrade without data loss. Wrapped in the same
   currentVersion < N guard pattern as v6, v7, v8 ... v21.

3. rowToSlice() returns sketch_scope and is_sketch on the SliceRow
   so the dispatch rule from 0c78b0038 can read them via getSlice().
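
The version-guard pattern in miniature. `MiniDb` is a hypothetical
in-memory stand-in; the point is that `ensureColumn` is idempotent and
the block only runs below version 22.

```typescript
// Hypothetical DB stand-in: known columns per table plus emitted DDL.
interface MiniDb {
  columns: Map<string, Set<string>>;
  ddl: string[];
}

// Adds the column only if it is missing, so re-running is harmless.
function ensureColumn(db: MiniDb, table: string, column: string, decl: string): void {
  const cols = db.columns.get(table) ?? new Set<string>();
  if (cols.has(column)) return;
  db.ddl.push(`ALTER TABLE ${table} ADD COLUMN ${column} ${decl}`);
  cols.add(column);
  db.columns.set(table, cols);
}

function migrate(db: MiniDb, currentVersion: number): number {
  // ...earlier guards (v6 through v21) would precede this one...
  if (currentVersion < 22) {
    // NOT NULL columns added by ALTER TABLE need a non-null default.
    ensureColumn(db, "slices", "is_sketch", "INTEGER NOT NULL DEFAULT 0");
    ensureColumn(db, "slices", "sketch_scope", "TEXT NOT NULL DEFAULT ''");
    currentVersion = 22;
  }
  return currentVersion;
}
```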

End-to-end verified: fresh DB has both columns at defaults; getSlice()
returns is_sketch=0, sketch_scope='' on a freshly-inserted slice.

Closes the DDL-migration gap from the progressive-planning rollout
plan in fef2e4b6f. Remaining: plan-milestone tool needs to write
is_sketch=1 + sketch_scope when emitting sketches; refine-slice tool
needs to clear is_sketch=0 when persisting the full plan. Until those
land, the dispatch rule still falls through (sketches never created).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:18:50 +02:00
Mikael Hugo
0c78b00381 feat(sf): wire ADR-011 progressive planning dispatch rule
Adds 'planning (sketch + progressive_planning) → refine-slice' rule
in auto-dispatch.ts, fired BEFORE the existing 'planning → plan-slice'
rule. Activates when:
- state.phase === 'planning'
- prefs?.phases?.progressive_planning === true
- slice has is_sketch=1 in the DB

When all three conditions hold, dispatches the refine-slice unit using
the existing buildRefineSlicePrompt + prompts/refine-slice.md (both
ported in earlier commits). Otherwise falls through to plan-slice
(graceful downgrade — current behavior is preserved when the flag is
off, which is the default).

Why this matters: without progressive planning, the milestone planner
has to either fully-plan every slice upfront (rots quickly) or hand-
wave each slice (executors overscope). Sketch+refine lets the planner
write 2-3 sentences of scope per slice and have refine-slice expand it
just-in-time using prior slice summaries as context — keeping each
plan sized for the actual current reality.

Defensive read of slice.is_sketch with try/catch: pre-migration installs
without the column simply fall through to plan-slice, no error. The DB
DDL migration will land separately as part of the full progressive-
planning rollout.
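
The three activation conditions plus the defensive read, collapsed into
one hypothetical function for illustration. In the real rule table the
refine rule returns null and the existing plan-slice rule handles the
fallback.

```typescript
// Hypothetical, trimmed dispatch input.
interface DispatchInput {
  phase: string;
  prefs?: { phases?: { progressive_planning?: boolean } };
  readSlice: () => { is_sketch?: number };
}

function refineOrPlan(input: DispatchInput): "refine-slice" | "plan-slice" | null {
  if (input.phase !== "planning") return null; // some other rule applies
  if (input.prefs?.phases?.progressive_planning !== true) return "plan-slice";
  let isSketch = false;
  try {
    isSketch = input.readSlice().is_sketch === 1;
  } catch {
    // Pre-migration DB without the column: degrade to plan-slice.
    isSketch = false;
  }
  return isSketch ? "refine-slice" : "plan-slice";
}
```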

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:14:21 +02:00
Mikael Hugo
fef2e4b6f4 feat(sf): add type-level scaffolding for progressive planning (ADR-011)
Three additive type changes that prepare SF to wire refine-slice
through the state machine. Pure type-level — no runtime behavior
change yet:

1. types.ts:14 — Phase union gains "refining" between "planning" and
   "evaluating-gates". State derivation will yield this when a slice
   has is_sketch=1 AND phases.progressive_planning=true.

2. types.ts:354 — PhaseSkipPreferences.progressive_planning?: boolean.
   Off by default; turning it on enables sketch→refine flow.

3. sf-db.ts:2321 — SliceRow.is_sketch?: number. Column DDL not yet
   added; this just lets the type compile when migration lands.

This is the smallest forward step toward closing the refine-slice gap
identified by sf-moojsmkg-72k3ei. Next steps (separate PRs):
- DB migration: ALTER TABLE slices ADD COLUMN is_sketch INTEGER NOT
  NULL DEFAULT 0 (mirroring gsd-2 sf-db.ts:381,1074)
- state.ts: derivation rule emit phase="refining" when sketch+flag
- auto-dispatch.ts: "refining → refine-slice" rule + import
  buildRefineSlicePrompt
- Tests: progressive-planning.test.ts equivalent

Existing buildRefineSlicePrompt + prompts/refine-slice.md already in
place — only the FSM path is missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:10:03 +02:00
Mikael Hugo
be4257b411 feat(sf): port refine-slice prompt from gsd-2
src/resources/extensions/sf/auto-prompts.ts:2143 buildRefineSlicePrompt()
already existed, calling loadPrompt("refine-slice", ...) — but the
template file was missing, so the function would throw if ever called.
gsd-2 has the prompt; ported with /gsd → /sf, .gsd/ → .sf/, GSD → SF,
gsd_plan_slice → sf_plan_slice, gsd_self_report → sf_self_report,
gsd/templates → sf/templates substitutions.

Verified end-to-end: loadPrompt("refine-slice", { ...vars }) succeeds
and produces a 5906-char rendered prompt with all 12 template variables
satisfied by renderSlicePrompt's existing var-passing.

This is a partial fix for sf-moojsmkg-72k3ei — the prompt now loads,
but full feature wire-up still requires:
- new state.phase value "refining"
- new preference phases.progressive_planning (gsd-2 only enables refine
  when this pref is true)
- dispatch rule "refining → refine-slice" in auto-dispatch.ts
- slice DB schema's sketch_scope already exists in the function body
  but downstream FSM transitions need wiring

Without those, buildRefineSlicePrompt is loadable but uncalled. Decision
needed: port the full FSM path or remove the unused builder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:03:56 +02:00
Mikael Hugo
c3ab4bfccf feat(sf): port 16 workflow templates from gsd-2
Adds 16 ready-to-use workflow templates that gsd-2 has but SF was
missing. Each runs via /sf workflow run <name> or /sf start <name>.

Markdown phased workflows (12):
- accessibility-audit  — UI a11y scan + remediation report
- api-breaking-change  — survey callers, migrate, deprecate, schedule removal
- changelog-gen        — release notes from git log since last tag
- ci-bootstrap         — minimal-working CI pipeline
- dead-code            — find unused functions/files (report only, no delete)
- issue-triage         — classify a GitHub issue + label/priority recommendation
- observability-setup  — structured logs, metrics, tracing
- onboarding-check     — walk README as new contributor, report gaps
- performance-audit    — measure → fix → measure
- pr-review            — structured code review of a PR
- pr-triage            — bucket open PRs (merge/close/nudge)
- release              — version bump → changelog → tag → publish (gated)

YAML-step iterators (4):
- docs-sync            — backfill JSDoc/TSDoc on undocumented exports
- env-audit            — inventory env vars + flag drift
- rename-symbol        — global rename across code/tests/docs
- test-backfill        — write unit tests for untested functions

All gsd-specific refs adapted: /gsd → /sf, .gsd/ → .sf/, gsd-build/gsd-2
→ singularity-forge/sf-run.

Templates need no SF-runtime tools (sf_*, subagent, browser_*) — they
run via the bash + git + gh/npm commands the agent already has.
Discovery verified: discoverPlugins() picks up all 27 templates
(11 existing + 16 new); registry.json is 1:1 with the .md/.yaml files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:01:51 +02:00
Mikael Hugo
955ee66614 fix(sf): replace 'Lex' personal name with generic 'project owner' in milestone-validation template
templates/milestone-validation.md:60 was instructing the validating agent
to add 'enough context for Lex to make a decision'. Lex is the
developer's personal nickname; bundled templates ship to every SF user
and other users would write validation reports referencing a stranger.

Now reads 'enough context for the project owner to make a decision' —
generic and accurate for any project.

Tree-wide grep for Lex/Mikael/Mikki across bundled resources now
returns zero personal-name references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:49:20 +02:00
Mikael Hugo
b0e1e9ae1b fix(sf): replace developer-machine paths with portable placeholders
Three bundled files referenced /home/mhugo/code/singularity-forge in
example commands and prompt templates. They ship to every SF install,
where /home/mhugo/code/ doesn't exist:

- workflow-templates/full-project.md: "defined in SF-WORKFLOW.md" was
  ambiguous (LLM resolves relative to cwd). Now points at the canonical
  ~/.sf/agent/SF-WORKFLOW.md install path (per loader.ts:236).

- skills/context-doctor/SKILL.md: Step 6 commit example used
  "cd /home/mhugo/code/singularity-forge". Generic "<project-root>"
  works for any user.

- skills/dispatching-subagents/SKILL.md: subagent task-prompt template
  hardcoded "Repo: /home/mhugo/code/singularity-forge" in the CONTEXT
  section. Same fix.

The acquiring-skills skill has more dev-specific content (mikki-bunker
host, /home/mhugo/code/, dev-tree copy paths) that's clearly a personal
workflow shipping in the bundled tree — left untouched here, needs a
real triage decision (delete from bundle vs generalize).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:46:20 +02:00
Mikael Hugo
64fcbf881e fix(skills): correct .claude/skills/gh/ → .claude/skills/github-workflows/references/gh/
The github-workflows skill bundles a sub-tree at references/gh/ that was
historically a standalone 'gh' skill. After it got nested inside
github-workflows, the docs and scripts kept the old install path:

  .claude/skills/gh/scripts/github_project_setup.py  (stale)

When this skill is installed (as 'github-workflows'), the actual path is:

  .claude/skills/github-workflows/references/gh/scripts/github_project_setup.py

Anyone copy-pasting an example uv run command from issue-stories.md,
milestones.md, labels.md, projects-v2.md, or the script's own help
output would hit ENOENT on the abbreviated path.

11 line replacements across 5 files (4 reference docs + 1 Python
script's own typer.echo).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:42:44 +02:00
Mikael Hugo
626813b616 fix(vectordrive): silence optional dependency warning
2026-05-02 18:42:25 +02:00
Mikael Hugo
22f22181db fix(sf): mark research summary saves terminal
2026-05-02 18:42:25 +02:00
Mikael Hugo
18643702b2 fix(sf): give workflow-templates/product-audit.md an absolute prompt path
Step 1 said "Load the audit prompt at `prompts/product-audit.md`".
That's a relative path the dispatched LLM would resolve against the
project's working directory, but `prompts/product-audit.md` doesn't
live in the user's project; it lives in the bundled extension copied
to `~/.sf/agent/extensions/sf/prompts/` (per prompt-loader.ts:50
__extensionDir/prompts).

LLMs running this workflow would either fail to find the file, walk
the filesystem looking for it, or skip the guidance silently. Now
points at the canonical location and clarifies that the prompt holds
evidence-collection guidance and output schema (the structured tool
sf_product_audit handles persistence).

Partially addresses sf-monzctqw-w4g85x — the path is now right; the
broader prompt-vs-hardcoded-tool design tension is left for a real
triage decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:39:38 +02:00
Mikael Hugo
a3ef4bdf3f fix(sf): remove workflow tool aliases
2026-05-02 18:32:50 +02:00
Mikael Hugo
1be11744ee fix(skills): update create-skill SKILL.md + workflows to canonical skill paths
After last fire fixed sf-skill-ecosystem.md, three more files in the
create-skill skill were still teaching the legacy ~/.sf/agent/skills/
and .pi/agent/skills/ paths:

- create-skill/SKILL.md:91 quick reference
- create-skill/workflows/create-new-skill.md:18 (scope question)
- create-skill/workflows/create-new-skill.md:102 (Step 5 directory creation)
- create-skill/workflows/audit-skill.md:19,29 (skill enumeration ls commands)

They now point at the canonical four-directory ecosystem
(~/.agents/skills/, ~/.claude/skills/, plus project-local variants)
that the runtime actually scans (per skill-discovery.ts:16-17,
skill-telemetry.ts:34-35, preferences-skills.ts:39-43).

The audit-skill ls block now enumerates all four locations so the
audit report matches what SF will actually load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:32:19 +02:00
Mikael Hugo
effa8eade4 fix(skills): correct skill-directory paths in create-skill ecosystem reference
src/resources/skills/create-skill/references/sf-skill-ecosystem.md
documented skill paths that don't match what the SF runtime actually
scans:

- Doc said user-scope: `~/.sf/agent/skills/` and project-scope: `.pi/agent/skills/`
- Code (skill-discovery.ts:16-17, skill-telemetry.ts:34-35,
  skill-health.ts:240-241, skill-catalog.ts:1014-1015,
  preferences-skills.ts:39-43) actually scans:
  - User: `~/.agents/skills/` + `~/.claude/skills/`
  - Project: `<cwd>/.agents/skills/` + `<cwd>/.claude/skills/`

Anyone following the create-skill skill's reference doc would have
written skills to a path the runtime no longer actively reads —
`~/.sf/agent/skills/` is now legacy and only consulted if the
`.migrated-to-agents` marker is missing.
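
The corrected layout as a sketch. Helper names are hypothetical, and the
order shown is not necessarily the runtime's scan or precedence order.

```typescript
// Hypothetical helper listing the four directories named above.
function skillDirs(home: string, cwd: string): string[] {
  return [
    `${home}/.agents/skills`, // user scope
    `${home}/.claude/skills`, // user scope
    `${cwd}/.agents/skills`, // project scope
    `${cwd}/.claude/skills`, // project scope
  ];
}

// The legacy user directory is only consulted while the
// .migrated-to-agents marker is absent.
function legacySkillDir(home: string, markerExists: boolean): string | null {
  return markerExists ? null : `${home}/.sf/agent/skills`;
}
```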

Also fixed:
- Telemetry path: said `~/.sf/metrics.json` (user-scope), actually
  `<project>/.sf/metrics.json` (project-scope per metrics.ts:665)
- Doctor command: said `/doctor`, actual command is `/sf doctor`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:29:17 +02:00
Mikael Hugo
ec235c8832 fix(sf): system.md names isolation field correctly as git.isolation
prompts/system.md:106 told agents the isolation mode lives in
PREFERENCES.md under `taskIsolation.mode`. The preferences validator
(preferences-validation.ts:84-88) explicitly REJECTS that key — along
with task_isolation and bare isolation — with the error
'use "git.isolation" instead'. The canonical field is git.isolation
(verified in PREFERENCES.md template line 22 and preferences.ts:897).

Anyone following the system-prompt instruction would write the wrong
config, the validator would discard it, and isolation would silently
fall back to default 'none'.
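
A sketch of why the wrong key degrades silently, assuming a plain
preferences object. `validateIsolationKeys` and `effectiveIsolation`
are hypothetical names; only the rejected keys and the hint text come
from this commit.

```typescript
// Keys the validator rejects, per the description above.
const LEGACY_ISOLATION_KEYS = ["taskIsolation", "task_isolation", "isolation"] as const;

function validateIsolationKeys(prefs: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of LEGACY_ISOLATION_KEYS) {
    if (Object.prototype.hasOwnProperty.call(prefs, key)) {
      errors.push(`"${key}" is not supported; use "git.isolation" instead`);
    }
  }
  return errors;
}

// Rejected keys are discarded, so only git.isolation is ever read.
function effectiveIsolation(prefs: Record<string, unknown>): string {
  const git = prefs["git"] as { isolation?: string } | undefined;
  return git?.isolation ?? "none";
}
```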

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:16:48 +02:00
Mikael Hugo
b046bc1687 chore: clean up remaining sf-2 stale-name code comments
Final sweep after the prompt + script + README sweep for stale repo
references. These are pure code comments, not active behavior, but they
mislead readers about what repo this code lives in:

- src/resource-loader.ts: "sf-2 repo's working tree" → "sf-run repo's"
- src/web/safe-import-meta-resolve.ts: example URL hostname
- src/resources/extensions/sf/schemas/parsers.ts: dropped "sf-2 /" prefix
- src/resources/extensions/sf/schemas/validate.ts: same
- scripts/parallel-monitor.mjs: comment about "sf-2 repo itself"

Tests intentionally not touched — the test fixtures use @sf-build as a
generic scope name to exercise the symlink-merge logic, and the test
tmpdir prefixes (sf-2821-, sf-2945-) are just numeric tags from issue
numbers, not repo refs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:14:22 +02:00
Mikael Hugo
56234b5131 fix(sf): canonicalize milestone id tool surface
2026-05-02 18:09:13 +02:00
Mikael Hugo
416eaf8d12 fix(sf): move add-tests.md skillActivation from dangling end to step 0
Same pattern fixed in scan.md last fire. The {{skillActivation}}
placeholder was the very last line of add-tests.md, after the
'Report sf-internal observations' section, so the default activation
sentence the prompt-loader injects landed where the agent only reads
it AFTER finishing test generation. Move to Instructions step 0 so
skills are activated before code reading begins.

Confirmed via sweep: no more prompts have a dangling {{skillActivation}}
at end-of-file.
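
The end-of-file check behind that sweep could be as simple as this
sketch (hypothetical helper names; the actual sweep method is not
recorded):

```typescript
// Flags prompts whose last non-blank line is the bare placeholder.
function hasDanglingSkillActivation(prompt: string): boolean {
  const lines = prompt.split("\n").map((l) => l.trim()).filter((l) => l !== "");
  return lines.length > 0 && lines[lines.length - 1] === "{{skillActivation}}";
}

// Returns the names of prompts that still end with the placeholder.
function findDangling(prompts: Record<string, string>): string[] {
  return Object.keys(prompts).filter((name) => hasDanglingSkillActivation(prompts[name]));
}
```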

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:08:55 +02:00
Mikael Hugo
ba4bab1034 fix(sf): correct stale .sf milestone paths in prompts + ADR-impl absolute links
prompts/parallel-research-slices.md step 3 told the dispatcher to verify
research at `.sf/{{mid}}/`, but slice research files actually live at
`.sf/milestones/{{mid}}/slices/<sliceId>/<sliceId>-RESEARCH.md`. Step 3
verification could only ever fail.

prompts/validate-milestone.md sent the three milestone-validation reviewer
agents to wrong paths:
- parentTrace pointed at `.sf/{{milestoneId}}/S0X-SUMMARY.md` (slice
  summaries actually live at `.sf/milestones/{{milestoneId}}/slices/S0X/`)
- Reviewer A read `.sf/{{milestoneId}}/REQUIREMENTS.md` (the file is at
  project-level `.sf/REQUIREMENTS.md`)
- Reviewer A scanned `.sf/{{milestoneId}}/` for slice SUMMARYs (wrong dir)
- Reviewer C read `.sf/{{milestoneId}}/CONTEXT.md` (actual file is
  `.sf/milestones/{{milestoneId}}/{{milestoneId}}-CONTEXT.md`)

Reviewers would either return false MISSING / FAIL verdicts or have to
re-discover the layout.
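
The corrected layout encoded as path helpers. A sketch only; the
`<sliceId>-SUMMARY.md` file name is assumed by analogy with
`<sliceId>-RESEARCH.md`, and the helper names are hypothetical.

```typescript
const SF_ROOT = ".sf";

const milestoneDir = (mid: string) => `${SF_ROOT}/milestones/${mid}`;
const sliceDir = (mid: string, sid: string) => `${milestoneDir(mid)}/slices/${sid}`;

const sfPaths = {
  // Project-level, not per milestone.
  requirements: () => `${SF_ROOT}/REQUIREMENTS.md`,
  milestoneContext: (mid: string) => `${milestoneDir(mid)}/${mid}-CONTEXT.md`,
  sliceResearch: (mid: string, sid: string) => `${sliceDir(mid, sid)}/${sid}-RESEARCH.md`,
  // Assumed file name (see lead-in).
  sliceSummary: (mid: string, sid: string) => `${sliceDir(mid, sid)}/${sid}-SUMMARY.md`,
};
```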

docs/dev/ADR-{008,009}-IMPLEMENTATION-PLAN.md "Related ADR" links pointed
to absolute paths inside a contributor's old Mac (`/Users/jeremymcspadden/
Github/sf-2/...`). Replaced with sibling-file relative paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:06:16 +02:00
Mikael Hugo
21113e18a9 fix: update remaining stale repo and scope refs to singularity-forge
After fixing forensics.md and error-classifier.ts last fire, swept the
rest of the tree for the same class of stale reference:

- scripts/validate-pack.js: criticalPackages list used `@sf` and
  `@sf-build` scopes. Neither exists in node_modules; this is in CI
  (.github/workflows/ci.yml) + prepublishOnly, so the validation step
  was failing to find anything. Now `@singularity-forge/pi-coding-agent`
  and `@singularity-forge/rpc-client` (the actual scope).
- src/resources/skills/github-workflows/references/gh/SKILL.md: same
  GraphQL bug as forensics.md (owner:"sf-build" name:"sf-2"), plus
  three `gh project` commands using owner sf-build. The gh issue
  create command above already used singularity-forge/sf-run, so the
  follow-up calls always failed. Also retitled "sf-2 Backlog" to
  "sf-run Backlog".
- src/resources/extensions/sf/bootstrap/system-context.ts: deprecation
  warning linked to https://github.com/sf-build/SF/issues/1492.
- packages/mcp-server/README.md, packages/rpc-client/README.md: 9 refs
  to `@sf-build/...` as installable package names, which would mislead
  anyone copy-pasting into npm install.
- docs/user-docs/troubleshooting.md (+ zh-CN): GitHub Issues link
  pointed at github.com/sf-build/SF/issues.
- docs/user-docs/getting-started.md (+ zh-CN): clone URL was correct
  but the next `cd` was `cd sf-2/docker`, a directory that won't exist
  after a fresh clone of sf-run.
- docs/dev/ci-cd-pipeline.md: GHCR org was `sf-build`.
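A minimal sketch of the kind of check the criticalPackages list feeds. The real script is JavaScript and its internals are not shown in this commit; checking for `node_modules/<name>/package.json` is an assumption, and the existence predicate is injected (in Node it could wrap `fs.existsSync`) so the sketch stays filesystem-free.

```typescript
// Hypothetical sketch of a critical-package presence check.

const criticalPackages: string[] = [
  "@singularity-forge/pi-coding-agent",
  "@singularity-forge/rpc-client",
];

/**
 * Returns the packages whose package.json cannot be found.
 * `exists` is an injected predicate (ASSUMPTION: the real script probes
 * node_modules in some comparable way).
 */
function missingPackages(
  names: string[],
  exists: (packageJsonPath: string) => boolean,
): string[] {
  return names.filter((name) => !exists(`node_modules/${name}/package.json`));
}
```

With a stale scope such as `@sf-build/rpc-client` in the list, the check reports that entry as missing instead of silently validating a package that was never there.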

Remaining "sf-2" / "sf-build" references in inactive places (the
parsers.ts banner, error-message URLs in tests, dev-doc absolute
paths from a contributor's Mac) were left alone: they are informational
and are neither read by users nor used at runtime.
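A hypothetical sketch of the sweep described above: scan file contents for stale identifiers while skipping paths deliberately left alone. The patterns, the skip predicate and the function name are illustrative assumptions, not SF tooling.

```typescript
// Hypothetical stale-reference sweep with a skip list.

interface StaleRef {
  file: string;
  line: number;  // 1-based
  match: string; // the text that matched a stale pattern
}

// Illustrative patterns: the old org name and the old repo name.
const STALE_PATTERNS: RegExp[] = [/sf-build/, /\bsf-2\b/];

function findStaleRefs(
  files: { path: string; text: string }[],
  skip: (path: string) => boolean, // true for informational files left alone
): StaleRef[] {
  const out: StaleRef[] = [];
  for (const f of files) {
    if (skip(f.path)) continue;
    const lines = f.text.split("\n");
    for (let i = 0; i < lines.length; i++) {
      for (const re of STALE_PATTERNS) {
        const m = re.exec(lines[i]); // non-global regex: no lastIndex state
        if (m) {
          out.push({ file: f.path, line: i + 1, match: m[0] });
          break; // one finding per line is enough
        }
      }
    }
  }
  return out;
}
```

Keeping the skip rule as an explicit predicate makes the "left alone" decision reviewable instead of implicit in which files someone happened to open.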

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:01:55 +02:00
Mikael Hugo
65be8c7f16 fix(sf): update stale repo references to singularity-forge/sf-run
forensics.md: GraphQL queries used owner:"sf-build" name:"sf-2" while
the gh issue create command above them correctly used
--repo singularity-forge/sf-run. This meant /sf forensics could create
the issue but the follow-up calls to set issue type would silently fail
against a non-existent repo. Both GraphQL queries now match the canonical
singularity-forge/sf-run.
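A minimal sketch of one way to keep the two call sites from drifting again: derive both the `gh issue create --repo` argument and the GraphQL owner/name from a single slug. The helper names are hypothetical; `repository(owner:, name:)` and `issue(number:)` are standard GitHub GraphQL fields, but the real forensics.md queries (which set the issue type) are not reproduced here.

```typescript
// Hypothetical single source of truth for the repo slug.

const CANONICAL_REPO = "singularity-forge/sf-run";

function splitRepo(slug: string): { owner: string; name: string } {
  const parts = slug.split("/");
  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error(`expected "owner/name", got "${slug}"`);
  }
  return { owner: parts[0], name: parts[1] };
}

/** Argument list for `gh issue create`; title/body handling is illustrative. */
function ghIssueCreateArgs(title: string, body: string): string[] {
  return ["issue", "create", "--repo", CANONICAL_REPO, "--title", title, "--body", body];
}

/** Illustrative GraphQL lookup of an issue's node ID in the same repo. */
function issueLookupQuery(issueNumber: number): string {
  const { owner, name } = splitRepo(CANONICAL_REPO);
  return `query { repository(owner: "${owner}", name: "${name}") { issue(number: ${issueNumber}) { id } } }`;
}
```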

error-classifier.ts: doc-comment @see link pointed to the old
sf-build/sf repo URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 17:56:44 +02:00
Mikael Hugo
59a37c1080 fix(sf): move scan.md skillActivation from dangling end to Instructions step 0
The {{skillActivation}} placeholder was at the very bottom of scan.md,
after the 'Report sf-internal observations' section, with no header or
context. Since the default prompt-loader provides a one-sentence
'use the SF Skill Preferences block...' instruction, it landed as an
orphan footer the agent only encountered AFTER finishing the scan.

Move it to step 0 of the numbered Instructions so the agent activates
skills before exploring the codebase, matching the research-slice and
plan-milestone pattern.
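A minimal sketch of why placement matters, assuming a loader that replaces each `{{name}}` with a string. This is not SF's actual prompt-loader, and the step wording below is hypothetical; the point is that the substituted instruction lands wherever the placeholder sits.

```typescript
// Hypothetical {{placeholder}} renderer; unknown keys are left verbatim.

function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (whole: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : whole,
  );
}

// Placeholder as step 0, so skill activation precedes exploration.
const scanTemplate = [
  "## Instructions",
  "",
  "0. {{skillActivation}}",
  "1. Explore the codebase.",
].join("\n");
```

Rendering `scanTemplate` with a one-sentence `skillActivation` value yields it as step 0; the same value appended after the last section would only be read once the work is done.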

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 17:53:54 +02:00
Mikael Hugo
61485c5bef fix(sf): remove legacy completion tool aliases
2026-05-02 17:51:38 +02:00
Mikael Hugo
1891ccbdcd chore(sf): delete orphaned commands-debug + debug-session-store
`/sf debug` was ported in 360208cba but never wired up:

- handleDebug exported but no caller anywhere in the tree
- not in commands/catalog.ts
- loadPrompt("debug-session-manager") and loadPrompt("debug-diagnose")
  referenced prompts that never existed in prompts/ — guaranteed
  runtime crash if the dispatch path were ever hit
- debug-session-store.ts only consumed by commands-debug.ts
- no tests reference any of it

887 LOC of dead code with a latent crash. Removing both files
eliminates the orphan-prompt callsite that gap-audit kept flagging
and the broken dispatch path. Resolves sf-moohvyzc-ll5bd0.
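A hypothetical check for the latent-crash half of this problem: a `loadPrompt("name")` call whose `prompts/<name>.md` does not exist. The regex and input shapes are illustrative assumptions; gap-audit's real implementation is not shown in this commit.

```typescript
// Hypothetical detector for loadPrompt() calls that reference missing prompts.

function findDanglingLoadPrompts(
  sources: { path: string; text: string }[],
  promptNames: string[], // basenames of prompts/*.md, without ".md"
): { path: string; prompt: string }[] {
  const known = new Set(promptNames);
  const out: { path: string; prompt: string }[] = [];
  for (const src of sources) {
    // Matches loadPrompt("name") or loadPrompt('name') with a literal argument.
    const re = /loadPrompt\(\s*["']([\w-]+)["']/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(src.text)) !== null) {
      if (!known.has(m[1])) out.push({ path: src.path, prompt: m[1] });
    }
  }
  return out;
}
```

Calls with a computed argument are not matched by the literal regex, so this only catches the straightforward case shown here.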

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 17:42:28 +02:00
Mikael Hugo
e07f2bc225 fix(sf): add depth calibration to research-milestone prompt
Mirror the tiered Deep/Targeted/Light breakdown that research-slice.md
already had — same structure, milestone-scoped wording. Add explicit
'## Steps' header so the numbered steps no longer flow visually out of
the calibration paragraph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 17:39:23 +02:00
Mikael Hugo
8032ee6144 fix(sf): gap-audit detects prompts loaded by direct filesystem read
Orphan-prompt detection only checked loadPrompt() callsites. Three
prompts (heal-skill, product-audit, review-migration) are loaded by
direct readFileSync of "<name>.md" — they got false-flagged as orphans.

Add a literal-filename check so any source file containing "<name>.md"
counts as a load. Cheap one-pass grep, same shape as the existing
loadPrompt patterns.
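A minimal sketch of the detection rule described above: a prompt counts as loaded if any source contains a `loadPrompt("<name>")` call or the literal string `<name>.md`. Function names and input shapes are hypothetical; this is not gap-audit's actual code.

```typescript
// Hypothetical orphan-prompt detector with the literal-filename rule.

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findOrphanPrompts(promptNames: string[], sources: string[]): string[] {
  return promptNames.filter((name) => {
    const loadCall = new RegExp(`loadPrompt\\(\\s*["']${escapeRegExp(name)}["']`);
    const literal = `${name}.md`; // catches readFileSync(join(dir, "<name>.md"))
    return !sources.some((text) => loadCall.test(text) || text.indexOf(literal) !== -1);
  });
}
```

The plain substring test can overmatch (for example `scan.md` inside `prescan.md`), which errs toward fewer orphan reports rather than false positives.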

Verified with live runGapAudit: 0 new findings (previously it logged
the 3 false positives on every session_start).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 17:29:44 +02:00