Commit graph

4540 commits

Author SHA1 Message Date
Mikael Hugo
df095b406a docs(sf): disambiguate "ADR-011" — comments now say "gsd-2 ADR-011"
A future maintainer reading "ADR-011 Phase 2" in escalation.ts would
look up SF's local docs/dev/ADR-011 and find "Swarm Chat and Debate
Mode" — totally unrelated. The escalation + progressive-planning work
ports gsd-2's ADR-011 (Progressive Planning + Escalation), which
happens to share the number with our local ADR-011.

Prefixed every internal comment that referenced the gsd-2 ADR with
"gsd-2 ADR-011" so the source-of-truth lookup is unambiguous. Comment-
only diff — no compilation, runtime, or test surface affected.

Files: types.ts, auto-prompts.ts, auto-dispatch.ts, escalation.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:06:34 +02:00
Mikael Hugo
b6bdbe586a docs(sf): align refine-slice "Autonomous execution" footer with siblings
The autonomous-mode footer in refine-slice.md was the short version
("Document assumptions in the plan") while plan-slice / execute-task /
complete-slice all carry the full explanation: agents are in auto-mode,
no human is available, document assumptions in the artifact, note
human-input-required decisions in the relevant artifact and proceed
with the best available option.

Refine-slice gets sketches refined into full plans — same autonomy
contract as plan-slice. Aligned the language so that an agent reading
any of these prompts gets the same self-help instructions about
ask_user_questions / secure_env_collect.

Markdown-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:01:44 +02:00
Mikael Hugo
16cf479781 docs(sf): surface SF_LLM_GATEWAY_* env vars in PREFERENCES template
These are runtime-only settings (not YAML keys), and the previous template
mentioned only the YAML phase toggles. Operators discovering the
embedding/rerank surface had to read the source. Added a table at the
bottom of PREFERENCES.md so the env-var contract is documented next to
the rest of the skill prefs.

Documents: SF_LLM_GATEWAY_KEY, SF_LLM_GATEWAY_URL,
SF_LLM_GATEWAY_EMBED_MODEL, SF_LLM_GATEWAY_RERANK_MODEL — including the
silent-fallback semantics and the agent_end backfill cadence.

Markdown-only; no recompile needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 23:00:15 +02:00
Mikael Hugo
8299c7ac2b fix(sf): clear last 2 stale failures from gsd-2 compat sweep
auto-session-encapsulation invariant: the parallel session refactored
auto.ts to use the getAutoSession() factory; the test still expected
`new AutoSession()` literally. Updated the regex + the allowedPatterns
list to accept both shapes — the invariant is "exactly one module-level
binding for the AutoSession instance", not which constructor expression
yields it.

silent-catch-diagnostics #3348: auto-supervisor.ts:53 swallowed signal-
handler exceptions silently. Added logWarning("session", ...) — the
intent stays the same (signal handler must not throw), but cleanup-path
errors are now visible in the journal.
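
The allowedPatterns change can be sketched as a pattern pair; the regexes and names here are illustrative assumptions, not the repo's actual test code.

```typescript
// Accept both binding shapes for the "exactly one module-level binding" invariant:
// the pre-refactor constructor call and the post-refactor factory call.
const allowedPatterns: RegExp[] = [
  /const \w+ = new AutoSession\(/, // original shape
  /const \w+ = getAutoSession\(/,  // factory shape after the parallel-session refactor
];

function matchesAllowedBinding(line: string): boolean {
  return allowedPatterns.some((p) => p.test(line));
}
```

The invariant cares about how many bindings exist, not which expression produces the instance, so both shapes pass.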

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:51:42 +02:00
Mikael Hugo
3e8c5b192f fix(sf): add sf-dev batch server command 2026-05-02 22:44:14 +02:00
Mikael Hugo
c9609459e4 fix(daemon): --verbose actually lowers log level + reports effective level
--verbose was wired only to the stderr-mirror path. Debug entries got
filtered by Logger.level (default 'info' from config) before reaching
the mirror — so passing --verbose produced almost no extra output, which
made it look broken on a fresh start.

Now --verbose lowers the level to 'debug' AND mirrors. Logger exposes
`effectiveLevel` so the "daemon started" banner reports what the logger
is actually using, not what was in the config file.
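
The fixed wiring can be sketched as two small pure functions; the names are illustrative, assuming a conventional ordered level threshold.

```typescript
type Level = "debug" | "info" | "warn" | "error";
const ORDER: Level[] = ["debug", "info", "warn", "error"];

// --verbose must lower the threshold itself, not just mirror stderr.
function effectiveLevel(configLevel: Level, verbose: boolean): Level {
  return verbose ? "debug" : configLevel;
}

// An entry passes the filter when its level is at or above the threshold.
function shouldLog(entry: Level, threshold: Level): boolean {
  return ORDER.indexOf(entry) >= ORDER.indexOf(threshold);
}
```

Before the fix only the mirror saw --verbose, so debug entries were dropped by the "info" threshold before reaching it; reporting effectiveLevel in the banner makes the actual threshold visible.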

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:41:48 +02:00
Mikael Hugo
7bec2dc2d0 fix(sf): invalidate stale embedding when memory content is updated
updateMemoryContent rewrote the row but left the existing memory_embeddings
vector in place — that vector was computed against the old content, so the
next cosine query would score the memory by what it used to say, not what
it says now.

Now drop the embedding row on update; the next runEmbeddingBackfill
(agent_end hook) re-embeds. Best-effort: a missing embedding is the
silent-fallback case the ranker already handles.
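
The fix reduces to "drop the vector whenever the text changes"; this in-memory stand-in for the two tables illustrates the invariant, with all names hypothetical.

```typescript
// Stand-ins for the memories and memory_embeddings tables.
const memories = new Map<number, string>();
const memoryEmbeddings = new Map<number, number[]>();

// On content update, drop the stale vector; the next backfill re-embeds.
function updateMemoryContent(id: number, content: string): void {
  memories.set(id, content);
  memoryEmbeddings.delete(id); // vector was computed against the old text
}
```

A missing embedding is already the fallback path the ranker handles, so deleting is safe even if the backfill never runs.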

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:38:24 +02:00
Mikael Hugo
a3c000de26 fix(sf): close 6 stale test failures from gsd-2 compat sweep
Schema-version assertions hadn't been bumped past 21 in three places
(complete-task/complete-slice/md-importer); manifest coverage tests caught
the project-scoped unit types added for the deep planning gate (ADR-011)
that weren't yet registered in either KNOWN_UNIT_TYPES table; workflow-
templates registry test rejected docs-sync.yaml because the assertion was
.md-only.

- preferences-types.ts: KNOWN_UNIT_TYPES gains refine-slice, discuss-project,
  discuss-requirements, research-project, workflow-preferences.
- unit-context-manifest.ts: same five types added to its local
  KNOWN_UNIT_TYPES + UNIT_MANIFESTS (TOOLS_PLANNING, scoped/full knowledge,
  COMMON_BUDGET_MEDIUM/LARGE).
- complete-task / complete-slice / md-importer test: schema_version
  expectation 21 → 25.
- workflow-templates test: file extension can be .md OR .yaml (docs-sync is
  intentionally yaml-step iteration).

81 tests across 6 files that were red are now green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:35:26 +02:00
Mikael Hugo
3f213f3131 fix(sf): run sf-server from source in dev 2026-05-02 22:34:42 +02:00
Mikael Hugo
974d8e4b6d fix(sf): expose daemon as sf-server 2026-05-02 22:25:24 +02:00
Mikael Hugo
e5787794f3 feat(sf): /sf memory search — embedding-ranked memory query
New subcommand: /sf memory search "<query>". Routes through
getRelevantMemoriesRanked, so when SF_LLM_GATEWAY_KEY is set the gateway
embeds the query and ranks memories by cosine + static blend; without
the key, gracefully degrades to static ranking. Header text indicates
which path was taken so users know whether embeddings are live.

This makes the embedding pipeline operator-discoverable — previously the
only consumer was the silent execute-task injection path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:22:33 +02:00
Mikael Hugo
eb5f7ef7b6 feat(sf): query-aware memory ranking — embeddings now actually matter
Previous commit populated memory_embeddings rows but no consumer read
them — the read path (getActiveMemoriesRanked) used pure static score
(confidence × hit_count). Embeddings were silent.

This wires the read side:
- rankMemoriesByEmbedding (pure, in memory-embeddings.ts) blends static
  score with cosine similarity: combined = static * (1 + α * cosine).
  Default α=0.6 — a perfect-static + zero-similarity hit ties roughly
  with a low-static + perfect-similarity hit, so semantically relevant
  cold memories can surface above stale-but-popular ones.
- embedQueryViaGateway + loadEmbeddingMap — supporting helpers.
- getRelevantMemoriesRanked (memory-store.ts) — async query-aware ranker.
  Oversamples the static pool 5×, embeds the query, blends, returns top-K.
  Falls back cleanly to static ranking when:
    - query empty
    - no SF_LLM_GATEWAY_KEY (gateway not configured)
    - gateway request fails (500/network)
    - no embeddings exist yet (fresh DB / worker offline)
- auto-prompts.ts: execute-task injection now uses sliceTitle + taskTitle
  as the query so memories relevant to the current work surface first.

10 new tests lock the contract — pure ranker math, fallback chain, and
the gateway-mocked promotion case.
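
The blend can be sketched as a pure function; the cosine helper and names here are illustrative, not the module's actual code.

```typescript
const ALPHA = 0.6; // blend weight: combined = static * (1 + α * cosine)

// Cosine similarity between two equal-length vectors (illustrative helper).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

function blendScore(staticScore: number, sim: number): number {
  return staticScore * (1 + ALPHA * sim);
}
```

With α=0.6, a memory at static 0.625 and cosine 1.0 scores exactly the same as a perfect-static memory with zero similarity, which is the promotion behavior described above.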

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:18:45 +02:00
Mikael Hugo
56ee89a946 feat(sf): live embeddings via inference-fabric llm-gateway + auto-backfill
Adds an opt-in embedding path against `https://llm-gateway.centralcloud.com/v1`
using qwen/qwen3-embedding-4b. Activated by exporting SF_LLM_GATEWAY_KEY;
URL/model overridable via SF_LLM_GATEWAY_URL and SF_LLM_GATEWAY_EMBED_MODEL.
Rerank surface present (SF_LLM_GATEWAY_RERANK_MODEL) but degrades to null
when no rerank worker is online — current gateway has none, so it stays
dormant until one comes up.

- memory-embeddings-llm-gateway.ts: createGatewayEmbedFn + rerankCandidates
  speaking the OpenAI-shaped /v1/embeddings and /v1/rerank protocols.
- memory-embeddings.ts: listUnembeddedMemoryIds + runEmbeddingBackfill —
  best-effort sweep, in-flight-guarded, bounded, throttled "unavailable"
  log. Wired into agent_end so every turn opportunistically embeds new
  memories when the gateway is reachable.
- sf-db.ts: pre-existing bug fix — memory_embeddings, memory_relations,
  and memory_sources were referenced everywhere but never CREATE-d in the
  schema. Adding them as IF NOT EXISTS with proper FK + PK so fresh DBs
  actually work.
- 16 new tests covering env config, embed fn shape, rerank degradation,
  backfill happy/sad/bounded paths.
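
The env contract can be sketched as a pure config resolver; the request itself is an OpenAI-shaped POST to {url}/embeddings with { model, input }. The function name is a hypothetical stand-in, while the env-var names and defaults come from the commit.

```typescript
interface GatewayConfig { url: string; model: string; key: string }

// Null when the opt-in key is absent — the feature stays dormant.
function resolveEmbedConfig(env: Record<string, string | undefined>): GatewayConfig | null {
  const key = env.SF_LLM_GATEWAY_KEY;
  if (!key) return null;
  return {
    key,
    url: env.SF_LLM_GATEWAY_URL ?? "https://llm-gateway.centralcloud.com/v1",
    model: env.SF_LLM_GATEWAY_EMBED_MODEL ?? "qwen/qwen3-embedding-4b",
  };
}
```

Keeping the key check in one place means every caller (backfill, query embedding, rerank) degrades the same way when the gateway is unconfigured.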

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:13:23 +02:00
Mikael Hugo
dd126ddc8b fix(sf): recover model routes and self-feedback 2026-05-02 22:07:10 +02:00
Mikael Hugo
c308a492d7 chore(sf): differentiate auto-accepted vs user-resolved escalations in audit
resolveEscalation gains an optional `source: "user" | "auto-mode"`
parameter (default "user"). Auto-dispatch passes "auto-mode" when it
auto-accepts. The UOK audit event type now flips between
"escalation-user-responded" and "escalation-auto-accepted", and the
payload includes a typed `resolvedBy` field.

Why: a journal grep for user actions shouldn't return auto-mode events.
Audit/observability tools can now filter cleanly without string-matching
the rationale prefix.
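
The source-to-event mapping can be sketched in a few lines; the function name is hypothetical, the union and event strings are from the commit.

```typescript
type EscalationSource = "user" | "auto-mode";

// Typed mapping from resolution source to the UOK audit event type.
function auditEventFor(source: EscalationSource = "user"): string {
  return source === "auto-mode"
    ? "escalation-auto-accepted"
    : "escalation-user-responded";
}
```

Defaulting to "user" keeps every existing resolveEscalation call site behaving as before.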

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:59:38 +02:00
Mikael Hugo
00c13bc5a1 feat(sf): persist escalation resolutions as durable memories
When an escalation is resolved (auto-mode accept or user override), write
the choice + rationale into the memories table with category="architecture".
The "[escalation:<task>] <question>. Chose: <option>. Rationale: ..."
prefix mirrors the decisions->memories backfill format so search and
de-duplication work the same way.

Why: getActiveMemoriesRanked auto-injects top memories into every
execute-task prompt, so a resolved escalation now travels forward as
implicit context across the whole project — not just the immediate
carry-forward into the next task. The artifact JSON stays as the audit
trail; the memory is the discoverable, semantically-ranked surface.

Best-effort write — never blocks resolution.
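
The prefix format described above can be sketched as a formatter; the function name is hypothetical, the format string mirrors the commit.

```typescript
// Mirrors the decisions->memories backfill prefix so search and
// de-duplication treat escalation memories the same way.
function escalationMemoryText(
  task: string,
  question: string,
  chosen: string,
  rationale: string,
): string {
  return `[escalation:${task}] ${question}. Chose: ${chosen}. Rationale: ${rationale}`;
}
```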

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:53:56 +02:00
Mikael Hugo
7c6140517e fix(sf): surface escalation write failures back to the agent
When sf_task_complete's escalation payload was rejected (validation error)
or silently dropped (feature flag off), the agent saw a clean "Completed
task" response and assumed the issue was raised — but no carry-forward
override was created, so the next executor saw nothing.

Now the response text explicitly says:
- "WARNING: escalation payload was REJECTED (<error>); the next executor
  will NOT see your decision" — when buildEscalationArtifact throws
- "note: escalation payload was DROPPED because phases.mid_execution_escalation
  is disabled" — when feature flag is off

Task completion is still never blocked by escalation issues — additive,
auditable, agent-actionable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:48:35 +02:00
Mikael Hugo
b79ebbf10a fix(sf): generalize M008 leak in systematic-debugging skill
The global skill hardcoded `.sf/milestones/M008/bugs/bug-registry.json`
and `M008-specific:` rules — when M008 closes the skill goes stale and
misleads agents on every other milestone.

Reframed as "Milestone Bug Registry Guidance": the rules apply to any
milestone that ships a `bug-registry.json` + `triage-protocol.md` pair,
with M008 cited as the canonical example for the registry test. When no
registry exists, the section is skipped — agents follow the normal
evidence/repro/fix flow.

triage-protocol-registry test (31 tests) still passes — keeps the
literal `bug-registry.json` reference and HIGH/MEDIUM/LOW + cluster +
update-after-fix assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:44:08 +02:00
Mikael Hugo
08859624f8 feat(sf): teach executor about the escalation field on sf_task_complete
The escalation feature was invisible to agents — the prompt didn't say it
existed, so agents made silent assumptions instead of surfacing genuine
tradeoffs. Now, when phases.mid_execution_escalation is on, execute-task
includes a guidance block showing the escalation payload shape and noting
auto-mode auto-accepts the recommendation by default. When the feature is
off the field is silently dropped, so the guidance is omitted entirely to
avoid misleading the agent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:41:38 +02:00
Mikael Hugo
3895ae2cd3 feat(sf): auto-mode is autonomous — escalations auto-accept by default
Auto is autonomous, so the escalating-task dispatch rule shouldn't halt
the loop. Default: accept the agent's recommendation, record the choice
with `auto-mode: ...` rationale, and let the next dispatch cycle pick up
the carry-forward override. Users can review or override via
`/sf escalate list --all` later.

Set `phases.escalation_auto_accept: false` to keep gsd-2's pause-and-ask
behavior (loop halts until the user runs `/sf escalate resolve`).

- types.ts: add escalation_auto_accept (default true)
- preferences-validation.ts: allowlist + warn on unknown phase keys
- auto-dispatch.ts: rename rule to "auto-accept-or-pause"; on auto-accept
  resolve via resolveEscalation("accept", ...) and return action:"skip"
  so the next cycle re-reads state cleanly
- PREFERENCES.md: surface the toggle with the autonomy rationale
- tests/escalation-auto-accept.test.ts: 4 cases — default accept, explicit
  true, explicit false (preserves pause), non-escalating phase no-op

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:36:15 +02:00
Mikael Hugo
0f0aee5bf0 feat(sf): port 3 gsd-2 DB helpers + improve /sf escalate list
Three small DB helpers from gsd-2 that SF was missing, plus a UX
improvement to /sf escalate list that uses one of them.

PDD spec:

setSliceSketchFlag(milestoneId, sliceId, isSketch) — generalized
  sketch-flag setter. Replaces my narrower clearSliceSketch (which
  remains as a thin wrapper for callers that only zero). Use this
  when a re-plan flow wants to revert a slice back to sketch state.

autoHealSketchFlags(milestoneId, hasPlanFile) — safety net for
  progressive planning. Predicate-based: caller passes a function
  that resolves whether a PLAN file exists for a slice, function
  flips is_sketch=0 for any slice that has both is_sketch=1 AND a
  plan file. Catches DB-FS drift after crashes/manual edits.

listEscalationArtifacts(milestoneId, includeResolved=false) —
  cross-slice DB-side filter for /sf escalate list. Replaces my
  hand-rolled inner-loop over getMilestoneSlices() + getSliceTasks()
  + filter — single SQL query, sorted by sequence, faster.

UX improvement to commands-escalate.ts:
  - /sf escalate list: now uses listEscalationArtifacts; shows
    PENDING / awaiting-review / resolved status badges per entry.
  - /sf escalate list --all: includes resolved entries (audit trail).
  - Better hint message when none active: 'Use --all to include
    resolved'.

Verified:
  - typecheck clean (one parallel-session-introduced error in
    self-feedback-drain.ts is unrelated — they import a missing
    utils/error.ts; will land when their commit does).
  - escalation-feature.test.ts (21 tests) + sf-db.test.ts (16
    tests) still pass — no regression.
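
The autoHealSketchFlags predicate shape can be sketched against an in-memory slice list; the row type and names here are illustrative stand-ins for the DB rows.

```typescript
interface SliceRow { id: string; isSketch: number }

// Flip is_sketch=0 for any slice that is still flagged as a sketch
// but already has a PLAN file on disk (DB-FS drift after crashes/edits).
function autoHealSketchFlags(
  slices: SliceRow[],
  hasPlanFile: (sliceId: string) => boolean,
): string[] {
  const healed: string[] = [];
  for (const s of slices) {
    if (s.isSketch === 1 && hasPlanFile(s.id)) {
      s.isSketch = 0;
      healed.push(s.id);
    }
  }
  return healed;
}
```

Taking the predicate as a parameter keeps the helper DB-only; the caller decides how "a PLAN file exists" is resolved.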

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:22:02 +02:00
Mikael Hugo
82633b6f5e feat(sf): sf-audit-traces workflow for slow self-improvement loop
A standalone agent prompt that reads SF's observability sources
(self-feedback / journal / activity / judgments / forensics) and
files AT MOST 3 recurring-pattern findings via sf_self_report so
they enter the existing triage flow.

PDD spec:

Purpose: continuous self-improvement loop. SF already has the data
  sources (self-feedback.jsonl, journal/, activity/, judgments/) and
  the consumer pattern (triage-self-feedback → requirement-promoter).
  What was missing: a standalone prompt that pulls those sources
  together for a scheduled run.
Consumer: agents invoked via '/schedule every morning sf-audit-traces'
  (cloud) or '/sf workflow run sf-audit-traces' (manual).
Contract:
  1. Snapshot the trace volumes (file counts + line counts) into
     evidence so reports are concrete, not prose.
  2. Bar = 3+ occurrences. Single events go to operator eyeballs,
     not permanent self-feedback entries.
  3. Hard cap of 3 entries per run. The whole point is slow
     iteration — the triage queue is human-paced, not a firehose.
  4. NEVER auto-apply. Even if the fix looks one-line obvious, file
     and stop. The triage flow decides what becomes work.
  5. Zero findings is a successful run when the system is healthy.
Failure boundary: missing source files → skip silently. Read errors
  → handle gracefully. Never block on absence.
Evidence (verified during scan before writing):
  - 181 self-feedback entries (55 open, 126 resolved)
  - Top open kinds: runaway-guard-hard-pause (4), git-stage-failure
    (2), context-injection-gap (2), orphan-prompt (2)
  - Journal: 6-233 events per active day
  - Activity logs: per-unit JSONL transcripts present
  - All sources accessible via plain file reads — no special tools.
Non-goals:
  - ML training on traces
  - Cross-project trace aggregation
  - Auto-applying fixes (triage flow already does that)
  - Fast iteration (deliberately slow — 3/run cap means at most 21
    new triage items per week even with daily runs)
Invariants:
  - Safety: agent never edits code/prompts/templates/docs.
  - Liveness: zero findings is a valid output. The agent doesn't
    fabricate patterns to justify a run.

Discovery verified: 28 total workflow templates after this commit
(was 27); plugins.get('sf-audit-traces') returns the plugin from
the bundled source.

Pairs with: triage-self-feedback (reads what this files),
requirement-promoter (auto-promotes recurring kinds to requirements),
self-feedback-drain (session-start drain into repair turns). The
audit is the IN end of that pipeline; the rest of SF was already
the OUT end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:15:13 +02:00
Mikael Hugo
e381e3c8ad fix(sf): bump SCHEMA_VERSION to 25 + update sf-db.test.ts assertion
The migrate gate `if (currentVersion >= SCHEMA_VERSION) return;` was
short-circuiting at 23, leaving the v24 (escalation_awaiting_review)
and v25 (escalation_override_applied) migrations unreached on fresh
databases. The test caught it: 'fresh DB schema init (memory)' expected
MAX(version)=23 (then 25 after my test bump), but the DB kept
returning 23 because the migrate function bailed before the new
ensureColumn calls.

Two-line fix:
- sf-db.ts:133  SCHEMA_VERSION 23 → 25
- sf-db.test.ts:88 + :222  expected version 23 → 25

Now fresh DBs run all migrations through v25 and end at the latest
version. Existing databases with version 24 still get v25 applied
because currentVersion < SCHEMA_VERSION (24 < 25).

37/37 tests pass (sf-db + escalation-feature suites). No regression
in the broader 127-test smoke suite that ran before this fix.
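
The gate pattern the fix restores can be sketched in a few lines; the loop body is an illustrative stand-in for the real per-version migration steps.

```typescript
const SCHEMA_VERSION = 25; // the constant the fix bumps from 23

// Run exactly the migration steps above currentVersion, once each.
function migrate(currentVersion: number, applyStep: (v: number) => void): number {
  if (currentVersion >= SCHEMA_VERSION) return currentVersion; // up to date — bail
  for (let v = currentVersion + 1; v <= SCHEMA_VERSION; v++) applyStep(v);
  return SCHEMA_VERSION;
}
```

With the constant stuck at 23, a DB at version 23 hit the early return and v24/v25 never ran; bumping the constant is the whole fix because the gate itself is correct.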

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:05:06 +02:00
Mikael Hugo
aa67c1453c test(sf): full lifecycle coverage for ADR-011 P2 escalation feature
21 vitest tests covering the entire escalation chain shipped this
session. Each contract claim from prior PDD specs gets at least one
verifying test:

buildEscalationArtifact validation (4)
  - option count outside [2,4] → throws
  - duplicate option ids → throws
  - recommendation referencing unknown id → throws
  - happy path → version=1, taskId set, ISO createdAt

writeEscalationArtifact + DB flag flips (3)
  - continueWithDefault=false → escalation_pending=1
  - continueWithDefault=true → escalation_awaiting_review=1
  - two writes flip the pair atomically (mutually exclusive)

detectPendingEscalation (4)
  - empty slice → null
  - paused task → returns task id
  - awaiting_review tasks DO NOT pause
  - resolved (respondedAt set) tasks DO NOT pause

resolveEscalation (5)
  - 'accept' selects recommendation
  - explicit option id resolves with userRationale persisted
  - invalid choice → status=invalid-choice with valid list
  - re-resolve → already-resolved
  - unknown task → not-found

claimOverrideForInjection carry-forward (5)
  - no escalation → null
  - pending (unresolved) → null
  - resolved → returns block + sourceTaskId + sets DB flag=1
  - second claim → null (race-safe idempotent)
  - clearTaskEscalationFlags preserves artifact path (audit trail)

Provides regression protection for the full producer→consumer→
resolution→carry-forward path. All 21 pass against current head.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:56:12 +02:00
Mikael Hugo
125496ce36 docs(sf): surface ADR-011 toggles in PREFERENCES.md template
Three new options got wired this session but the bundled template
didn't mention them, so users had no discoverable way to know they
existed. Adds them as commented hint fields:

- phases.progressive_planning — sketch→refine slice planning
- phases.mid_execution_escalation — task agents can pause for user
  decision via sf_task_complete escalation payload + /sf escalate
- planning_depth (top-level) — 'deep' enables project-level
  discussion gate before any milestone work

All three default off (commented out / unset) so existing users see
zero behavior change from this template update; enabling any of them
is a single uncomment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:53:40 +02:00
Mikael Hugo
4b6eb86b84 feat(sf): carry-forward injection — final piece of escalation feature (PDD)
Replaces the claimOverrideForInjection stub with a real race-safe
implementation. With this commit, the full escalation loop is wired:
agent escalates → user pauses → user resolves → next executor in the
slice sees the user's choice as a hard constraint in its prompt.

The buildExecuteTaskPrompt call site at auto-prompts.ts:2452-2469
already invoked claimOverrideForInjection (gated on
phases.mid_execution_escalation). Before this commit it was a no-op
because the function returned null unconditionally. Now it actually
delivers the override block.

PDD spec for this change:

Purpose: complete the loop. Without carry-forward, the loop 'continues'
  but the next executor re-encounters the same ambiguity that
  triggered the escalation.
Consumer: buildExecuteTaskPrompt in auto-prompts.ts (already wired).
Contract:
  1. No resolved-but-unapplied override in this slice → returns null.
     Existing behavior preserved when no escalation pending. Verified.
  2. Pending escalation (no respondedAt) → returns null. Caller's
     pause-detection layer handles those. Verified.
  3. Resolved escalation (respondedAt + userChoice set) →
     atomically marks escalation_override_applied=1 (race-safe via
     UPDATE … WHERE applied=0) and returns formatted markdown block
     with sourceTaskId. Verified.
  4. Second claim on the same override → null (race loser or
     already-applied). Verified.
  5. Missing/malformed artifact → logWarning + null without claiming
     (so the row isn't silently swallowed by an applied=1 flip).
Failure boundary:
  - claimEscalationOverride is the atomic boundary. Either you claim
    it and it's yours forever, or someone else did and you skip.
  - Validation BEFORE claim — bad artifact never marks the row applied.
  - DB unavailable in claimEscalationOverride → returns false → caller
    treats as race-loser → null. Safe.
Evidence:
  - Smoke test exercises 4 contract conditions:
    no-override → null
    pending-only → null
    resolved-then-claim → returns block + sets DB flag
    second-claim → null (idempotent)
  - Typecheck clean.
  - All 62 existing preferences tests still pass (no regression in
    the related plumbing).
Non-goals:
  - reject-blocker carry-forward (gsd-2 has it; needs blocker_source
    DB column SF doesn't have).
  - Cross-slice override carry-forward (current scope: per-slice).
  - Override-applied audit event (gsd-2 emits one; can add later).
Invariants:
  - Safety: applied flag is set BEFORE the prompt is built — so a
    crash mid-build never re-injects on retry.
  - Liveness: any task in the slice with a resolved override gets
    surfaced in sequence order (lowest sequence first via
    findUnappliedEscalationOverride's ORDER BY).
  - Race-safety: SQL UPDATE … WHERE applied=0 returns changes>0 only
    for the winner. Tested with sequential claims; both winners and
    losers behave correctly.
DB schema: tasks.escalation_override_applied (INTEGER NOT NULL
DEFAULT 0), migration v25.
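
The race-safety invariant can be illustrated with an in-memory stand-in for the task row; the SQL in the comment mirrors the description above, and the TypeScript names are hypothetical.

```typescript
interface TaskRow { id: string; overrideApplied: number }

// SQL equivalent: UPDATE tasks SET escalation_override_applied = 1
//                 WHERE id = ? AND escalation_override_applied = 0
// Only the caller that flips 0 -> 1 wins; everyone else skips.
function claimEscalationOverride(rows: TaskRow[], taskId: string): boolean {
  const row = rows.find((r) => r.id === taskId);
  if (!row || row.overrideApplied !== 0) return false; // race loser or already applied
  row.overrideApplied = 1;
  return true;
}
```

Because the claim happens before the prompt is built, a crash mid-build leaves the flag set and the retry sees a null claim instead of re-injecting.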

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:51:56 +02:00
Mikael Hugo
2c044f340f feat(sf): auto-fill empty model fallbacks from benchmark picker (PDD)
Closes the gap that left the user's session paused on a quota error
with no fallback to switch to. Before this commit:
  - User pins models.execution: { model: gemini-3-flash-preview }
  - No fallbacks array → resolveModelWithFallbacksForUnit returns
    { primary, fallbacks: [] }
  - agent-end-recovery.ts line 348 checks fallbacks.length > 0 → false
  - Loop pauses on the first rate-limit, even though the user has
    other API-keyed providers available.

After: an empty/missing fallbacks array auto-fills from
resolveAutoBenchmarkPickForUnit (which picks API-keyed candidates
ranked by benchmark scores), excluding the user's pinned primary so
we never get a no-op switch to the same model.

PDD spec:

Purpose: out-of-the-box auto-switch to fallback models when a user
  pins only a primary. Matches user expectation that 'the system
  selects models automatically' when keys are available.
Consumer: agent-end-recovery.ts model-fallback flow on rate-limit.
Contract:
  1. models.<unit>: '<id>' (string, no fallbacks) → primary plus
     auto-filled fallbacks. Unchanged primary, fallbacks excluding
     primary.
  2. models.<unit>: { model: '<id>', fallbacks: ['a', 'b'] } (explicit
     non-empty) → unchanged. User intent respected.
  3. models.<unit>: { model: '<id>' } (object, no fallbacks) → auto-
     fill from benchmark picker.
  4. models.<unit>: { model: '<id>', fallbacks: [] } (explicit empty)
     → auto-fill (treat empty same as missing).
  5. No models config at all → unchanged behavior — full auto-pick.
Failure boundary:
  - resolveAutoBenchmarkPickForUnit returns undefined when no
    API-keyed providers exist → fallbacks stays empty (no candidates
    to switch to anyway).
  - autoBenchmark option still honored — set to false to opt out.
Evidence:
  - Smoke test: pinned 'gemini-3-flash-preview' with empty fallbacks +
    OPENROUTER_API_KEY + GEMINI_API_KEY in env → returns 4 fallbacks
    starting with minimax/MiniMax-M2.7. Primary not in fallbacks.
  - Existing 62 preferences tests + 5 rate-limit-model-fallback tests
    still pass — no regression.
Non-goals:
  - Cross-phase inheritance (planning falls back to execution config).
  - Persisting auto-filled fallbacks to PREFERENCES.md.
  - Mid-tool-call rate-limit recovery (different code path through
    pi-coding-agent's RetryHandler).
Invariants:
  - Safety: explicit non-empty user fallbacks NEVER overwritten —
    line userFallbacks.length > 0 short-circuits before auto-fill.
  - Liveness: empty arrays trigger auto-fill, so callers get a chain
    if any keys are configured.
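
The contract table above can be sketched as one decision function; the names are illustrative stand-ins for the real resolver and benchmark picker.

```typescript
interface ModelConfig { model: string; fallbacks?: string[] }

// Explicit non-empty fallbacks win; empty or missing arrays are
// auto-filled from the benchmark-ranked candidates, minus the primary.
function fillFallbacks(
  cfg: ModelConfig,
  benchmarkPicks: string[] | undefined,
): ModelConfig {
  if (cfg.fallbacks && cfg.fallbacks.length > 0) return cfg; // user intent respected
  const auto = (benchmarkPicks ?? []).filter((m) => m !== cfg.model); // never a no-op switch
  return { model: cfg.model, fallbacks: auto };
}
```

When no API-keyed providers exist the picker yields nothing and the fallbacks stay empty, matching the stated failure boundary.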

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:43:28 +02:00
Mikael Hugo
e4a86ddf6f fix(sf): classify 'exhausted your capacity / quota will reset after Ns' as rate-limit
Real failure caught from a user session: provider returned
'Error: You have exhausted your capacity on this model. Your quota
will reset after 51s.' SF's classifier didn't match it (no 'rate
limit', no '429', no 'limit resets'), so it fell through to unknown
→ no auto-resume → loop paused indefinitely until manual /sf
autonomous restart.

PDD spec:

Purpose: every legitimately transient quota error should auto-resume
  after the named cooldown, not pause indefinitely.
Consumer: classifyError() callers, ultimately the auto-loop.
Contract:
  - 'exhausted your|the (quota|capacity|usage)' → rate-limit
  - 'quota will reset' → rate-limit (paired with the above)
  - 'will reset after Ns' / 'will reset in Ns' → retryAfterMs = N*1000
Failure boundary: parse failure → 60s default (preserved).
Evidence: smoke test with 6 inputs:
   'exhausted your capacity ... will reset after 51s' → rate-limit/51000
   'rate limit exceeded' → rate-limit/60000 (unchanged)
   'Internal server error' → server/30000 (unchanged)
   '429 too many requests' → rate-limit/60000 (unchanged)
   'Invalid API key' → permanent (unchanged — still manual)
   'exhausted the usage. Will reset in 30s.' → rate-limit/30000
Non-goals: model-fallback-on-rate-limit (separate change — the
  provider-error-pause module currently waits and retries the same
  model; switching to the configured fallback model after the first
  rate-limit hit is a richer policy change).
Invariants:
  - Permanent classification still wins when no rate-limit pattern is
    present (auth/billing/invalid-key untouched).
  - Default 60s delay preserved when reset-time can't be parsed.
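
A fragment covering just the new quota patterns can be sketched as follows; the function name and return shape are illustrative, the patterns follow the contract above.

```typescript
type QuotaClass = { kind: "rate-limit"; retryAfterMs: number };

// Returns null when neither new pattern matches, so the existing
// classifier chain (429, "rate limit", permanent errors) still decides.
function classifyQuotaError(message: string): QuotaClass | null {
  const exhausted = /exhausted (?:your|the) (?:quota|capacity|usage)/i.test(message);
  const quotaReset = /quota will reset/i.test(message);
  if (!exhausted && !quotaReset) return null;
  const m = message.match(/will reset (?:after|in) (\d+)s/i);
  const retryAfterMs = m ? Number(m[1]) * 1000 : 60_000; // 60s default preserved
  return { kind: "rate-limit", retryAfterMs };
}
```

Running this fragment before the permanent-error check would be wrong; the real ordering keeps auth/billing classification intact, per the invariants above.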

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:35:55 +02:00
Mikael Hugo
f757a18417 feat(sf): /sf escalate user command + resolveEscalation (PDD)
Closes the user-facing loop for ADR-011 P2. The full escalation
end-to-end now works: agent files → loop pauses → user resolves
via /sf escalate → loop continues.

PDD spec for this change:

Purpose: let the user resolve a paused task escalation. Without this,
  escalation_pending=1 has no exit ramp other than manual SQL.
Consumer: users at the prompt — '/sf escalate list', '/sf escalate
  show <slice>/<task>', '/sf escalate resolve <slice>/<task> <choice>
  [-- <rationale>]'.
Contract:
  1. /sf escalate list → enumerate pending escalations in the active
     milestone, showing slice/task, question, options, recommendation.
  2. /sf escalate show <slice>/<task> → print the artifact's question
     + options with tradeoffs + recommendation + resolution status
     (resolved or unresolved).
  3. /sf escalate resolve <slice>/<task> <option-id> [-- <rationale>]
     → resolveEscalation in escalation.ts:
       - 'accept' selects the recommended option
       - any option id from the artifact is also valid
       - invalid choice → returns 'invalid-choice' with valid list
       - already resolved → 'already-resolved' with prior timestamp
       - not found → 'not-found' with the task path
     On success: artifact gains respondedAt/userChoice/userRationale,
     DB flags cleared, UOK audit event 'escalation-user-responded'
     emitted.
Failure boundary:
  - DB unavailable → 'SF database is not available. Run /sf doctor.'
  - Active milestone missing → 'No active milestone — nothing to list.'
  - Malformed artifact path → readEscalationArtifact returns null →
    handler returns 'not-found'.
  - clearTaskEscalationFlags called inside the resolver — never
    leaves the row in a half-resolved state.
Evidence: smoke test exercises 4 contract conditions end-to-end:
  invalid-choice, accept→resolved (chosen option = recommendation),
  already-resolved on re-run, not-found for unknown task. Typecheck
  clean.
Non-goals:
  - reject-blocker choice (gsd-2 has it; needs a blocker_source DB
    column SF doesn't have)
  - Carry-forward injection (claimEscalationOverride —
    findUnappliedEscalationOverride flow). The override is logged in
    the artifact for the user; agent context injection lands when
    the executor's prompt builder is wired to read it.
  - Cross-milestone listing (current implementation: active milestone
    only — matches /sf escalate list's most useful default behavior).
Invariants:
  - Safety: invalid-choice and not-found return without writing —
    no half-state.
  - Safety: clearTaskEscalationFlags zeros pending+awaiting in one
    UPDATE — reader can never see half-cleared state.
  - Liveness: after resolve, next state derivation cycle sees
    escalation_pending=0 → phase != 'escalating-task' → dispatch
    routes normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:31:45 +02:00
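[Editor's note] The resolve contract above can be sketched as a pure decision function. This is an illustrative sketch only, not SF's actual `resolveEscalation` signature; the type names and fields are assumptions inferred from the commit text.

```typescript
// Hypothetical shapes modeled on the commit's contract — not SF source.
interface EscalationOption { id: string; summary: string }

interface EscalationArtifact {
  question: string;
  options: EscalationOption[];
  recommendation: string; // id of the recommended option
  respondedAt?: string;   // set once the user has resolved
  userChoice?: string;
}

type ResolveResult =
  | { status: "resolved"; chosen: string }
  | { status: "invalid-choice"; valid: string[] }
  | { status: "already-resolved"; respondedAt: string }
  | { status: "not-found"; path: string };

function resolveChoice(
  artifact: EscalationArtifact | null,
  taskPath: string,
  choice: string,
): ResolveResult {
  // Malformed path / missing artifact → 'not-found' with the task path.
  if (artifact === null) return { status: "not-found", path: taskPath };
  // Re-running a resolve reports the prior timestamp instead of rewriting.
  if (artifact.respondedAt) {
    return { status: "already-resolved", respondedAt: artifact.respondedAt };
  }
  // 'accept' is sugar for the recommended option id.
  const chosen = choice === "accept" ? artifact.recommendation : choice;
  if (!artifact.options.some((o) => o.id === chosen)) {
    return { status: "invalid-choice", valid: artifact.options.map((o) => o.id) };
  }
  return { status: "resolved", chosen };
}
```

The real handler would then persist respondedAt/userChoice/userRationale and clear the DB flags only on the `resolved` branch, which is what keeps invalid-choice and not-found write-free.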
Mikael Hugo
2bf6c51fde feat(sf): expose escalation via sf_task_complete (PDD)
Closes the agent surface for ADR-011 P2. Task agents can now include
an optional 'escalation' payload on sf_task_complete, gated by
phases.mid_execution_escalation. When the preference is on and the
field is present, the executor builds and writes the artifact, which
flips tasks.escalation_pending or escalation_awaiting_review based
on continueWithDefault. The producer chain from 14efcd773 is now
agent-callable.

PDD spec for this change:

Purpose: give task agents a way to file a mid-execution escalation
  through the same tool they already call to record completion. No
  new tool surface — escalation rides as an optional field on
  sf_task_complete (matches gsd-2's design intent).
Consumer: task agents (execute-task) when they hit ambiguity that
  requires user judgment.
Contract:
  1. phases.mid_execution_escalation !== true → escalation field
     silently ignored, current behavior preserved. Verified.
  2. preference on + escalation field → buildEscalationArtifact
     validates, writeEscalationArtifact persists, DB flag set,
     result text + details report path + status. Verified.
  3. continueWithDefault=false → status='pending' (loop pauses).
     continueWithDefault=true → status='awaiting-review' (no pause).
  4. Escalation write failures are caught — task completion never
     blocks on an escalation error (logged via logError).
Failure boundary:
  - Validation errors from buildEscalationArtifact propagate as
    caught try/catch in the executor → logged → task still completes.
  - Preference loader fails → behaves as if preference is off.
  - DB write failures fall through; the task is already recorded.
Evidence: smoke test exercises both preference states (on writes
  artifact + sets flag; off silently ignores). Typecheck clean.
  Existing sf_task_complete callers without an escalation field
  see zero change in result shape or behavior.
Non-goals:
  - resolveEscalation (apply user's choice → carry forward as
    override) — bigger flow, later fire.
  - listActionableEscalations / listAllEscalations — for /sf
    escalate list, later fire.
  - /sf escalate user command (later fire).
Invariants:
  - Safety: escalation field is Optional in the schema; no caller
    is forced to migrate.
  - Liveness: build+write happen synchronously after handleCompleteTask
    returns; on success, the next state-derivation cycle picks up
    pending=1 and pauses.
Schema additions to preferences-validation.ts:
  - mid_execution_escalation, progressive_planning recognized as
    valid phases keys (previously typed in PhaseSkipPreferences but
    silently stripped by the validator).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:24:04 +02:00
Mikael Hugo
e82e878eaa fix(uok): write parity exit heartbeat on SIGTERM/SIGINT before process.exit
The signal handler in auto-supervisor.ts called process.exit(0) directly,
bypassing the finally block in runAutoLoopWithUok() that writes the UOK
parity exit heartbeat. This caused 55+ missing exit events in the parity
log (78 enters vs 22 exits), making the enter/exit mismatch report
meaningless.

Changes:
- auto-supervisor.ts: add optional onSignal callback to registerSigtermHandler,
  invoked before process.exit(0) with best-effort error swallowing
- auto.ts: wrapper now passes a callback that writes the UOK parity exit
  heartbeat + refreshes the parity report before the hard exit
- auto-start.ts: update BootstrapDeps interface to accept optional onSignal
- tests: add 2 tests verifying callback invocation and error swallowing

Fixes the UOK parity critical mismatch reported in uok-parity-report.json.
2026-05-02 20:20:40 +02:00
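[Editor's note] The onSignal pattern above — best-effort cleanup before a hard exit, with errors swallowed — can be sketched as follows. Names are hypothetical; the real handler lives in auto-supervisor.ts and calls `process.exit(0)` directly, while this sketch injects the exit function so the behavior is testable.

```typescript
type ExitFn = (code: number) => void;

interface SignalHooks {
  onSignal?: () => void; // e.g. write the UOK parity exit heartbeat
}

// Returns the handler you would register for SIGTERM/SIGINT.
// `exit` is injected for testability; real code passes process.exit.
function makeSignalHandler(hooks: SignalHooks, exit: ExitFn): () => void {
  return () => {
    try {
      hooks.onSignal?.();
    } catch {
      // Best-effort: a broken heartbeat writer must never block shutdown.
    }
    exit(0);
  };
}
```

The point of the try/catch is the liveness guarantee: the process always reaches `exit(0)`, whether the callback succeeds, throws, or is absent.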
Mikael Hugo
14efcd7734 feat(sf): producer side of mid-execution escalation (PDD)
Closes the producer half of ADR-011 P2. With this commit a task agent
can call buildEscalationArtifact + writeEscalationArtifact and the
escalation goes end-to-end: artifact persisted to disk, DB flag set,
state derivation picks it up, dispatch returns 'stop'.

PDD spec for this change:

Purpose: let a task agent file an escalation when it hits a decision
  the user must make (overwrite vs fail, model A vs model B, etc.)
  rather than continue past undocumented ambiguity.
Consumer: future sf_task_escalate tool, and direct callers of
  escalation.ts (e.g., resolve-time DB tools).
Contract:
  1. buildEscalationArtifact validates options (2-4 entries, unique
     ids, recommendation must reference a real option id) and throws
     a descriptive Error before any IO. Verified via smoke test:
     unknown recommendation id → "is not one of the option ids: …"
  2. writeEscalationArtifact atomically writes the JSON to
     .sf/milestones/{M}/slices/{S}/tasks/{T}-ESCALATION.json,
     auto-creating the tasks/ subdirectory.
  3. continueWithDefault=false → setTaskEscalationPending → loop
     pauses on next dispatch (verified end-to-end).
  4. continueWithDefault=true → setTaskEscalationAwaitingReview →
     loop continues; artifact recorded for human review later
     (verified — detectPendingEscalation returns null for awaiting).
  5. clearTaskEscalationFlags zeros both pending+awaiting but
     preserves escalation_artifact_path so the audit trail survives.
  6. Emits a UOK audit event 'escalation-manual-attention-created'
     with traceId 'escalation:{M}:{S}:{T}' for cross-system trace.
Failure boundary:
  - Validation throws BEFORE any DB or FS write — partial state
    impossible.
  - resolveSlicePath returns null when the slice doesn't exist;
    writeEscalationArtifact throws with a clear /sf doctor hint.
  - atomicWriteSync is the same temp+rename pattern used by every
    other SF artifact write.
Evidence:
  - typecheck clean
  - smoke test exercises all 7 contract conditions end-to-end
    (build, write, pending detection, awaiting-review skip,
    clear, validation rejection, audit trail traceId)
Non-goals:
  - sf_task_escalate MCP tool registration (separate fire — small,
    just exposing buildEscalationArtifact+writeEscalationArtifact
    via the tool surface).
  - resolveEscalation (apply user's choice → clear flags → carry
    forward as override) — bigger; later fire.
  - listActionableEscalations / listAllEscalations helpers — for
    /sf escalate list, later fire.
  - /sf escalate user command itself.
Invariants:
  - Safety: builder validates BEFORE writer commits anything. The
    two phases never partially succeed.
  - Liveness: the two flags are mutually exclusive (set helpers
     flip both atomically in one UPDATE) — no state where both are 1.
DB schema gains escalation_awaiting_review column (v24 migration).
The two helpers setTaskEscalationPending and
setTaskEscalationAwaitingReview write the mutually-exclusive flag
pair in one UPDATE so a reader can never observe both = 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:16:15 +02:00
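[Editor's note] The validate-before-IO contract in condition 1 above can be sketched as a standalone check. Field names are assumptions based on the commit text, not SF source; the real `buildEscalationArtifact` validates more than this.

```typescript
interface EscalationOption { id: string; summary: string }

// Throws a descriptive Error BEFORE any DB or FS write, so partial
// escalation state is impossible.
function validateEscalationOptions(
  options: EscalationOption[],
  recommendation: string,
): void {
  if (options.length < 2 || options.length > 4) {
    throw new Error(`expected 2-4 options, got ${options.length}`);
  }
  const ids = options.map((o) => o.id);
  if (new Set(ids).size !== ids.length) {
    throw new Error("option ids must be unique");
  }
  if (!ids.includes(recommendation)) {
    throw new Error(
      `recommendation '${recommendation}' is not one of the option ids: ${ids.join(", ")}`,
    );
  }
  // Only after this returns would the writer persist the artifact
  // (temp file + rename) and flip the DB flag.
}
```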
Mikael Hugo
a558ff6c64 feat(sf): dispatch pause-for-escalation rule (PDD)
Closes the basic escalation loop. With this commit, end-to-end:
- Task agent writes escalation_pending=1 + escalation_artifact_path
  to the tasks DB row (DB schema from 62dacb627).
- State derivation detects the pause and emits phase='escalating-task'
  with /sf escalate hint in nextAction (ea8819906).
- Auto-dispatch sees phase='escalating-task' FIRST in the rule order
  and returns 'stop' with the nextAction message — no other rule runs.

PDD spec:

Purpose: never let the loop continue past a pending escalation.
Consumer: auto-mode dispatcher (DISPATCH_RULES first entry).
Contract:
  1. state.phase !== 'escalating-task' → return null (fall through).
  2. state.phase === 'escalating-task' → return action='stop' with
     the state's nextAction (the /sf escalate hint state.ts produced).
  3. Rule sits at index 0 of DISPATCH_RULES so phase-agnostic rules
     below (rewrite-docs, UAT, reassess) cannot bypass it.
Failure boundary: pure phase check, no fs/db access — nothing to fail.
Evidence: typecheck clean. State derivation already smoke-tested in
  ea8819906 — once that returns phase='escalating-task', this rule
  emits the stop. End-to-end happy path is just two function calls.
Non-goals:
  - Tools to write escalation_pending (the producer side — task
    agents need a tool for this; later fire)
  - /sf escalate user command (later fire)
  - Resolution flow (escalation.ts has the schema; resolveEscalation
    helper from gsd-2 is not yet ported — later fire)
Invariants:
  - Safety: phase !== 'escalating-task' → 1 condition check, return
    null. Zero overhead in the common case.
  - Liveness: when paused, dispatch returns immediately — never
    runs another rule that could mutate slice state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:07:56 +02:00
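[Editor's note] The rule-ordering argument above (index 0 so nothing can bypass the pause) can be sketched as a minimal first-match dispatcher. Names are hypothetical stand-ins for SF's DISPATCH_RULES machinery, not its real types.

```typescript
interface LoopState { phase: string; nextAction: string }
interface DispatchAction { action: "stop" | "dispatch"; message: string }
type Rule = (state: LoopState) => DispatchAction | null;

// Contract: any phase other than 'escalating-task' falls through (null);
// the escalating phase returns 'stop' with the hint state derivation built.
const pauseForEscalation: Rule = (state) =>
  state.phase === "escalating-task"
    ? { action: "stop", message: state.nextAction }
    : null;

// Index 0 — phase-agnostic rules below can never run past a pending
// escalation, because dispatch takes the first non-null result.
const DISPATCH_RULES: Rule[] = [
  pauseForEscalation,
  // ...rewrite-docs, UAT, reassess, planning rules would follow here
];

function dispatch(state: LoopState): DispatchAction | null {
  for (const rule of DISPATCH_RULES) {
    const result = rule(state);
    if (result) return result;
  }
  return null;
}
```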
Mikael Hugo
ea8819906d feat(sf): wire escalation detection into state derivation (PDD)
State derivation now emits phase='escalating-task' when a task in the
active slice is paused waiting for a user decision. Builds on the
type+DDL foundation in 62dacb627. Together they get the loop to STOP
when there's a pending escalation rather than carrying past an
undocumented decision.

PDD spec for this change:

Purpose: pause auto-mode at the state-derivation layer when any task
  in the active slice has escalation_pending=1 with an unresolved
  escalation artifact. The dispatcher (next fire) sees phase=
  'escalating-task' and returns 'stop' rather than dispatching new
  work over a pending decision.
Consumer: state.ts deriveStateFromDb() callers — the auto-loop, the
  /sf status dashboard, the future /sf escalate command.
Contract:
  1. Empty tasks list → null (no pause). Verified.
  2. Task without escalation_pending → null. Verified.
  3. escalation_pending=1 but no artifact path → null (treats as
     not actionable). Verified.
  4. escalation_pending=1 + valid artifact + no respondedAt → returns
     task id; state.phase = 'escalating-task' with task id in
     blockers and a /sf escalate hint in nextAction. Verified.
  5. respondedAt set → null (already resolved, fall through).
     Verified.
Failure boundary: any read/parse failure on the artifact returns null
  from detectPendingEscalation — state derivation falls through to
  existing behavior. Strict schema validation in readEscalationArtifact
  treats malformed artifacts as 'no actionable escalation here.'
Evidence: smoke test exercises all 5 contract conditions end-to-end
  with real filesystem artifacts. Typecheck clean. Existing state
  derivation paths unchanged when no task is paused (early continue
  on escalation_pending !== 1 in detectPendingEscalation's loop).
Non-goals:
  - Dispatch rule that returns 'stop' on phase='escalating-task'
    (next fire — needs no DB changes, just an auto-dispatch.ts edit)
  - Escalation artifact creation tools (gsd-2 has writeEscalation-
    Artifact + buildEscalationArtifact + setTaskEscalationPending —
    those land when a task agent needs to file an escalation)
  - /sf escalate user command (later fire)
Invariants:
  - Safety: no escalation pending → 0 file system reads (loop early-
    continues), zero behavior change vs current.
  - Liveness: if a task IS paused, state.phase becomes 'escalating-
    task' immediately — no race with dispatch ordering.
Assumptions verified:
  - SF's EscalationArtifact + EscalationOption types match gsd-2's
    schema (verified earlier this session).
  - TaskRow has escalation_pending and escalation_artifact_path
    fields (added in 62dacb627).
  - getSliceTasks() returns DB rows that include those fields after
    the v23 migration ran.
  - state.ts has the slice-level scope I need (activeMilestone +
    activeSlice + registry + requirements + progress all visible at
    the insertion point).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:06:29 +02:00
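[Editor's note] The five contract conditions above can be sketched as one detection loop. The `TaskRow` fields match what the commit describes; `readArtifact` is a stand-in for SF's `readEscalationArtifact`, injected here so the fall-through-on-failure behavior is visible.

```typescript
interface TaskRow {
  id: string;
  escalation_pending?: number;
  escalation_artifact_path?: string | null;
}

// Returns the id of the first task paused on an unresolved escalation,
// or null when nothing is actionable (state derivation then falls
// through to existing behavior).
function detectPendingEscalation(
  tasks: TaskRow[],
  readArtifact: (path: string) => { respondedAt?: string } | null,
): string | null {
  for (const task of tasks) {
    if (task.escalation_pending !== 1) continue; // zero FS reads when idle
    const path = task.escalation_artifact_path;
    if (!path) continue; // flag without artifact: not actionable
    try {
      const artifact = readArtifact(path);
      // Unresolved artifact → pause here; respondedAt set → fall through.
      if (artifact && !artifact.respondedAt) return task.id;
    } catch {
      // Malformed/unreadable artifact: treat as no actionable escalation.
    }
  }
  return null;
}
```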
Mikael Hugo
d3574f3c4d fix(sf): guard escalation index migration 2026-05-02 20:05:12 +02:00
Mikael Hugo
62dacb6270 feat(sf): foundation for mid-execution escalation (ADR-011 P2)
Type-level + DB scaffolding for the escalation feature gsd-2 has but
SF lacks. Pure additive — no behavior change yet. Mirrors the same
incremental pattern that worked for progressive planning (types +
DDL first, state derivation + dispatch + module port in subsequent
fires).

PDD spec:

Purpose: lay the foundation so a task agent can write
  tasks.escalation_pending=1 + escalation_artifact_path=<file> when
  it hits a decision the user must make. Future fires will: (1) add
  detectPendingEscalation() to state.ts, (2) add a dispatch rule that
  returns 'stop' on phase='escalating-task', (3) port the escalation
  helper module from gsd-2.
Consumer: task agents (execute-task) when they hit ambiguity that
  shouldn't be silently resolved. Operators running future
  /sf escalate list/resolve commands.
Contract:
  - types.ts:23 Phase union now includes 'escalating-task'.
  - sf-db.ts:370-371 fresh CREATE TABLE for tasks gains
    escalation_pending + escalation_artifact_path.
  - sf-db.ts:1430+ schema_version 23 migration adds the columns +
    an opportunistic index for fast pending-escalation lookups.
  - TaskRow type gains escalation_pending?: number and
    escalation_artifact_path?: string | null. rowToTask returns
    them with safe defaults (0 and null).
Failure boundary: index creation is wrapped in try/catch — backends
  without index support fall through silently. Pre-migration installs
  treat the column as 0 default (no escalation pending) on first
  read, matching post-migration default.
Evidence: typecheck passes; smoke test deferred to next fire when the
  state derivation rule lands and we have something observable to
  test.
Non-goals:
  - state.ts emission of phase='escalating-task' (next fire)
  - auto-dispatch.ts pause rule (next fire)
  - escalation.ts helper module port (next fire — 367 LOC in gsd-2)
  - /sf escalate user command (later fire)
  - Escalation artifact format/validation (later fire)
Invariants:
  - Safety: ALTER TABLE adds nullable/defaulted columns; existing
    rows behave identically (escalation_pending defaults to 0).
  - Liveness: migration runs in same atomic transaction block as
    other version 23 work — never half-applied.
Assumptions verified:
  - SF already has EscalationOption + EscalationArtifact types
    (types.ts:692-704) — they were stubs with no producers; this
    commit is the producer-side scaffolding.
  - schema_version 22 already exists and is the current latest;
    23 is the next available.

ADR-011 reference: gsd-2's docs/dev/ADR-011-progressive-planning-
escalation.md covers both progressive planning (already ported in
this session) and mid-execution escalation (in progress). SF's own
ADR-011 file (docs/dev/ADR-011-swarm-chat-and-debate-mode.md) is
unrelated to gsd-2's ADR-011 — same number, different topic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:00:16 +02:00
Mikael Hugo
99965091d4 fix: inline-fix for high/critical self-feedback entries
- sf-mooe4m5k-6fm7z9: Add orphan next-server process reaper to web-mode.ts
  - reapOrphanedNextServerProcesses() detects and kills orphaned next-server
    processes with cwd under dist/web/standalone and parent PID 1
  - Wired into launchWebMode (before port reservation) and stopWebMode --all
  - Tests verify export and safe execution on non-Linux platforms

- sf-moocr4rv-au7r3l: Add harness promotion path from .sf to tracked docs
  - handleHarnessPromote() writes reviewable artifacts to docs/exec-plans/active/
  - handleHarness now accepts 'promote <finding-id>' subcommand
  - Promoted artifacts include observed state, review checklist, and notes

- sf-moocz9so-4ffov2: Add basic flow auditor via /sf doctor flow
  - runFlowAudit() inspects auto.lock, runtime units, notifications, child processes
  - Reports active unit age, warnings, recommendations, child process classification
  - Wired into handleDoctor as 'flow' subcommand
2026-05-02 19:57:41 +02:00
Mikael Hugo
fead8c1eca feat(sf): restore /sf debug session feature from gsd-2 (PDD)
Reverses commit 1891ccbdc which deleted commands-debug.ts and
debug-session-store.ts as orphan code. They were not orphan — gsd-2
has the full feature wired (commands/handlers/ops.ts:46-49). The 2
prompts that the dispatch references existed in gsd-2 but had never
been ported to SF, which is why my deletion looked correct in
isolation.

PDD spec for this restoration:

Purpose: bring back /sf debug — a structured debug-session workflow
  where the user runs '/sf debug <issue>' to start a session, and
  SF's auto-mode dispatches debug-session-manager (find_and_fix) or
  debug-diagnose (find_root_cause_only) prompts to the LLM.
Consumer: users at the prompt typing /sf debug.
Contract:
  - /sf debug              → usage text
  - /sf debug <issue>      → create session, dispatch find_and_fix
  - /sf debug list         → enumerate sessions
  - /sf debug status <slug>→ show session details
  - /sf debug continue <slug> → resume
  - /sf debug --diagnose <issue|slug> → diagnose-only path
Failure boundary: dispatch failures are caught — the session record
  is still persisted to .sf/debug/sessions/, the user can retry
  with /sf debug continue <slug>.
Evidence:
  - typecheck: clean
  - prompt-load: both debug-diagnose and debug-session-manager render
    against the var sets the dispatch passes
  - tests: 37/37 pass under vitest harness (file uses node:test
    runner, vitest counts 'tests 37 pass 37 fail 0' even though it
    tags the file 'failed' on reporter mismatch)
Non-goals:
  - Not redesigning the feature, just restoring it
  - Not adding new dispatch paths, just the user-facing /sf debug
Invariants:
  - Safety: when not invoked, debug-session-store.ts has zero
    side-effects (lazy file system access only on session create)
  - Liveness: session creation writes to .sf/debug/sessions/
    immediately so a crash mid-flow leaves a recoverable record
Assumptions verified:
  - All 7 files (2 ts + 2 prompts + ops.ts edit + catalog edit + 1
    test) port cleanly with gsd→sf identifier rewrites
  - The customType strings in commands-debug.ts and the test match
    ('sf-debug-start', 'sf-debug-continue', 'sf-debug-diagnose')

What we kept better than gsd-2: all SF improvements over gsd-2 remain
untouched (gap-audit, judgment-log, plan-quality, etc.); the deletion
this commit reverses was the only regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:49:34 +02:00
Mikael Hugo
0c7c4eca5b fix(sf): harden auto loops and skill sandbox 2026-05-02 19:46:36 +02:00
Mikael Hugo
d742602454 feat(sf): wire deep planning mode dispatch (PDD)
Closes the deep-mode rollout. With this commit, planning_depth: 'deep'
in PREFERENCES.md produces a 4-stage project-level discussion BEFORE
any milestone work — workflow-preferences → discuss-project →
discuss-requirements → research-project (research-decision is auto-
resolved to skip-default by SF's resolver, simpler than gsd-2's
explicit user-decision gate).

PDD spec for this change:

Purpose: route auto-mode through project-level setup before milestones
  when planning_depth='deep'. When absent or 'light', existing dispatch
  is preserved 1:1.
Consumer: auto-mode dispatcher (DISPATCH_RULES). One new rule sits at
  the top of the pre-planning ladder; existing rules unchanged.
Contract:
  1. planning_depth absent or 'light' → rule returns null → existing
     dispatch unchanged. Verified: returns 'not-applicable'.
  2. planning_depth='deep' + empty project → dispatches workflow-
     preferences then progresses through stages as artifacts land.
     Verified: returns 'pending'/'workflow-preferences'.
  3. status='blocked' → returns dispatch action 'stop' with the gate's
     reason — never silently bypasses a blocker.
  4. status='complete' → returns null → milestone-level rules below
     take over.
Failure boundary: if resolveDeepProjectSetupState() throws, return
  null and fall through to legacy rules. Never blocks the user on a
  helper crash.
Evidence: typecheck passes; gate-resolver smoke test verifies all
  three contract conditions; existing dispatch tests unchanged
  (light-mode regression-protected).
Non-goals:
  - In-flight idempotency markers for research-project (gsd-2 has
    these; SF's resolver auto-completes the stage when files land
    so the simple guard is sufficient — can add markers later if
    parallel orchestrator races emerge).
  - Plumbing structuredQuestionsAvailable through DispatchContext
    (defaulted to 'false' in builders for now; UI capability
    detection can be threaded later).
Invariants:
  - Safety: light-mode + absent-prefs paths return null at the FIRST
    check, before any DB or filesystem access. No regression possible.
  - Liveness: the resolver enforces forward progress — once a stage's
    artifact lands, the next gate fires next dispatch cycle.
Assumptions verified:
  - resolveDeepProjectSetupState exists in SF (deep-project-setup-policy.ts).
  - planning_depth: 'light' | 'deep' typed in preferences-types.ts:425.
  - All 4 dispatched unit types have builders in auto-prompts.ts (added
    in 5e8bdefbe).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:42:41 +02:00
Mikael Hugo
5e8bdefbea feat(sf): add 5 deep-planning-mode prompt builders (PDD)
Companion to b771dd0b3 (deep-mode prompt templates). Adds the five
auto-prompts.ts builders that load those templates with the
correct vars.

PDD spec for this change:

Purpose: complete the load path for deep-mode planning so dispatch
  rules can call buildDiscussProjectPrompt(), etc., without crashing.
Consumer: auto-dispatch.ts deep-mode rules (next commit).
Contract: each builder returns a populated prompt string for its
  unit type given (basePath, structuredQuestionsAvailable). All 5
  load successfully against their respective .md templates with no
  missing-var errors.
Failure boundary: loadPrompt throws SF_PARSE_ERROR if a template
  variable is missing — surfaces a clear error rather than silently
  rendering a half-substituted prompt.
Evidence: typecheck passes; loadPrompt verification in last fire's
  log shows all 5 prompts render to non-empty strings (2.6k–7.7k
  chars each).
Non-goals: dispatch wiring (separate commit, requires the
  deep-project-setup-policy resolver SF already has).
Invariants:
  - Safety: existing builders unchanged — no regression.
  - Liveness: each builder returns within one prompt-load round-trip.
Assumptions verified:
  - inlineTemplate('project'/'requirements') already exists in
    prompt-loader.ts.
  - sf_requirement_save and sf_summary_save tools exist in
    db-tools.ts (referenced by the prompts they load).
  - phases.planning_depth: 'light' | 'deep' already typed in
    preferences-types.ts (line 425).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:36:50 +02:00
Mikael Hugo
b771dd0b31 feat(sf): port 5 deep-planning-mode prompts from gsd-2
Adds the prompt templates that gsd-2 uses for its 'deep' planning_depth
mode — a multi-stage discussion flow (project → requirements → research
decision → parallel research) that runs BEFORE any milestone-level
discussion. SF only had milestone-level discuss flow; this fills the
project-level and requirements-level gaps.

Ported files:
- guided-discuss-project.md     — project-wide vision/users/anti-goals
- guided-discuss-requirements.md — structured R### requirements interview
- guided-research-decision.md    — yes/no gate for parallel research
- guided-research-project.md     — 4-way parallel research orchestrator
- guided-workflow-preferences.md — workflow + planning prefs collection

gsd→sf adaptations: GSD/gsd → SF/sf, .gsd/ → .sf/, gsd_*_save tool
names → sf_*_save, GSD Skill Preferences → SF Skill Preferences.

All 5 verified to load via loadPrompt with their required template
variables. The two sf_* tools they reference (sf_requirement_save and
sf_summary_save) already exist in db-tools.ts.

This is the first half of the deep-mode port. Remaining work for full
end-to-end:
- Port 5 builders to auto-prompts.ts (buildDiscussProjectPrompt, etc.)
- Port dispatch rules to auto-dispatch.ts (each gates on
  prefs.planning_depth === 'deep')
- Port resolveDeepProjectSetupState helper for the research-decision
  marker file
- Add planning_depth: 'deep' | 'light' to PhaseSkipPreferences

Default behavior preserved: without planning_depth set, current SF
'light' behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:33:19 +02:00
Mikael Hugo
a5c3d75344 feat(sf): sf_plan_slice auto-clears is_sketch when refining a sketch slice
Closes the last gap in the ADR-011 progressive planning chain. When
refine-slice runs and persists its full plan via sf_plan_slice, the
tool now zeros is_sketch atomically with the plan upsert (only when
the slice was actually a sketch — idempotent no-op otherwise).

This means the dispatch rule from 0c78b0038 will route to refine-slice
on the FIRST visit to a sketch slice, then route to plan-slice on any
subsequent visit because the flag is gone. No infinite refine loops.

sketch_scope is preserved on clear (clearSliceSketch only touches the
is_sketch column) so the original scope hint stays as an audit trail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:28:50 +02:00
Mikael Hugo
d4be9afe15 feat(sf): producer side of progressive planning — plan-milestone emits sketches, insertSlice persists is_sketch+sketch_scope
Closes the producer half of the ADR-011 rollout. With this commit, the
end-to-end progressive planning path is complete and runnable:
plan-milestone → insertSlice writes is_sketch=1 → dispatch reads it →
refine-slice expands → clearSliceSketch zeros the flag.

Changes:

sf-db.ts insertSlice: extends the typed payload with isSketch and
sketchScope (3-valued: true/false/undefined). The INSERT INTO and ON
CONFLICT clauses gain is_sketch + sketch_scope columns with the same
NULL-sentinel pattern (raw_is_sketch / raw_sketch_scope) used by every
other field — so a re-plan that omits these flags preserves any
existing sketch state rather than blanking it.

sf-db.ts clearSliceSketch: new exported helper for refine-slice to
call after persisting the full plan. Idempotent.

tools/plan-milestone.ts validateSlices: handles 3-valued isSketch
semantics. When isSketch=true, sketchScope is required (non-empty)
and the heavyweight planning fields (successCriteria, proofLevel,
integrationClosure, observabilityImpact) are optional. Non-sketches
keep current strict validation (no regression for existing callers).

tools/plan-milestone.ts persist loop: passes isSketch/sketchScope
through to insertSlice; skips upsertSlicePlanning entirely when
isSketch=true (the planning fields belong to refine-slice's output).

End-to-end DB test verified all five behaviors:
- isSketch=true + sketchScope writes is_sketch=1 + scope text
- Explicit isSketch=false writes is_sketch=0
- Omitted isSketch defaults to 0 on insert
- clearSliceSketch zeros the flag while preserving sketch_scope
- ON CONFLICT with omitted isSketch preserves existing row state

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:26:08 +02:00
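[Editor's note] The NULL-sentinel upsert semantics described above — an omitted flag preserves the stored value instead of blanking it — can be modeled as a pure merge rule. SF does this in SQL via raw_* sentinel columns; this sketch with hypothetical names captures only the decision table.

```typescript
interface SliceSketchState { is_sketch: number; sketch_scope: string }

// 3-valued semantics: undefined (omitted) keeps the existing row value;
// an explicit true/false overwrites it. Fresh inserts default to 0 / "".
function mergeSketchState(
  existing: SliceSketchState | undefined,
  isSketch: boolean | undefined,
  sketchScope: string | undefined,
): SliceSketchState {
  return {
    is_sketch:
      isSketch === undefined ? existing?.is_sketch ?? 0 : isSketch ? 1 : 0,
    sketch_scope:
      sketchScope === undefined ? existing?.sketch_scope ?? "" : sketchScope,
  };
}
```

This is why a re-plan that omits the flags cannot blank sketch state, and why `clearSliceSketch` (explicit is_sketch=0, scope untouched) preserves the original scope hint as an audit trail.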
Mikael Hugo
c11595cf22 feat(sf): DB migration v22 adds is_sketch + sketch_scope columns (ADR-011)
Mirrors gsd-2's slices schema for progressive planning. Three changes
to sf-db.ts:

1. Fresh-install CREATE TABLE for slices (line 312) gains:
   - is_sketch INTEGER NOT NULL DEFAULT 0  -- 1 = awaiting refine
   - sketch_scope TEXT NOT NULL DEFAULT '' -- 2-3 sentence scope hint

2. Schema version 22 migration: ensureColumn for both fields so
   existing installs upgrade without data loss. Wrapped in the same
   currentVersion < N guard pattern as v6, v7, v8 ... v21.

3. rowToSlice() returns sketch_scope and is_sketch on the SliceRow
   so the dispatch rule from 0c78b0038 can read them via getSlice().

End-to-end verified: fresh DB has both columns at defaults; getSlice()
returns is_sketch=0, sketch_scope='' on a freshly-inserted slice.

Closes the DDL-migration gap from the progressive-planning rollout
plan in fef2e4b6f. Remaining: plan-milestone tool needs to write
is_sketch=1 + sketch_scope when emitting sketches; refine-slice tool
needs to clear is_sketch=0 when persisting the full plan. Until those
land, the dispatch rule still falls through (sketches never created).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:18:50 +02:00
Mikael Hugo
0c78b00381 feat(sf): wire ADR-011 progressive planning dispatch rule
Adds 'planning (sketch + progressive_planning) → refine-slice' rule
in auto-dispatch.ts, fired BEFORE the existing 'planning → plan-slice'
rule. Activates when:
- state.phase === 'planning'
- prefs?.phases?.progressive_planning === true
- slice has is_sketch=1 in the DB

When all three conditions hold, dispatches the refine-slice unit using
the existing buildRefineSlicePrompt + prompts/refine-slice.md (both
ported in earlier commits). Otherwise falls through to plan-slice
(graceful downgrade — current behavior is preserved when the flag is
off, which is the default).

Why this matters: without progressive planning, the milestone planner
has to either fully-plan every slice upfront (rots quickly) or hand-
wave each slice (executors overscope). Sketch+refine lets the planner
write 2-3 sentences of scope per slice and have refine-slice expand it
just-in-time using prior slice summaries as context — keeping each
plan sized for the actual current reality.

Defensive read of slice.is_sketch with try/catch: pre-migration installs
without the column simply fall through to plan-slice, no error. The DB
DDL migration will land separately as part of the full progressive-
planning rollout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:14:21 +02:00
Mikael Hugo
fef2e4b6f4 feat(sf): add type-level scaffolding for progressive planning (ADR-011)
Three additive type changes that prepare SF to wire refine-slice
through the state machine. Pure type-level — no runtime behavior
change yet:

1. types.ts:14 — Phase union gains "refining" between "planning" and
   "evaluating-gates". State derivation will yield this when a slice
   has is_sketch=1 AND phases.progressive_planning=true.

2. types.ts:354 — PhaseSkipPreferences.progressive_planning?: boolean.
   Off by default; turning it on enables sketch→refine flow.

3. sf-db.ts:2321 — SliceRow.is_sketch?: number. Column DDL not yet
   added; this just lets the type compile when migration lands.

This is the smallest forward step toward closing the refine-slice gap
identified by sf-moojsmkg-72k3ei. Next steps (separate PRs):
- DB migration: ALTER TABLE slices ADD COLUMN is_sketch INTEGER NOT
  NULL DEFAULT 0 (mirroring gsd-2 sf-db.ts:381,1074)
- state.ts: derivation rule emits phase="refining" when sketch+flag
- auto-dispatch.ts: "refining → refine-slice" rule + import
  buildRefineSlicePrompt
- Tests: progressive-planning.test.ts equivalent

Existing buildRefineSlicePrompt + prompts/refine-slice.md already in
place — only the FSM path is missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:10:03 +02:00
Mikael Hugo
be4257b411 feat(sf): port refine-slice prompt from gsd-2
src/resources/extensions/sf/auto-prompts.ts:2143 buildRefineSlicePrompt()
already existed, calling loadPrompt("refine-slice", ...) — but the
template file was missing, so the function would throw if ever called.
gsd-2 has the prompt; ported with /gsd → /sf, .gsd/ → .sf/, GSD → SF,
gsd_plan_slice → sf_plan_slice, gsd_self_report → sf_self_report,
gsd/templates → sf/templates substitutions.
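The substitution set above can be expressed as an ordered replace table; the sketch below is purely illustrative of the mapping (the port itself was an editor pass, not this function), and the longer identifier patterns must run before the bare /gsd and GSD patterns.

```typescript
const SUBSTITUTIONS: Array<[RegExp, string]> = [
  [/gsd_plan_slice/g, "sf_plan_slice"],
  [/gsd_self_report/g, "sf_self_report"],
  [/gsd\/templates/g, "sf/templates"],
  [/\/gsd\b/g, "/sf"],
  [/\.gsd\//g, ".sf/"],
  [/\bGSD\b/g, "SF"],
];

// Apply each substitution in order to a ported prompt body.
function portPrompt(gsdText: string): string {
  return SUBSTITUTIONS.reduce((text, [pattern, repl]) => text.replace(pattern, repl), gsdText);
}
```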

Verified end-to-end: loadPrompt("refine-slice", { ...vars }) succeeds
and produces a 5906-char rendered prompt with all 12 template variables
satisfied by renderSlicePrompt's existing var-passing.
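The variable-satisfaction check can be illustrated with a minimal renderer sketch. This is hypothetical: SF's real loadPrompt reads prompts/&lt;name&gt;.md from disk and its placeholder syntax may differ; the point is that an unsatisfied variable fails loudly rather than rendering blank.

```typescript
// Minimal template renderer: {{name}} placeholders filled from vars,
// throwing on any variable the caller failed to supply.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key) => {
    if (!(key in vars)) {
      throw new Error(`missing template variable: ${key}`);
    }
    return vars[key];
  });
}
```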

This is a partial fix for sf-moojsmkg-72k3ei — the prompt now loads,
but full feature wire-up still requires:
- new state.phase value "refining"
- new preference phases.progressive_planning (gsd-2 only enables refine
  when this pref is true)
- dispatch rule "refining → refine-slice" in auto-dispatch.ts
- slice DB schema: sketch_scope is already read in the function body,
  but the downstream FSM transitions still need wiring

Without those, buildRefineSlicePrompt is loadable but uncalled. Decision
needed: port the full FSM path or remove the unused builder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:03:56 +02:00
Mikael Hugo
c3ab4bfccf feat(sf): port 16 workflow templates from gsd-2
Adds 16 ready-to-use workflow templates that gsd-2 has but SF was
missing. Each runs via /sf workflow run <name> or /sf start <name>.

Markdown phased workflows (12):
- accessibility-audit  — UI a11y scan + remediation report
- api-breaking-change  — survey callers, migrate, deprecate, schedule removal
- changelog-gen        — release notes from git log since last tag
- ci-bootstrap         — minimal-working CI pipeline
- dead-code            — find unused functions/files (report only, no delete)
- issue-triage         — classify a GitHub issue + label/priority recommendation
- observability-setup  — structured logs, metrics, tracing
- onboarding-check     — walk README as new contributor, report gaps
- performance-audit    — measure → fix → measure
- pr-review            — structured code review of a PR
- pr-triage            — bucket open PRs (merge/close/nudge)
- release              — version bump → changelog → tag → publish (gated)

YAML-step iterators (4):
- docs-sync            — backfill JSDoc/TSDoc on undocumented exports
- env-audit            — inventory env vars + flag drift
- rename-symbol        — global rename across code/tests/docs
- test-backfill        — write unit tests for untested functions

All gsd-specific refs adapted: /gsd → /sf, .gsd/ → .sf/, gsd-build/gsd-2
→ singularity-forge/sf-run.

Templates need no SF-runtime tools (sf_*, subagent, browser_*) — they
run via the bash + git + gh/npm commands the agent already has.
Discovery verified: discoverPlugins() picks up all 27 templates
(11 existing + 16 new); registry.json is 1:1 with the .md/.yaml files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 19:01:51 +02:00
Mikael Hugo
955ee66614 fix(sf): replace 'Lex' personal name with generic 'project owner' in milestone-validation template
templates/milestone-validation.md:60 was instructing the validating agent
to add 'enough context for Lex to make a decision'. Lex is the
developer's personal nickname; bundled templates ship to every SF user
and other users would write validation reports referencing a stranger.

Now reads 'enough context for the project owner to make a decision' —
generic and accurate for any project.

Tree-wide grep for Lex/Mikael/Mikki across bundled resources now
returns zero personal-name references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 18:49:20 +02:00