Before this change, `sf headless autonomous` only dispatched units for
the active milestone and never touched .sf/self-feedback.jsonl. The
existing `sf headless triage --apply` was a manual operator path, so
self-feedback only became actionable work when an operator ran it by
hand. That defeats the "SF self-heals" thesis: 146 entries can sit in
the queue indefinitely while the autonomous loop happily cranks on M005.
Now: at autonomous startup (not on resume, not on initial bootstrap)
SF calls handleTriage({ apply: true, max: 5 }) to drain the top-5
candidates from the triage queue before entering the dispatch loop.
Capping at max=5 keeps the upfront cost predictable; remaining items
are processed on the next session_start.
The comment on the existing triage handler in headless.ts:917-921
explicitly acknowledged the gap — autonomous-loop followUp delivery
was broken (sf-mp4rxkwb-l4baga). Wiring the deterministic triage
path BEFORE the dispatch loop closes that gap.
Opt-out: pass --skip-triage on the autonomous command (e.g. when
debugging a specific milestone without backlog churn).
Triage failures are non-fatal — they log a warning and the
autonomous loop continues with its existing milestone dispatch.
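The wiring can be sketched as follows — startAutonomous/runDispatchLoop are illustrative names; only handleTriage and its { apply: true, max: 5 } call come from the change itself, and the real code paths are async (synchronous here for brevity):

```javascript
function startAutonomous(opts, deps) {
  if (!opts.skipTriage) {
    try {
      // Drain up to 5 self-feedback candidates before the dispatch loop.
      deps.handleTriage({ apply: true, max: 5 });
    } catch (err) {
      // Non-fatal: warn and fall through to milestone dispatch.
      console.warn(`triage failed, continuing: ${err.message}`);
    }
  }
  return deps.runDispatchLoop(opts);
}
```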
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bias dispatch toward under-used subscriptions ("spend the subs") and
de-prioritize near-exhausted ones (avoid 429 walls). The multiplier is
applied to the benchmark score before sorting, so it only re-orders
within the existing score → cost → coverage → preference ladder.
Unknown quota state stays neutral 1.0 — never punish a provider for
having no public quota API.
Curve, keyed on max(usedFraction) across all windows:
< 0.20 → 1.15 (boost — lots of headroom, prefer to use it)
< 0.50 → 1.00 (neutral)
< 0.70 → 0.92 (slight steer away)
< 0.90 → 0.50 (strong de-prioritize)
< 0.95 → 0.20 (near-exhaustion)
≥ 0.95 → 0.05 (effectively skip)
Max-across-windows means kimi-coding's 5h-rolling window (tighter)
binds the decision even when the weekly is fresh.
New exported helper quotaHeadroomMultiplier(providerKey, getQuotaState?)
takes the resolver as optional dep for testability; defaults to
getProviderQuotaState from provider-quota-cache.js.
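The curve alone (without the providerKey/resolver plumbing of the real helper) reduces to a pure function of the max used fraction — a sketch:

```javascript
// Thresholds mirror the curve table above. usedFraction is
// max(usedFraction) across all quota windows; undefined means the
// provider has no public quota API and stays neutral.
function quotaHeadroomMultiplier(usedFraction) {
  if (usedFraction === undefined || usedFraction === null) return 1.0;
  if (usedFraction < 0.20) return 1.15; // boost: lots of headroom
  if (usedFraction < 0.50) return 1.00; // neutral
  if (usedFraction < 0.70) return 0.92; // slight steer away
  if (usedFraction < 0.90) return 0.50; // strong de-prioritize
  if (usedFraction < 0.95) return 0.20; // near-exhaustion
  return 0.05;                          // effectively skip
}
```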
16 new tests cover the curve and the selectByBenchmarks integration
(unknown quota → unchanged, demoted high-usage provider, boosted
under-used provider, near-exhausted skipped when alternatives exist).
Previously filed as SF backlog item sf-mpmp8ie6xf-z4cxhg; this commit
closes that loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-referenced the vbgate/opencode-mystatus reference implementation
and found two real bugs in the zai fetcher, plus one hygiene fix:
1. Auth header: zai's monitor endpoint expects `Authorization: <key>`
with NO `Bearer ` prefix. Using Bearer caused the server to treat
the call as unauthenticated and return the generic "no coding
plan" response even for active coding-plan users.
2. Response shape: real envelope is
{ code, msg, success, data: { limits: [
{ type: "TOKENS_LIMIT"|"TIME_LIMIT", usage, currentValue,
percentage, nextResetTime? } ] } }
The parser previously looked for `data: [...]` directly and used
`limit`/`used` fields; it now parses `data.data.limits[].usage` /
`.currentValue`.
3. Added User-Agent header to match the reference tool.
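A sketch of the corrected fetcher pieces — field names follow the envelope quoted above, while the User-Agent value and error text are illustrative:

```javascript
// Bug 1: zai's monitor endpoint wants the raw key, NOT "Bearer <key>".
// The User-Agent value here is illustrative.
const zaiHeaders = (apiKey) => ({
  Authorization: apiKey,
  "User-Agent": "sf-quota-probe/1.0",
});

// Bug 2: parse the { code, msg, success, data: { limits: [...] } }
// envelope, surfacing the vendor msg on failure.
function parseZaiUsage(envelope) {
  if (!envelope.success) {
    throw new Error(envelope.msg || `zai error code ${envelope.code}`);
  }
  const limits = (envelope.data && envelope.data.limits) || [];
  return limits.map((l) => ({
    kind: l.type, // "TOKENS_LIMIT" | "TIME_LIMIT"
    used: l.usage,
    total: l.currentValue,
    usedFraction: (l.percentage ?? 0) / 100,
    resetsAt: l.nextResetTime, // optional
  }));
}
```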
Live probe finding: this user's z.ai key works fine for inference
(/api/coding/paas/v4/models returns 200 with the full model list)
but the monitor endpoint reports "no coding plan" — meaning their
account uses the regular pay-as-you-go z.ai/zhipu tier, not the
separately-billed "Coding Plan" subscription that the monitor
endpoint serves. The 429s they observe during inference are
rate-limit RPM/TPM errors, not coding-plan window exhaustion.
The code change is correct; the error message is now accurate and
actionable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dogfooded `sf headless usage` against live APIs and discovered three
shape mismatches in the phase-1 fetchers:
- kimi-coding returns numeric fields as STRINGS ("limit": "100") and
uses camelCase `resetTime`. Added toNum() coercion + reset hint
extraction. Now reports Weekly + 5h rolling windows correctly.
- minimax response is `{ model_remains: [{ model_name,
current_interval_total_count, current_interval_usage_count,
current_weekly_total_count, current_weekly_usage_count, end_time,
weekly_end_time, ...}] }` — per-model rolling + weekly windows, not
the flat `remaining_tokens`/`total_tokens` shape I had assumed.
Rewrote parser to emit one window per model entry.
- zai uses a `{ code, msg, success, data }` envelope. When
`success: false` (e.g. user lacks an active coding plan), parser
now surfaces vendor msg as the entry error instead of silently
emitting no windows.
Tests updated to mirror real shapes; added one for zai's failure
envelope. 12 tests pass (was 11).
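The kimi-coding coercion can be sketched as follows (toNum is the helper named above; the exact edge-case handling is assumed):

```javascript
// Coerce kimi-coding's stringified numerics ("limit": "100") to
// numbers; absent or unparseable values stay undefined so the
// window simply omits the field.
function toNum(v) {
  if (v === undefined || v === null || v === "") return undefined;
  const n = Number(v);
  return Number.isFinite(n) ? n : undefined;
}
```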
Live result from re-running `sf headless usage`:
- openrouter: 80.7% used, $7.71 remaining (real signal — watch this)
- kimi-coding: Weekly 32%, 5h 4%
- minimax: MiniMax-M* 5h 1.4% + coding-plan-vlm/search 1.4%
- gemini-cli: 0.0-0.4% across all models (clean)
- zai: surfaces "user does not have a coding plan" — may need a
different endpoint or scope depending on the user's account setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase-1 work ships together here because prior auto-snapshots split it across
several commits. This commit captures the leftover type declarations,
the new provider-quota-cache test suite, and the last register-hooks /
cli wiring.
Highlights now in tree:
- Model catalog moved from per-project to global `~/.sf/model-catalog/`
via `sfHome()` (one cache shared by all repos; no more 9-dir
duplication).
- `benchmark-coverage.js` audits the dispatchable model set against
`learning/data/model-benchmarks.json` at session_start, writes
`~/.sf/benchmark-coverage.json`, notifies on change.
- `provider-quota-cache.js` introduces phase-1 subscription quota
visibility for the 5 providers with documented APIs:
kimi-coding (/coding/v1/usages), openrouter (/api/v1/credits),
minimax (/v1/token_plan/remains), zai (/api/monitor/usage/quota/limit),
google-gemini-cli (existing snapshotGeminiCliAccount). 15-min TTL,
global cache.
- `sf --maintain` CLI flag refreshes catalogs + quotas + coverage audit
in one idempotent pass. Daemon spawns it every 6h.
- `sf headless usage` rewritten to display all providers from the
unified cache, with explicit "no public API" notes for mistral,
ollama-cloud, opencode, opencode-go, xiaomi.
- Awaitable `runXIfStale` variants for model-catalog, gemini-catalog,
openai-codex-catalog (the schedule* variants now wrap them in
setImmediate).
- TypeScript declarations added for the new JS modules so the
dist-redirect pipeline type-checks cleanly.
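The 15-min TTL pattern, sketched in-memory (the real provider-quota-cache.js caches globally on disk, and its fetchers are async — synchronous here for brevity):

```javascript
const TTL_MS = 15 * 60 * 1000; // 15-minute TTL as described above

// fetchQuota and the now() clock are injectable for testability.
function makeQuotaCache(fetchQuota, now = Date.now) {
  const entries = new Map(); // providerKey -> { at, value }
  return function get(providerKey) {
    const hit = entries.get(providerKey);
    if (hit && now() - hit.at < TTL_MS) return hit.value; // fresh: no refetch
    const value = fetchQuota(providerKey);
    entries.set(providerKey, { at: now(), value });
    return value;
  };
}
```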
Phase 2 (quota-aware routing in benchmark-selector) is filed as SF
self-feedback for the backlog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add resolveRpcInitTimeoutMs() helper and wire it into RpcClient.init().
Default init timeout increased from 30s to 120s. Override via env var.
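A sketch of the resolver; the env-var name below is hypothetical, since the commit does not quote the real one:

```javascript
const DEFAULT_RPC_INIT_TIMEOUT_MS = 120_000; // raised from 30s to 120s

// SF_RPC_INIT_TIMEOUT_MS is a hypothetical variable name for the
// override described above; env is injectable for testing.
function resolveRpcInitTimeoutMs(env = process.env) {
  const n = Number(env.SF_RPC_INIT_TIMEOUT_MS);
  return Number.isFinite(n) && n > 0 ? n : DEFAULT_RPC_INIT_TIMEOUT_MS;
}
```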
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Require SF_HEADLESS_ALLOW_V1_FALLBACK=1 to use legacy v1 fallback.
Default behavior now exits with error when v2 init fails, preventing
silent degradation to less reliable protocol matching.
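A sketch of the gate (function names and error text are illustrative; only the env flag comes from the change):

```javascript
function shouldFallBackToV1(env = process.env) {
  return env.SF_HEADLESS_ALLOW_V1_FALLBACK === "1";
}

// Called when v2 init fails; exit is injectable for testing.
function onV2InitFailure(env = process.env, exit = process.exit) {
  if (shouldFallBackToV1(env)) return "v1"; // explicit opt-in only
  console.error(
    "v2 init failed; set SF_HEADLESS_ALLOW_V1_FALLBACK=1 to allow legacy fallback"
  );
  exit(1);
}
```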
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
M004 S01: Update manifests to support knowledge and graph artifacts.
Adds computed: ["knowledge", "graph"] to manifests that did not yet
declare them, matching the actual behavior of their prompt builders:
- execute-task, reactive-execute
- discuss-project, discuss-requirements, research-project
- workflow-preferences (knowledge only — no graph scope)
These unit types already inline knowledge/graph via their builder
functions in auto-prompts.js; the manifest declarations were missing.
This brings the manifest schema into sync with real dispatch behavior.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After cherry-picking P2 (v72: slices.traces_vision_fragment) and P3
(v70: tasks.purpose_trace) onto main, the schema migration ladder
now adds those columns automatically on every openDatabase. The P4
test fixtures, which were authored when those migrations were still
in their own worktree branches, manually ALTER'd the columns —
which now throws "duplicate column name" post-merge.
Two changes, both purely about exercising the same gate paths under
the new ground truth:
- makeForwardDb no longer manually ALTERs — the migration ladder
already provides the columns. The "trace value NULL" branch is
exercised by inserting rows with explicit NULL instead of relying
on the column being absent.
- The "legacy DB" test no longer expects the warning to mention the
column name (the column always exists post-migration). The
underlying SqliteError catch in evaluatePurposeCoherence remains
for the genuinely-legacy DB case where someone is running against
a fixture that predates the migration; the test now exercises the
NULL-value warn path which is the real-world signal operators see.
All 17 uok-purpose-coherence tests pass; full 5-pillar sweep
(P1+P2+P3+P4+P5 + migration) 53/53 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After cherry-picking the swarm commits, the migration file had v72
declared before v70/v71 — when applied to a v69 DB, the loop ran v72
first, set appliedVersion=72, and the v70/v71 guards `if
(appliedVersion < 70)` then `< 71` short-circuited, so neither
ALTER ran on legacy DBs. Reordered so the file flows v70 → v71 → v72,
matching version numbers; idempotent column probes on fresh DBs
still pass.
Verified: full sf-db-migration suite 13/13 green, including the
v52-and-v27 legacy-fixture paths that exercise the migration ladder
end-to-end.
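A toy reproduction of the ordering bug with illustrative names — guards keyed on appliedVersion short-circuit when a later migration is declared first:

```javascript
// Mimics the ladder: each migration runs only when the DB is older
// than its version, then bumps appliedVersion.
function runMigrations(db, migrations) {
  for (const m of migrations) {
    if (db.appliedVersion < m.version) {
      m.up(db);
      db.appliedVersion = m.version;
    }
  }
}
```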
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SF is a purpose-to-software compiler — every self_feedback row must name
the milestone vision or slice goal it's filed against, so triage can
prioritize against purpose rather than treating each row as floating.
- Schema v71 ALTERs self_feedback ADD COLUMN purpose_anchor TEXT.
NULL allowed for legacy rows; fresh-DB CREATE includes the column.
- sf-db-self-feedback.js: insertSelfFeedbackEntry accepts purposeAnchor
(camelCase), stored as :purpose_anchor; listSelfFeedbackEntries({purpose})
pushes a LIKE %fragment% filter into the DB layer so triage doesn't
have to pull the full table.
- rowToSelfFeedback exposes purposeAnchor, falling back to the JSON
projection for legacy rows where the column is NULL.
- headless-feedback CLI: `feedback add --purpose <fragment>` persists
the anchor; `feedback list --purpose <fragment>` filters by it.
Omission stays valid — restoration is additive, not breaking.
- help-text + migration test updated; new vitest covers add/list
round-trip, NULL-on-omit legacy compat, substring match, and the
help-text documentation contract.
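The filter push-down can be sketched as follows (buildListQuery is an illustrative name; the table/column names follow the commit):

```javascript
// Build the list query, pushing the LIKE %fragment% substring match
// into the SQL so triage never pulls the full table into JS.
function buildListQuery({ purpose } = {}) {
  const params = {};
  let sql = "SELECT * FROM self_feedback";
  if (purpose) {
    sql += " WHERE purpose_anchor LIKE :purpose";
    params.purpose = `%${purpose}%`;
  }
  return { sql, params };
}
```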
Restores the doctrine in docs/adr/0000-purpose-to-software-compiler.md:
"non-trivial artifacts must name their purpose and consumer."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore the purpose-to-software doctrine at the slice gate: every task
the executor closes must name the slice-goal sentence or clause it
served. complete-slice now refuses to flip a slice to complete while
any of its tasks has a NULL purpose_trace, making "did all tasks
actually serve the slice goal" a mechanical check instead of a vibe.
Schema migration v70 adds a nullable purpose_trace TEXT to tasks
(legacy rows stay valid). complete_task refuses without it and quotes
slice.goal in the error so the agent can anchor. insertTask /
updateTaskStatus accept the new field, rowToTask exposes it, and a
new updateTaskPurposeTrace helper covers later corrections.
Restoration of doctrine — see docs/adr/0000-purpose-to-software-compiler.md.
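The gate can be sketched as follows (assertPurposeTrace is an illustrative name, and so is the error text; the goal-quoting behavior is from the commit):

```javascript
// Refuse task completion without a purpose_trace, quoting slice.goal
// in the error so the agent can anchor its correction.
function assertPurposeTrace(task, slice) {
  if (!task.purpose_trace) {
    throw new Error(
      `complete_task refused: task ${task.id} has no purpose_trace; ` +
      `name the clause of the slice goal it served: "${slice.goal}"`
    );
  }
}
```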
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restoration of doctrine: plan-milestone now emits a literal milestone.vision
clause per slice (traces_vision_fragment) so validate-milestone has structured
grounds for assessment instead of re-reading the vision through the LLM every
time. Schema v69 adds the column (NULL allowed for legacy rows); the prompt and
plan_milestone tool start requiring it for new slices, rejecting fragments that
do not appear verbatim in milestone.vision. See docs/adr/0000-purpose-to-software-compiler.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restores the eight-PDD purpose gate at the autonomous-loop boundary
required by ADR-0000 (SF is a purpose-to-software compiler). The gate
walks milestone vision -> slice.traces_vision_fragment ->
task.purpose_trace before every dispatch and refuses to proceed when
the purpose chain is broken at the vision root (degraded-vision).
- New uok/purpose-coherence.js with a pure verdict function and a
DB-backed adapter. Reads vision/trace columns directly via SQL so
pre-P2/P3 schema migrations are tolerated.
- Wired into auto/phases-pre-dispatch.js alongside resource-version-
guard, pre-dispatch-health-gate, and planning-flow-gate. Fires on
every pre-dispatch turn and emits to the existing trace JSONL.
- Outcome ladder: fail (vision missing -> pause loop), warn (trace
columns missing or NULL -> surface but allow dispatch so legacy DBs
don't hard-break on day one), pass (full chain).
- Tests in tests/uok-purpose-coherence.test.mjs cover the four
contracted states plus the column-missing downgrade path on a
pre-migration schema.
Refs: ADR-0000.
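The outcome ladder as a pure verdict function — a sketch with an illustrative input shape:

```javascript
// vision comes from the milestone row; sliceTrace/taskTrace are the
// traces_vision_fragment and purpose_trace columns (null when the
// column is missing or the row pre-dates the migration).
function purposeCoherenceVerdict({ vision, sliceTrace, taskTrace }) {
  if (!vision) return { outcome: "fail", reason: "degraded-vision" }; // pause the loop
  if (sliceTrace == null || taskTrace == null) {
    return { outcome: "warn", reason: "trace-missing" }; // legacy DBs still dispatch
  }
  return { outcome: "pass" };
}
```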
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two new doctor checks to checkEngineHealth():
- db_milestone_missing_vision: error when a milestone has no vision
(the WHY/purpose field per ADR-0000)
- db_slice_missing_goal: error when a slice has no goal
(the WHAT/purpose field per ADR-0000)
Both checks are non-fixable (the operator must define purpose).
This aligns with ADR-0000 §Enforcement: "Non-trivial milestones,
slices, tasks, ADRs, specs, tests, and exported symbols must name
their purpose and consumer."
Tests: 2 cases — milestone without vision flagged, slice without
goal flagged.
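The two checks can be sketched as follows (the finding shape is illustrative; check names are from the commit):

```javascript
// Flag milestones without a vision (WHY) and slices without a goal
// (WHAT); both are non-fixable — the operator must define purpose.
function checkPurposeFields({ milestones, slices }) {
  const findings = [];
  for (const m of milestones) {
    if (!m.vision) findings.push({
      check: "db_milestone_missing_vision", id: m.id, level: "error", fixable: false,
    });
  }
  for (const s of slices) {
    if (!s.goal) findings.push({
      check: "db_slice_missing_goal", id: s.id, level: "error", fixable: false,
    });
  }
  return findings;
}
```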
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restoration of forgotten doctrine: ADR-0000 declares the eight PDD
fields (Purpose, Consumer, Contract, Failure boundary, Evidence,
Non-goals, Invariants, Assumptions) as the purpose gate, but
`sf headless new-milestone --context <file>` was accepting any
context including empty or trivially-thin seed docs. This wires a
pre-create check that refuses the run when fields are missing or
too thin, naming exactly which ones so the operator can fix the
seed doc and retry.
- new src/resources/extensions/sf/headless-pdd-check.js: scans
context for the eight fields (heading and inline-label forms) and
reports missing/sparse, plus a minimum-spine check (Purpose +
Consumer + Contract + Evidence-or-Falsifier).
- src/headless.ts calls the check after loadContext, before
bootstrapping .sf/. Refusal exits 1 with formatPddRefusal text.
- --skip-pdd-check is the migration escape hatch (warning printed,
PDD gate bypassed) for milestones that pre-date the gate.
- SF-internal auto-bootstrap (autonomous→new-milestone fallback)
is exempted because the seed is SF-generated, not operator-PDD.
- vitest test covers missing-Purpose, missing-Consumer, all-8,
sparse, inline-label form, Falsifier-as-Evidence spine, and the
doctrine field order.
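The field scan can be sketched as follows (heading and inline-label detection shown; the real sparseness threshold and minimum-spine check are omitted):

```javascript
const PDD_FIELDS = [
  "Purpose", "Consumer", "Contract", "Failure boundary",
  "Evidence", "Non-goals", "Invariants", "Assumptions",
];

// Report which of the eight fields the seed doc never mentions,
// accepting "## Purpose" headings or "Purpose:" inline labels.
function scanPddFields(context) {
  const missing = [];
  for (const field of PDD_FIELDS) {
    const re = new RegExp(`^(#+\\s*${field}\\b|${field}\\s*:)`, "im");
    if (!re.test(context)) missing.push(field);
  }
  return missing;
}
```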
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: dr-repo M003 had all 8 owning requirements (UNI-01..05,
PIL-01..03) marked Status: complete in .sf/REQUIREMENTS.md, but
the milestone row was still active because its only slice was a
post-migration skipped placeholder. After the previous fix routed
all-skipped milestones to pre-planning, SF ran roadmap-meeting +
plan-milestone and wrote 3 new slices on a milestone whose
contract-level work was already done — burned ~4 LLM turns on
plausibly-adjacent but unwanted re-decomposition.
Root cause: deriveStateFromDb's milestone-completion gate consults
only slice statuses (and indirectly the milestone row's own status
field). It never reads REQUIREMENTS.md to check whether the
contract is already satisfied. The slice-based view collapsed the
real signal.
Fix:
- New parseRequirementsByMilestone(content) helper in files.js:
parses REQUIREMENTS.md, groups entries by their `Primary owning
milestone` field, returns Map<id, {complete, incomplete}>.
- handleAllSlicesDone now reads REQUIREMENTS.md before its
slice-based real-work check. If a milestone has at least one
owning requirement and zero of them are incomplete, route to
completing-milestone with nextAction naming the requirement count
(so the operator can see *why* the milestone is being closed
without manually opening REQUIREMENTS.md).
- Best-effort: REQUIREMENTS.md parse failure falls through to the
existing slice-based rule. Missing file likewise — no regression
for projects that don't keep a requirements file.
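A sketch of the parser under an assumed REQUIREMENTS.md layout (one `Status:` and one `Primary owning milestone:` line per heading-delimited entry; the real file format may differ):

```javascript
// Group requirement entries by their owning milestone and tally
// complete vs incomplete, returning Map<id, {complete, incomplete}>.
function parseRequirementsByMilestone(content) {
  const byMilestone = new Map();
  for (const block of content.split(/^#+ /m).slice(1)) {
    const milestone = /Primary owning milestone:\s*(\S+)/.exec(block)?.[1];
    if (!milestone) continue;
    const status = /Status:\s*(\S+)/.exec(block)?.[1];
    const bucket = byMilestone.get(milestone) ?? { complete: 0, incomplete: 0 };
    if (status === "complete") bucket.complete++;
    else bucket.incomplete++;
    byMilestone.set(milestone, bucket);
  }
  return byMilestone;
}
```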
Resolves sf-mp74hftw-zud6ba filed via the headless feedback CLI.
End-to-end verified by re-running sf headless query on dr-repo
M003: now reports phase=completing-milestone with the right
requirement-count message.
Tests: 5 new cases — all complete + slice skipped → completing,
some active → pre-planning, zero owning requirements falls through,
missing file falls through, all complete + real slice work still
completes. Existing 4 all-skipped-replan cases still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T01: Added integration test auto-halt-self-feedback.test.mjs that proves:
- HaltWatchdog.check() creates a self-feedback DB entry with
kind=runaway-loop:idle-halt, severity=high, blocking=true
- Markdown projection (.sf/SELF-FEEDBACK.md) is regenerated
- Deduplication works (one entry per idle period)
- New heartbeat resets and creates a new entry for the next idle period
T02: Enhanced evidence string to include elapsedMs, iteration, and
thresholdMs explicitly (R003 actionable context requirement).
Tests: 36/36 pass across auto-halt-self-feedback,
auto-halt-watchdog-notify, and self-feedback-db suites.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>