singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	79db5704bc	fix(self-feedback): require structured evidence kind for trusted resolution Dogfood of the triage worker revealed that the agent can bypass the resolve_issue tool (which hardcodes kind=agent-fix) and write DB rows directly with non-canonical evidence shapes (null, or {file, line}). The earlier credibility check trusted any resolution that had a prose resolvedReason — a "legacy narrative" carve-out meant to preserve operator clears predating structured evidence. Brand-new sloppy agent resolutions slipped through that carve-out: 5/5 of today's triage resolutions had non-canonical evidence and would have been treated as authoritative under the old check. Replace the denylist/legacy-carve-out with an allowlist: - isSuspectlyResolved returns true unless resolvedEvidence.kind is in {agent-fix, human-clear, promoted-to-requirement}. - SUSPECT_RESOLUTION_KINDS is kept as documentation of the auto-version-bump case but the allowlist makes it redundant for the actual policy decision. Tests now cover both failure modes: prose-only resolution (no kind) and non-canonical evidence shape ({file, line}) both re-include the entry as a candidate. Legacy entries that genuinely lack an evidence kind are backfilled to kind=human-clear separately so they keep their resolution under the stricter check. A self-feedback entry (sf-mp4qoby4-meiir7, severity=high) was filed about the underlying bypass — markResolved should ALSO reject or auto-tag non-canonical writes at the writer layer, since the reader is currently the only gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 02:17:47 +02:00
Mikael Hugo	6e95c3542c	fix(bootstrap): always dispatch self-feedback triage on session_start The session_start hook only invoked dispatchSelfFeedbackInlineFixIfNeeded when triage.stillBlocked contained at least one high/critical entry. After the previous commit rewired the worker as a triage queue that returns every open forge-local entry (not just high/critical), this gate stranded medium/low backlog forever at startup — the unit was never given a chance to triage them. The dispatcher's own selectInlineFixCandidates is now the source of truth for eligibility; the call site should call unconditionally. Keep the high/critical-specific notify (still useful operator signal when the loud ones are present) but stop using it to gate the dispatch. The turn_end hook at the bottom of register-hooks.js already calls the dispatcher unconditionally, so this change aligns the two paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 22:59:13 +02:00
Mikael Hugo	ce58d32231	fix(self-feedback,state): close two state-drift gaps 1. Self-feedback JSONL is now a real append-only audit log. Previously markResolved updated the DB row in place but never echoed the resolution to JSONL, so a DB rebuild via importLegacyJsonlToDb would re-import all entries with their original pre-resolution state and silently lose every resolution that had ever landed. The JSONL was a half event log — creations yes, resolutions no. - Introduce a `recordType: "resolution"` JSONL record shape. Append one of these to the project JSONL whenever markResolved succeeds against the DB. Best-effort: failure to append never blocks the resolution itself. - Extend importLegacyJsonlToDb to handle both record types. Entry creations go through insertSelfFeedbackEntry (ON CONFLICT DO NOTHING — idempotent). Resolution events go through resolveSelfFeedbackEntry, which is already a no-op on missing or already-resolved rows, so replay is idempotent. - Tests cover: the appended record shape; a DB rebuild correctly reconstructing resolved_at/resolved_evidence_json from a JSONL audit trail; orphan resolution events (entry never existed) are a silent no-op. Closes self-feedback entry sf-mp4ikbta-2zcbhh. 2. The reconcile path at state-db.js:reconcileSliceTasks warns when an on-disk SUMMARY.md exists for a task whose DB row is still pending and refuses to silently import — a safety check so autonomous runs can't promote themselves to complete by writing a SUMMARY without a real DB transition. But operators had no remediation path when the drift was real (lost DB write, hand edit). They had to mutate the DB by hand. - New `state-reconcile.js` with `reconcileTaskFromSummary` exposes the remediation explicitly. Parses the SUMMARY via the existing parseSummary helper, validates via isValidTaskSummary, and writes status / completed_at / verification_result / blocker / key_files / full_summary_md into the DB row through a new `setTaskSummaryFields` helper in sf-db-tasks. 
- Returns structured { ok, reason, applied } outcomes — never throws — so operator tooling can branch on `db-unavailable`, `summary-missing`, `summary-invalid`, `task-not-in-db`, `already-done`. - The reconcile warning text now points at the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 22:55:30 +02:00
Mikael Hugo	5f245b721d	fix(self-feedback): rewire inline-fix worker as triage queue The inline-fix worker was a partial repair queue — it picked only high/critical+blocking entries plus my recent gap/architecture-defect override and left everything else (medium inconsistencies, janitor gaps, architectural-risks, low-severity gaps) sitting open forever. The requirement-promoter clusters by exact `kind` string and never fires on diverse forge-local entries (every open entry currently has a unique kind), so there is no other sweep that ever touches these. They just accumulate. The point of the worker is triage, not just repair: every open entry should get an eyes-on per session and reach one of three outcomes — fix, promote to requirement, or close as not-of-value with reason. Closing deliberately is a valid, expected outcome. Changes: - `selectInlineFixCandidates` now returns every open forge-local entry, modulo the existing credibility check that re-includes suspect resolutions. Severity and blocking filters are gone; the kind-based override is no longer needed because everything qualifies. - The dispatch prompt is rewritten as a three-way triage protocol (Fix / Promote / Close) with explicit guidance per outcome and explicit prohibition on the `auto-version-bump` evidence kind (which would re-open under the credibility check). - Tests collapse the three filter-coverage tests into a single "selects every open forge-local entry" assertion that exercises the full severity × blocking × kind matrix. Upstream feedback is still excluded — those entries describe behavior in other repos that the inline-fix unit cannot directly repair. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 22:46:24 +02:00
Mikael Hugo	085beb5199	docs(sf-ace): restore parked location + keep ADR cross-references SF's S05/T02 executor moved the doc back to docs/dev/sf-ace-patterns.md while completing the slice (correctly: that was the task's stated deliverable location). The doc is parked under docs/dev/drafts/ because ACE Coder has no active consumer for it; re-park it. Keep the ADR-019 / ADR-020 cross-references the executor added — they are real content improvements over the previous version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 22:24:12 +02:00
Mikael Hugo	89b52b6011	fix(self-feedback): widen inline-fix candidate selection + drop upstream The inline-fix dispatcher had three blind spots that left forge-local architectural debt rotting in the ledger: 1. Filter required `severity ∈ {high, critical} AND blocking`. Medium `gap:` and `architecture-defect:` entries — describing the exact class of debt the inline-fix unit was built to repair — were dropped on the floor. The forge-local queue currently has 0 high+blocking open entries and 3 architectural gaps, so the old filter would dispatch on nothing local and fall back to upstream. 2. Resolutions were trusted unconditionally. `auto-version-bump` fires on any sf-version bump without verifying the bump contained a fix, silently burying defects. 3. Upstream feedback was merged into the candidate set. Upstream entries describe behavior observed in OTHER repos (e.g. `flow-audit:repeated-milestone-failure` from /srv/infra/apps/centralcloud_ops) — the inline-fix unit edits forge source and cannot repair issues in those other repos. Including them dispatches work the unit cannot perform. Changes to `selectInlineFixCandidates`: - Add kind-based override: entries with `kind` starting with `gap:` or `architecture-defect:` qualify regardless of severity/blocking. - Add resolution credibility check: re-include entries resolved with evidence kind `auto-version-bump`, or with no evidence kind AND no `resolvedReason` narrative at all. Legacy resolutions with a meaningful operator narrative (the historical format) are still trusted. - Drop `readUpstreamSelfFeedback()` from the candidate merge. Upstream stays readable for SELF-FEEDBACK.md rollups and operator review, just not auto-dispatched to inline-fix. 
Also relax the schedule-e2e readEntries timing assertion from a 100ms threshold to 500ms — the test is a catastrophic-regression guard, not a microbenchmark, and parallel-suite jitter on dev machines routinely adds >100ms even when the underlying read is fast (≤ a few hundred ms). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 22:23:57 +02:00
Mikael Hugo	5a2618c05d	fix(auto): re-dispatch on executor refusal instead of pausing The autonomous solver was designed precisely to handle executor refusals (per its own docstring: "the solver role MUST stay on a stable, agentic, refusal-resistant model independent of any per-unit routing choices"), but the refusal handler short-circuited past it and emitted a `blocked` checkpoint, which assessAutonomousSolverTurn unconditionally turns into a `pause` — defeating autonomous mode every time the router selects a capability-mismatched executor. The 1h model-block added in `3f2babb5d` was the right primitive but had no consumer: nothing actually re-dispatched the unit after the model was blocked, so the block only mattered if the operator manually unpaused and retried. This change wires the missing consumer: - Add per-unit `executorRefusalEscalations` counter to solver state plus a `recordExecutorRefusalEscalation` helper. Counter persists across iterations of the same unit and resets on unit change. - On `executor-refused`: block the refusing model and slice-routing entry (unchanged), file self-feedback (unchanged), then synthesize a `continue` checkpoint and return `{ action: "continue" }` directly so the auto loop re-dispatches the unit. selectAndApplyModel will skip the now-blocked model and pick a higher-tier fallback. - Bounded by `MAX_EXECUTOR_REFUSAL_ESCALATIONS=3`. When the budget is exhausted (an entire fallback chain refused on the same unit), fall back to the legacy blocked-and-pause path so the operator can review. - Bypass `assessAutonomousSolverTurn` on the refusal-continue path because its no-op detector would (correctly) reject a continue over a refusal transcript — but here the "no-op" is the whole point: we are explicitly swapping the routed model. Tests cover the new state field's init/persistence/reset semantics and the constant's invariants. Full SF extension suite (1369 tests) passes. 
Refs: sf-mp3bm6u0-2fskt8 (now fully addressed, not just AC1) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 21:49:51 +02:00
Mikael Hugo	288a2a5fd7	docs(sf-ace): park SF→ACE pattern reference under docs/dev/drafts/ Promotes the .draft stub into a fuller 183-line reference covering six SF patterns (Preferences, PDD, UOK Gates, Notifications, Skills-as-Contracts, Idempotency) with SF source paths and ACE adoption notes. Filed under docs/dev/drafts/ with a STATUS: Draft header — no active consumer yet. SF's own priorities take precedence until ACE Coder maintainers pull on convergence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 21:30:34 +02:00
Mikael Hugo	32cfb6224b	test: migrate node:test imports to vitest and stabilize timing thresholds - Three .test.mjs files now import describe/it from vitest, matching the harness CLAUDE.md mandates for the SF extension suite. - schedule-e2e local readEntries threshold raised 50ms → 100ms with a comment noting full-suite parallelism adds scheduler/filesystem jitter on dev machines (CI threshold unchanged at 200ms). - e2e-smoke "headless new-milestone without --context" timeout raised 10s → 30s so the exit-1 assertion isn't flaky under load. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 21:30:21 +02:00
Mikael Hugo	3f2babb5d1	fix(auto): block refusing executor model temporarily to force escalation on retry When classifyExecutorRefusal detects an executor refusal, the model is now temporarily blocked (1-hour TTL) via the existing blocked-models mechanism. This ensures that on retry — whether automatic or manual — the router skips the refusing model and the tier-escalation path in selectAndApplyModel picks a higher-tier alternative. This satisfies AC1 of self-feedback entry sf-mp3bm6u0-2fskt8. AC2 (refusal pattern detection) was already satisfied by the existing apology-no-tools pattern in classifyExecutorRefusal. Refs: sf-mp3bm6u0-2fskt8	2026-05-13 02:40:41 +02:00
Mikael Hugo	2cad6d54f4	fix(doctor): enrich flow-audit repeated-failure rollup with full diagnostic context The flow-audit repeated-milestone-failure rollup now includes: - Active milestone/unit and session pointer (AC1) - Stale dispatched units (AC2) - Runaway history (AC3) - Over-budget child processes (AC3) This satisfies the acceptance criteria of self-feedback entry sf-mp3ati7u-qqxcyi so operators can use the rollup evidence to repair stale dispatch, missing summary, runaway, or child-process handling without needing to re-run the flow audit manually. Refs: sf-mp3ati7u-qqxcyi	2026-05-13 02:25:29 +02:00
Mikael Hugo	65e195a9fd	feat: Created draft mapping of SF patterns to ACE reference draft SF-Task: S05/T01	2026-05-13 02:01:41 +02:00
Mikael Hugo	1ed505669b	fix(sf-db,autonomous-solver): resolve schema-drift and checkpoint runaway loop - sf-db-schema.js: per-migration transaction boundaries (runMigrationStep) so a late migration failure does not roll back earlier successful ones. Post-migration assertion recreates routing_history if missing. - routing-history.js: catch missing routing_history table at init and latch _dbTableAvailable=false so auto-start does not crash. - autonomous-solver.js: sticky identity guard in appendAutonomousSolverCheckpoint pins to orchestrator's unitType/unitId instead of trusting agent's claim. Emit journal event on identity mismatch. Record mismatchedIdentity diagnostic. Hard cap MAX_CHECKPOINTS_PER_ITERATION=5 in assessAutonomousSolverTurn. - Tests: add v52 DB smoke test with auto-start path; add sticky identity tests (4 cases); add excessive-checkpoint pause test. Fixes: sf-mp36kfqm-rjrzju, sf-mp37kjmo-1mfuru	2026-05-13 01:47:19 +02:00
Mikael Hugo	a49ea1da87	feat(sf/prompts): Phase 4 — cache_control breakpoints at static/dynamic boundary Split reorderForCaching into a structured reorderAndSplitForCaching that returns {before, after} at the semi-static→dynamic section boundary. - prompt-ordering.js: export reorderAndSplitForCaching — returns null if no dynamic sections, otherwise {before: static+semi-static, after: dynamic} - auto.js: import and wire reorderAndSplitForCaching into deps - phases-unit.js: use split function; pass promptParts to runUnit when split succeeds; fall back to flat reorderForCaching when null - run-unit.js: when promptParts is present, send a two-block content array [{type:text, text:before, cache_control:{type:ephemeral}}, {type:text, text:after}] so Anthropic-compatible providers cache the stable prefix - openai-completions.ts: preserve cache_control on text parts in convertMessages; skip maybeAddOpenRouterAnthropicCacheControl if any part already has cache_control Tests: 5 new contract tests for reorderAndSplitForCaching; all 4502 unit tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 01:36:22 +02:00
Mikael Hugo	3b83d09692	feat(sf/prompts): Phase 3 v2 — migrate milestone+slice builders to composeUnitContext Migrate buildPlanMilestonePrompt, buildValidateMilestonePrompt, buildCompleteMilestonePrompt, buildReplanSlicePrompt, buildResearchSlicePrompt, and renderSlicePrompt (plan-slice + refine-slice) from imperative inlined[] push loops to the v2 composeUnitContext API (manifest-driven, prepend/computed support). Changes: - unit-context-manifest.js: add 7 new ARTIFACT_KEYS (slice-summaries, blocker-summaries, queue, verification-classes, outstanding-items, previous-validation, prior-milestone-summary); update 7 manifests with correct prepend/inline/computed declarations - auto-prompts.js: import composeUnitContext; migrate all 6 builders; remove orphaned old buildValidateMilestonePrompt tail left by partial prior edit - tests: add auto-prompts-phase3.test.mjs with 7 contract tests covering plan-milestone, replan-slice, validate-milestone, and research-slice prompt generation Pre-computation pattern: complex async logic (blocker scan, slice aggregation, verification classes, prior validation) is computed imperatively before composeUnitContext, then returned from resolveArtifact. This preserves parallel execution of other artifacts. buildPlanMilestonePrompt keeps framingBlock imperative: the framing check wraps the composed inlinedContext rather than going inside the composer boundary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 01:02:48 +02:00
Mikael Hugo	ca5d869e34	feat(prompts): fragment infrastructure + RFC #4782 stub manifests Phase 1 — Fragment infrastructure: - Add {{include:fragment-name}} support to prompt-loader.js - fragmentsDir registered alongside promptsDir/templatesDir - warmCache() now reads prompts/fragments/*.md with 'frg:' prefix - Pre-resolution pass in loadPrompt() resolves {{include:}} before the {{var}} validator (colon is outside validator regex [a-zA-Z0-9_], so unresolved includes are caught as parse errors) - Lazy-load fallback for fragments mirrors existing prompt lazy-load - Create prompts/fragments/working-directory.md (Variant A: full contract including 'Do NOT cd to any other directory') - Create prompts/fragments/working-directory-ops.md (Variant B: ops prompts, no cd restriction) - Replace duplicated 3-line Working Directory boilerplate in 17 prompts with {{include:working-directory}} (12 files) or {{include:working-directory-ops}} (5 ops files) - One fix to Working Directory wording now propagates to all 17 prompts Phase 2 — RFC #4782 stub manifests: - Add deploy, smoke-production, release, rollback, challenge to KNOWN_UNIT_TYPES and UNIT_MANIFESTS in unit-context-manifest.js - All 5 builders already called composeInlinedContext() but returned "" because resolveManifest() found no entry; now they return live content - All 26 unit types now have manifests (resolveManifest returns non-null for every type in KNOWN_UNIT_TYPES) Tests: - 5 new tests in prompt-loader-fragments.test.mjs (include resolution, lazy-load fallback, unknown fragment error, nested var inheritance, variant-B fragment) - Full unit suite: 427 files passed, 4476 tests passed, 0 regressions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 00:30:19 +02:00
Mikael Hugo	55229f6604	fix(auto): split autonomous solver from executor per ADR-0079 - Lock solver model to kimi-k2.6 independent of unit-type router - Executor prompt no longer requires checkpoint tool call - Add dedicated solver pass that reads executor transcript and emits canonical checkpoint - Classify executor refusals as blocker outcomes (already partially implemented) - Classify no-op iterations (continue with zero work) as missing-checkpoint-retry - Add tests for executor prompt block, solver pass prompt, no-op detection, and no-op assessment Fixes sf-mp34nxb6-27zdx7	2026-05-12 23:55:02 +02:00
Mikael Hugo	e2f2cb7e2e	feat: Create Command Behavior Verification Matrix across CLI, TUI, and… SF-Task: S04/T01	2026-05-12 23:01:31 +02:00
Mikael Hugo	f789bf0f40	sf snapshot: uncommitted changes after 53m inactivity	2026-05-12 22:51:31 +02:00
Mikael Hugo	9a678f1449	sf snapshot: uncommitted changes after 270m inactivity	2026-05-12 21:58:31 +02:00
Mikael Hugo	93d547c65e	fix(headless): skip Ask→Build mode gate in SF_HEADLESS mode In headless mode the showConfirm dialog blocks forever since there is no TUI to answer it. The user already consented by calling /next or /autonomous explicitly — the gate adds no value and hangs the run. Add process.env.SF_HEADLESS !== '1' to the gate condition so headless runs bypass it and proceed directly to autonomous execution. Verified: `sf headless --command next` now completes slice S03 (719 526 tokens, 10 tool calls, $0.027) without hanging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 17:28:09 +02:00
Mikael Hugo	d22df007a7	fix(headless): correct log message to show actual command format The log message said '/sf ${command}' but the actual command sent is '/${command}' (without the sf namespace). Fix to match actual dispatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 17:04:11 +02:00
Mikael Hugo	16db710468	sf snapshot: uncommitted changes after 49m inactivity	2026-05-12 16:45:04 +02:00
Mikael Hugo	0426aafad2	fix(headless): drop /sf prefix so typed commands route through extension dispatch headless.ts was sending `/sf {subcommand} {args}` to the RPC session, but commands are registered without the sf namespace (e.g. 'todo', 'autonomous'). _tryExecuteExtensionCommand parsed commandName='sf', found no match, and the LLM handled the request instead of the typed backend. Fix: send `/{subcommand} {args}` directly — matches what registerSFCommands registers and what the TUI already uses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 15:55:46 +02:00
Mikael Hugo	2bb9cdbeef	feat(scaffold): ADR-022 scaffold profiles (all phases) Add profile-aware scaffold system so SF does not lay down irrelevant templates in infra/ops/docs repos. ## What ships Phase 1 — data model - scaffold-versioning.js: add 'disabled' to VALID_STATES; readScaffoldManifest returns profile field; recordScaffoldApply preserves manifest.profile (fixes roundtrip bug where profile was stripped on every write). - scaffold-constants.js: PROFILES (app/library/infra/docs/minimal as Set<string>) and PROFILE_NAMES exports. Phase 2 — profile-aware drift detection - scaffold-drift.js: disabled bucket in emptyCounts, resolveActiveProfileSet integration, profile param on detectScaffoldDrift/migrateLegacyScaffold. - doc-checker.js: filter to active profile, skip disabled-state files. Phase 3 — auto-detection on first run - scaffold-profiles.js: detectRepoProfile() heuristics (nix→infra, terraform→infra, react→app, node-no-ui→library, docs-only→docs, else→app). - agentic-docs-scaffold.js: reads profile from manifest, auto-detects on first run, persists to manifest, filters SCAFFOLD_FILES to active profile. Phase 4 — migrate command - commands-scaffold-migrate.js: sf scaffold migrate --profile <name> Re-enables pending files entering the new profile; stamps state=disabled (or prunes with --prune) files leaving it; warns on editing/completed files. - commands/handlers/ops.js, commands/catalog.js: registered and tab-completed. Phase 5 — custom profiles + PREFERENCES.md frontmatter - scaffold-profiles.js: readPreferencesProfile(), loadCustomProfileSet() (~/.sf/profiles/<name>.yaml with extends/add/remove), resolveActiveProfileSet() implementing full ADR-022 §6 precedence. - All callers updated to use resolveActiveProfileSet as the single source of truth. Tests: 28 new tests in adr-022-scaffold-profiles.test.mjs — all passing. Pre-existing node:test stubs (3 files) unaffected. 
ADR: docs/dev/ADR-022-scaffold-profiles.md Misc: triage TODO.md dump into BACKLOG.md (phases-helpers export error T1, /todo triage typed-handler gap T1, structured triage tiers T2, sha-track markdown files T2, cross-repo triage T3). Reset TODO.md to empty template. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 15:28:03 +02:00
Mikael Hugo	ad53b792fb	docs(.agents): add AGENTS.md — directory map and override pattern Documents every folder under .agents/, what it contains, and the override-by-same-name pattern. Explains YOLO as a flag not a mode. is globally ignored but the spec file under .agents/ must be tracked. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 23:48:36 +02:00
Mikael Hugo	4f04fb4c34	chore(.agents): keep lean — remove default mode files, no modes list .agents/ is an override layer. Default modes (ask/build/autonomous) and default skills come from SF's built-in config. Project files only exist when overriding or adding something project-specific. - Remove modes/ask.md, modes/build.md, modes/autonomous.md (defaults) - Remove enabled.modes from manifest (nothing project-defined) - Policies and skills stay: they are project-specific overrides To override a mode or skill, add a file with the same name. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 23:47:29 +02:00
Mikael Hugo	82d629c3ee	feat(.agents): add autonomous mode; clarify yolo is a flag not a mode - Add modes/autonomous.md — third SF mode (ask/build/autonomous). Describes UOK dispatch loop, bash 120s timeout, fresh-context-per-unit, recovery/runaway-guard, and when to use vs Build. - Add autonomous to enabled.modes in manifest.yaml. - Update policies/yolo.yaml description: YOLO is a flag on Build or Autonomous, not a mode, not a Shift+Tab stop. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 23:45:24 +02:00
Mikael Hugo	8ea4b0745d	fix(.agents): list all 5 skills in manifest.yaml enabled.skills sf-wiki, forge-autonomous-runtime, forge-command-surface, nix-build, and smoke-test are all present in .agents/skills/ and must be declared in enabled.skills per the AGENTS-1 spec. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 20:12:37 +02:00
Mikael Hugo	a9ebfb4442	fix(skills): move sf-wiki project override to .agents/skills/ (standard location) .agents/skills/ is the documented standard for project-level skill overrides (docs/user-docs/skills.md). .sf/skills/ is also searched but .agents/skills/ is the ecosystem-standard path used across all compatible agents. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 20:10:21 +02:00
Mikael Hugo	f3d84cd116	.agents: adopt agentsfolder/spec v0.1 as canonical agent configuration Replaces the fragmented (AGENTS.md + CLAUDE.md + .github/copilot-instructions.md + .sf/STYLE.md + .sf/PRINCIPLES.md + .sf/NON-GOALS.md) surface with a single canonical .agents/ tree per https://github.com/agentsfolder/spec. Structure: .agents/manifest.yaml spec metadata + defaults + project info .agents/prompts/ base.md project-agnostic base prompt project.md SF-specific: purpose-first, DB-first, build pipeline, Ask/Build/YOLO model snippets/{style,principles,non-goals}.md short pointers into .sf/{STYLE,PRINCIPLES, NON-GOALS}.md for composition .agents/modes/{ask,build}.md YAML front matter + human-readable body .agents/policies/{default-safe,yolo}.yaml conservative default + YOLO override .agents/skills/.gitkeep empty per spec — SF's own skills not yet migrated to agentskills.io format .agents/scopes/.gitkeep single-tree, no scopes yet .agents/profiles/.gitkeep no overlays yet .agents/schemas/.gitkeep generated by validators .agents/state/.gitignore excludes state.yaml from VCS per spec Status: spec is pre-1.0 (specVersion 0.1.0 pinned). No agent runtime currently reads .agents/ — this is structural adoption ahead of ecosystem support. Legacy files (AGENTS.md, CLAUDE.md, etc.) kept during the transition; .agents/ is now the canonical surface and they will eventually point here. 
This is the reference template; centralcloud/infra, operations-memory, oncall-mobile-android to follow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 20:04:35 +02:00
Mikael Hugo	edd0eb22ac	feat(skills): add project-level sf-wiki skill override with UPPERCASE convention .sf/skills/ is the project-local skill override directory. This override inherits all sf-wiki defaults and adds one project-specific rule: wiki pages use UPPERCASE filenames (INDEX.md, ARCHITECTURE.md, etc.) to match the .sf/ operational file convention (DECISIONS.md, KNOWLEDGE.md, etc.). The built-in src/resources/skills/sf-wiki/SKILL.md stays generic (lowercase). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 19:54:18 +02:00
Mikael Hugo	385cc8a18b	revert(skills): restore lowercase defaults in sf-wiki SKILL.md sf-wiki is a built-in read-only skill — its page name defaults must stay generic (lowercase). The uppercase convention is this repo's project-level choice, documented in system.md and the wiki itself. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 19:52:15 +02:00
Mikael Hugo	0d187e53d7	chore(wiki): rename wiki pages to UPPERCASE to match .sf/ convention All .sf/ operational files use UPPERCASE (DECISIONS.md, KNOWLEDGE.md, etc.). Wiki pages now follow the same convention: INDEX.md, ARCHITECTURE.md, WORKFLOWS.md, SUBSYSTEMS.md, GLOSSARY.md. Also updates sf-wiki SKILL.md and system.md prompt references. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 19:50:06 +02:00
Mikael Hugo	eacbbaac82	TODO: simplify md-tracking — drop snapshot blob, accept mid-edit corner Final settled design: sha + git ref only, no DB content snapshots at all. The mid-edit case (file observed dirty) loses the ability to reconstruct the intermediate working-tree state, but the change-detection signal is preserved and the operator can commit first if intermediate fidelity matters. Trades a corner-case fidelity loss for a much simpler schema and no DB-vs-disk content duplication. Git remains the only version store; the DB row is a pure "where I left off" pointer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:49:25 +02:00
Mikael Hugo	76923afb91	TODO: md-tracking needs a version reference, not just a content sha Without storing snapshots we lose the ability to diff against "what SF last saw". The fix is hybrid: store the git commit SHA1 that contained the observed content (cheap, no DB blob), and only fall back to a gzipped snapshot when the file was observed with uncommitted changes (no git ref exists for that exact content). For ".sf/-generated, untracked, in .gitignore" the right answer is to not track them in this table at all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:46:38 +02:00
Mikael Hugo	296054b1d4	TODO: drop snapshot blob from md-tracking; use git for diff source Per follow-up: SF generates many of these .md files itself (.sf/wiki/, .sf/milestones//.md, docs/plans/**), so storing gzipped snapshots in the DB would duplicate disk + git for no benefit. Simpler design: store only the sha + meta in sf.db; compute diffs on demand against `git show HEAD:<path>`. Naturally handles both "working-tree edit not yet committed" and "another agent committed while SF wasn't running". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:46:06 +02:00
Mikael Hugo	faecdc828c	TODO: generalise sha-tracking from milestones to all source-of-truth .md Per follow-up: not just .sf/milestones/*/.md but the broader set of markdown files that SF (or humans) treat as authoritative — AGENTS.md, .github/copilot-instructions.md, .sf/wiki/, docs/adr/, docs/plans/**, and root-level meta files. Explicit out-of-scope list: TODO.md (reset every cycle by triage), CHANGELOG.md / BUILD_PLAN.md (append-only by design), vendored or generated content. Tracking those would just be noise. Spec includes a tracked_md_files schema, the walk/diff/surface flow, and an honest accounting of storage cost (~40 bytes per file + optional gzipped snapshot). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:45:39 +02:00
Mikael Hugo	902be6d1de	TODO: SF should sha-track milestone files and diff on change Captures a real bug class observed during today's session: nothing notices when a milestone file (CONTEXT.md, ROADMAP.md, slice PLAN.md, etc.) is edited out of band — by a human, another agent, or a git pull. SF keeps using the cached state and drifts. Wanted: per-file sha tracking in sf.db, diff surface on change, + hooks for accept/reject/import/archive. Storage cost negligible. Useful in concert with the cross-repo triage and slash-command routing gaps already in this TODO.md — together they close most of the "unattended SF actually works" surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:45:05 +02:00
Mikael Hugo	41b7842fd8	TODO: cross-repo triage + slash-command routing + structured tiers (redo) Previous commit (`1fb4b9882`) captured only the reset and lost my intended additions due to a Read/Write race. Re-applying the four feature requests from today's dogfooding session: - Cross-repo `triage-all-repos` (real fix for the "many TODO.md files" surface area — single tool, per-repo SF dbs, unified read-only aggregation view). - Slash-command routing fix (`/todo triage` is currently re-implemented by the agent's LLM, bypassing the typed backend; patches to commands-todo.js were silently inert). - Structured tier/priority per triage item (today tiers exist only in LLM-prose appended to BUILD_PLAN.md; no parser-friendly field for "promote Tier 1 items"). - Phases-helpers stale-export error that fires on every SF run; needs either the missing export restored or a test that catches it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:34:49 +02:00
Mikael Hugo	1fb4b98820	TODO: cross-repo triage + slash-command routing + structured tiers Four feature requests captured from today's dogfooding session: - Cross-repo `triage-all-repos` (real fix for the "many TODO.md files" surface area — single tool, per-repo SF dbs, unified read-only aggregation view). - Slash-command routing fix (`/todo triage` is currently re-implemented by the agent's LLM, bypassing the typed backend; patches to commands-todo.js were silently inert). - Structured tier/priority per triage item (today tiers exist only in LLM-prose appended to BUILD_PLAN.md; no parser-friendly field for "promote Tier 1 items"). - Phases-helpers stale-export error that fires on every SF run; needs either the missing export restored or a test that catches it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:34:07 +02:00
Mikael Hugo	b818ae2c5a	docs(wiki): add subsystems.md and glossary.md wiki pages Complete the standard wiki page set from sf-wiki SKILL.md: - subsystems.md: table of all subsystems with path, purpose, tests - glossary.md: project-specific terms (ADR, UOK, PDD, YOLO, wiki, etc.) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 19:27:01 +02:00
Mikael Hugo	e679478d1b	feat(wiki): wire .sf/wiki/ as tracked context source - auto-bootstrap-context.js: scan .sf/wiki/*.md in collectAutoBootstrapFiles so wiki pages load as priority context in headless autonomous bootstrap - headless-context.ts: same fix for the TS bootstrap path - system-context.js: loadWikiBlock already existed and was wired into fullSystem; add .sf/wiki/ to Tier 1 escalation policy lookup sources - system.md: add wiki/ to .sf/ directory structure; add Conventions entry explaining wiki is tracked in git (hand edits persist) and injected automatically when present - git-runtime-patterns.js: do NOT gitignore .sf/wiki/ — wiki pages are tracked like DECISIONS.md so hand edits survive commits and clones - .sf/wiki/: seed index.md, architecture.md, workflows.md for this repo Wiki filenames follow sf-wiki SKILL.md convention: lowercase (index.md, architecture.md, workflows.md, subsystems.md, glossary.md). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 19:24:23 +02:00
Mikael Hugo	3e652a9fd6	TODO: triage should escalate Tier 1 items to real milestones Today's triage run confirmed the manual `/todo triage` workflow works, but it stops at tier-listing items in BUILD_PLAN.md — doesn't scaffold .sf/milestones/MNNN/ dirs for the Tier 1 ones. That's the gap that needs closing for the autonomous flow to actually create milestones from raw TODO dumps. Also captures the non-fatal phases-helpers.js extension load error that appeared at the top of the triage run output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:15:33 +02:00
Mikael Hugo	ca7368e5f1	fix(bash): add 120s default timeout to prevent autonomous mode hangs - Add BUILT_IN_DEFAULT_TIMEOUT_SECS = 120 constant to bash tool - Compute effectiveTimeout = timeout ?? resolvedDefaultTimeout so LLM calls without a timeout get the 120s guard automatically - Add defaultTimeoutSeconds? to BashToolOptions for override at creation - Dynamic bashSchemaWithDefault describes the actual default in the LLM tool description, improving model awareness - Add BashSettings interface + getBashDefaultTimeoutSeconds() to SettingsManager so users can override or disable via settings.json - Wire defaultTimeoutSeconds into agent-session.ts _buildRuntime() Root cause: npx sf --help triggered npm package download, hanging for 4+ minutes without timeout, consuming entire autonomous run budget. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 19:12:33 +02:00
Mikael Hugo	7ef58422b1	TODO: feature requests for batch backlog ingestion + probe-based resolution Real dogfood for the auto-triage feature: this is the unstructured dump that the autonomous cycle should pick up and process into proper backlog items the next time it runs. Until auto-triage is wired up, the contents serve as a written spec for what's needed. Two flagship features: - Auto-triage TODO.md on each autonomous cycle. `commands-todo.js` already implements `/todo triage` (manual). Wire it to the autonomous orchestrator and skip when TODO.md == _EMPTY_TODO. - When the LLM would ask a clarifying question, replace with parallel combatant + partner probes (adversarial-challenge + collaborative-research) and only fall back to asking a human if probes diverge AND interactive mode is available. This unblocks unattended `headless new-milestone` (the gap that blocked batch backlog ingestion today). Plus five smaller items (headless milestone stall fix, bulk import-roadmap, TTY-free plan list, hand-authorable milestone scaffold, discoverable --answers schema) carried over from the centralcloud-ops SF-IMPROVEMENTS.md observations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:09:26 +02:00
Mikael Hugo	4e5fc12e81	feat(sf): fix gate health — import, DB fallback, and enrich status uok Three follow-up fixes from S03/T04: 1. gate-runner.js: add missing getDistinctGateIds import from sf-db.js. UokGateRunner.getHealthSummary() called it when registry was empty but it was never imported — runtime ReferenceError in headless contexts. 2. sf-db-gates.js: getDistinctGateIds + getGateRunStats fall back to the quality_gates DB table when no trace events are found (e.g. after trace file rotation). Ensures gate health survives trace cleanup. 3. headless-uok-status.ts: replace generic Type column with real Scope (task/slice/milestone) from quality_gates DB, and show actual Last Evaluated timestamp from DB even when outside the 24h stats window. Tests updated to match (21 pass). Closes backlog items: bl-gate-runner-import-bug, bl-gate-stats-trace-vs-db, bl-uok-status-enrich. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 18:47:42 +02:00
Mikael Hugo	797db16ae8	feat(sf): S03/T04 — add UOK gate health to sf headless status uok Adds a new `sf headless status uok` subcommand that queries gate-run stats and circuit-breaker state from sf.db and formats them as a markdown table or JSON (--json flag). - src/headless-uok-status.ts: handler that loads sf-db-gates directly (avoids the unimported getDistinctGateIds in gate-runner) - src/headless.ts: bypass RPC, route 'status uok' to handler - src/help-text.ts: document the new subcommand - tests/headless-uok-status.test.mjs: 19 node:test coverage Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 18:31:03 +02:00
Mikael Hugo	4132ecc1db	feat(sf): S03/T03 — wire OutcomeLearningGate into adaptive verification policy Adds adaptive-verification-policy.js which reads OutcomeLearningGate trace events from the last 24h and adjusts verification_max_retries / verification_auto_fix in project preferences: - >60% verification/artifact/execution failures → reduce retries to 1, disable auto-fix - 0% failures across ≥5 samples → bump retries (capped at 3) - all other cases → no change (returns null) Wires into auto-verification.js after OutcomeLearningGate runs when outcomeLearning flag is enabled. Includes 12 node:test tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-11 17:40:22 +02:00
Mikael Hugo	7b225696cc	feat(sf): add cross-slice and milestone integrity checks to post-execution checks - Add checkCrossSliceConsistency() to detect key_file conflicts across slices - Add checkMilestoneIntegrity() to verify completed slices have summaries and no active requirements are orphaned - Extend runPostExecutionChecks() signature with optional milestoneId and allSliceTasks parameters - Wire cross-slice task gathering into auto-verification.js call site - Add comprehensive node:test suite for both new checks	2026-05-11 17:22:11 +02:00
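
Note on commit 4132ecc1db: the threshold rules in that message (more than 60% failures, zero failures across at least 5 samples, no change otherwise) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in adaptive-verification-policy.js: the function name, the event shape ({ failed }), the preference object shape, and the choice to bump retries by exactly one are all hypothetical.

```javascript
// Hypothetical sketch of the policy described in commit 4132ecc1db.
// All names and shapes here are assumptions for illustration.
const FAILURE_THRESHOLD = 0.6; // ">60% failures" from the commit message
const MIN_CLEAN_SAMPLES = 5;   // "0% failures across >=5 samples"
const MAX_RETRIES = 3;         // "capped at 3"

// events: array of { failed: boolean } from the last 24h (assumed shape).
// prefs: { verification_max_retries, verification_auto_fix } (assumed shape).
// Returns adjusted preferences, or null when nothing should change.
function adjustVerificationPolicy(events, prefs) {
  if (events.length === 0) return null;

  const failures = events.filter((e) => e.failed).length;
  const failureRate = failures / events.length;

  // High failure rate: reduce retries to 1 and disable auto-fix.
  if (failureRate > FAILURE_THRESHOLD) {
    return { verification_max_retries: 1, verification_auto_fix: false };
  }

  // Clean streak with enough samples: bump retries, capped at MAX_RETRIES.
  // Incrementing by one is an assumption; the commit only says "bump".
  if (failures === 0 && events.length >= MIN_CLEAN_SAMPLES) {
    return {
      verification_max_retries: Math.min(
        prefs.verification_max_retries + 1,
        MAX_RETRIES
      ),
      verification_auto_fix: prefs.verification_auto_fix,
    };
  }

  // All other cases: no change.
  return null;
}
```

A failure rate of exactly 60% falls through to the "no change" branch here, since the commit message says "more than 60%".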

4385 commits