singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	78ea18dbee	feat(native): expose unified edit module with native ops Adds applyEdits, applyWorkspaceEdit, replaceSymbol, insertAroundSymbol, and watchTree to @singularity-forge/native via the new ./edit subpath. - applyEdits / applyWorkspaceEdit: LSP-shaped TextEdit arrays applied via byte-level splice + atomic rename, two-phase commit across files. - replaceSymbol / insertAroundSymbol: tree-sitter symbol resolution via forge-ast, TS/JS/TSX support; v1 replaces whole declaration. - watchTree: notify-rs recursive watcher with native globset ignore + JS EventEmitter wrapper (drops chokidar dep). Rust impl in rust-engine/crates/engine/src/{edit,symbol,watch}.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:33:06 +02:00
Mikael Hugo	5f52680285	chore: snapshot in-flight work (mcp graph refactor, native edit module, misc) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:31:44 +02:00
Mikael Hugo	f4dd66d4ed	fix(sf): cap sift warmup with timeout(1) wall-clock wrapper Orphaned sift warmups can spin past --retriever-timeout-ms (a per-page timeout, not wall-clock) and burn CPU indefinitely after the launcher exits — observed a 95-min, 98% CPU orphan. Wrap the detached spawn in timeout(1) / gtimeout when present (SIGTERM at the cap, SIGKILL 10s later); fall back to raw spawn elsewhere. Default cap 1800s, override via SF_SIFT_HARD_TIMEOUT_SEC, disable via SF_SIFT_HARD_TIMEOUT_DISABLE=1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:29:02 +02:00
Mikael Hugo	f5ea1cb6c0	feat: Validated R116: product-audit fires at three phase transition poi… SF-Task: S01/T02	2026-05-02 07:35:36 +02:00
Mikael Hugo	a4ae2feaac	sf snapshot: pre-dispatch, uncommitted changes after 30m inactivity	2026-05-02 07:26:07 +02:00
Mikael Hugo	8ed0c4078e	chore: commit headless follow-up changes	2026-05-02 06:55:12 +02:00
Mikael Hugo	aed104c81f	fix: guard advisor fallback session model	2026-05-02 06:39:23 +02:00
Mikael Hugo	6f6ace3da6	chore: Node 24.15 floor + modernization round-up - engines.node: >=24.15.0 across all 23 package.json (root + 8 workspace + studio + web + pkg + vscode-extension + 11 SF extension manifests) - CI workflows pinned to node-version: '24.15' (16 sites) - Dockerfile -> node:24.15-slim - .nvmrc / .node-version -> 24.15.0 - Refactored worktree-cli.ts and headless-query.ts to use import.meta.filename instead of fileURLToPath(import.meta.url) - exec.ts simplified with AbortSignal.any + spawn signal/killSignal - Picks up Crush's biome.json + AGENTS.md doc cleanup in same pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 06:37:36 +02:00
Mikael Hugo	d9c848132a	chore: CI workflows, package.json updates, test fixes, docs cleanup 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:30:45 +02:00
Mikael Hugo	f00af5b67f	chore: remove last vitest exclude — lsp-integration already converted 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:22:59 +02:00
Mikael Hugo	a920164a04	chore: worktree e2e test update 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:21:09 +02:00
Mikael Hugo	302888e3d3	chore: test fixes, dep updates, lockfile sync 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:20:44 +02:00
Mikael Hugo	6fcf61ba0e	chore: lockfile update and vitest config cleanup 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:19:52 +02:00
Mikael Hugo	6744f6d254	chore: update version and changelog scripts 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:19:16 +02:00
Mikael Hugo	7106a04951	chore: remaining studio and web updates 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:18:50 +02:00
Mikael Hugo	d73a73d7f3	chore: node 24 native APIs, import.meta.dirname, parsers rename, dep updates - Replace fileURLToPath(import.meta.url) with import.meta.dirname across scripts and extensions - Rename parsers-legacy.ts → parsers.ts - Remove deleted plan/spec docs (cicd-pipeline) - Update package.json engines and deps across workspace packages - Update web/package-lock.json 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:18:25 +02:00
Mikael Hugo	980772cc90	refactor: migrate from better-sqlite3 to node:sqlite, npm glob to node:fs Since Node >= 24 is the minimum engine, remove the better-sqlite3 fallback chain from sf-db.ts, unit-ownership.ts, and cli-stats.ts. Use DatabaseSync from node:sqlite directly. Also replace the `glob` npm package with built-in node:fs/promises.glob and node:fs.globSync in pi-coding-agent LSP utils. - Remove createRequire boilerplate and suppressSqliteWarning helper - Simplify loadProvider() and openRawDb() - Net -177 lines of fallback/middleware code 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 06:13:57 +02:00
Mikael Hugo	040bdf4eb8	fix(sf): simplify parallel-merge, remove debug logs from state - Simplify parallel-merge.ts error handling - Remove console.log debug statements from state.ts deriveState 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 05:48:13 +02:00
Mikael Hugo	37f1028fe9	test: fix mcp-server imports, regex patterns, and add sqlite fallback in parallel-merge	2026-05-02 05:46:32 +02:00
Mikael Hugo	2be52e28a3	test: convert ci_monitor and linux-ready to vitest, add vectordrive to include	2026-05-02 05:45:40 +02:00
Mikael Hugo	449d0ca878	test: convert remaining standalone tests to vitest, remove debug logs, fix parser fallback	2026-05-02 05:43:32 +02:00
Mikael Hugo	ba5ecfc050	fix: stalled-tool-recovery test wrap in describe/it, minor cleanup - Wrap bare test blocks in describe/it for vitest compatibility - Clean up vitest.config.ts 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 05:41:39 +02:00
Mikael Hugo	b6358c1c14	test: commit current vitest fixes	2026-05-02 05:39:38 +02:00
Mikael Hugo	0e769dbf13	test: include vitest test import	2026-05-02 05:38:37 +02:00
Mikael Hugo	df03312fa5	test: stabilize vitest compatibility	2026-05-02 05:36:57 +02:00
Mikael Hugo	7dd59ad70d	test: enable 7 more converted vitest tests and fix worktree-nested-git slice size	2026-05-02 05:35:42 +02:00
Mikael Hugo	9ad818d4a0	test: enable 7 converted vitest tests previously in exclude list	2026-05-02 05:32:32 +02:00
Mikael Hugo	0682fbc32a	test: remove debug logs, fix loop.ts logging, and enable converted vitest tests	2026-05-02 05:13:14 +02:00
Mikael Hugo	3ddb8c84e0	chore: commit current worktree state	2026-05-02 05:11:03 +02:00
Mikael Hugo	e44237e526	test: final vitest API migration fixes across all packages and extensions	2026-05-02 04:49:34 +02:00
Mikael Hugo	5cf94c296e	test: complete vitest mock API fixes for callCount and calls access	2026-05-02 04:47:41 +02:00
Mikael Hugo	1de5d5456a	chore: complete vitest migration for remaining packages and API calls - Convert remaining node:test → vitest imports in packages/* and studio/* - Fix mock.callCount() → mock.callCount property access for vitest compat - Fix mock.calls[N].arguments → mock.calls[N] for vitest compat - Update tsconfig.extensions.json to exclude test files from tsc - Harden migrate-to-vitest-all.mjs regex for single quotes and optional semicolons	2026-05-02 04:46:11 +02:00
Mikael Hugo	b62f7b20ec	fix: convert node:test API calls to vitest equivalents - t.after() → afterEach() with import injection - t.before() → beforeEach() with import injection - t.test() → test() (flatten subtests) - t.skip() → return with skip comment - Fix vitest.config.ts poolOptions deprecation for Vitest 4 - Run fix-vitest-api.mjs across 108 affected test files 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 04:42:38 +02:00
Mikael Hugo	01d8f2fad6	fix(pi-ai): drop pre-5.3 codex models from generated registry Remove gpt-5.1 and gpt-5.2 variants from openai-codex-responses. Keep gpt-5.3+, gpt-5.4, and the newly-added gpt-5.5.	2026-05-02 04:41:06 +02:00
Mikael Hugo	d883f885e9	test(sf): advisor_allowed_providers dispatch gating - Add behavioural tests for isProviderAllowedForAdvisor wired into selectAndApplyModel for subagent unit types. - Verify non-subagent units are unaffected by the advisor allowlist. - Add static source analysis guard confirming the check exists. Assisted-by: Kimi Code CLI	2026-05-02 04:40:08 +02:00
Mikael Hugo	59aaf3dcf3	chore: migrate test suite from node:test to vitest Add vitest.config.ts with forks pool, v8 coverage, and package aliases. Run migrate-to-vitest.mjs to replace `from "node:test"` imports with `from 'vitest'` across 749 test files, converting mock.fn→vi.fn and mock.timers→vi fake timers where needed. 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 04:37:33 +02:00
Mikael Hugo	a38e72497f	fix(sf): reorder guards after dispatch, plan-gate in guards, search provider fixes - Move guards phase after dispatch in dev path so unitType/unitId are available for plan-gate validation - Relocate UOK plan-gate from runDispatch into runGuards with getSliceTaskCounts first-task-of-slice check - Rename runLegacyAutoLoop → autoLoop in startAuto call sites - Add plan quality gate in _deriveStateImpl via getSlicePlanBlockingIssue - Clear path cache in invalidateStateCache - Deprioritise minimax in search provider fallback ordering - Fix native-search Anthropic heuristic to exclude copilot/minimax/kimi clones while still matching claude-* models - Add releaseIfIdle to CodexAppServerClient for clean short-lived process exit - Fix nested codex error message parsing - Update search provider tests to clear minimax env vars - Add native parser zero-task fallback in parsePlan 💘 Generated with Crush Assisted-by: GLM-5.1 via Crush <crush@charm.land>	2026-05-02 04:35:26 +02:00
Mikael Hugo	733a3b0f6e	feat(pi-ai): codex provider integration and auto-loop rename fix - Add codex-app-server-client for Codex app server communication - Update openai-codex-responses provider integration - Fix auto.ts to use runLegacyAutoLoop post-UOK-refactor - Add advisor_allowed_providers preference support - Fix slice plan blocking issue check in auto-recovery	2026-05-02 04:02:10 +02:00
Mikael Hugo	97bbbb58d1	fix(sf): fix test failures — session guard, runLegacyLoop alias, state quality gate - run-unit.ts: do NOT clear isSessionSwitchInFlight on timeout; let the dangling newSession .finally() clear it via generation check. This fixes 'runUnit keeps the session-switch guard across a late newSession settlement'. - auto.ts: use `runLegacyLoop: autoLoop` (not runLegacyAutoLoop) — autoLoop already defaults to legacy-direct dispatch contract. Fixes source-inspection test that expects the literal text 'runLegacyLoop: autoLoop'. - state.ts: remove over-strict plan quality check from state derivation so minimal plans (no review sections) don't block task dispatch. - auto-recovery.ts, auto-timers.ts: minor cleanup from agent sweep. - packages/pi-ai: github-copilot.ts OAuth helper + index.ts export wiring. - openai-codex.ts: drop stale PKCE residuals after simplification. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 03:51:12 +02:00
Mikael Hugo	0266ca3ec8	docs(sf): wire parentTrace into advisory-partner dispatch Adds a Dispatch Pattern subsection showing the parentTrace shape for advisory review. For advisory, the trace is the planner's reasoning trail (alternatives considered, untested assumptions, explicit out-of-scope) — not tool calls. This lets the advisory reviewer catch the gap between what the planner thought and what the artefact says, which is exactly what advisory review exists to catch.	2026-05-02 03:45:37 +02:00
Mikael Hugo	2fd0f15c98	docs(sf): wire parentTrace into code-review, requesting-code-review, validate-milestone Closes the loop on parent-trace pass-through (subagent dispatch wiring + helper + test were landed earlier). The dispatch tool supports parentTrace at TaskItem / ChainItem / batch level; until the canonical review skills teach the LLM to PASS it, the feature is dead code in practice. - code-review/SKILL.md Phase 2: shows the 5-lens parallel review swarm dispatch with parentTrace at the batch level. Reviewer can audit what the implementer actually did, not just the prose summary. - requesting-code-review/SKILL.md Local Review Loop: shows the advocate + challenger-A + challenger-B dispatch with parentTrace and adds a hard rule that all three must receive it. Specifically calls out that the advocate is the most likely to wave away an objection the trace contradicts — passing the trace forces engagement. - prompts/validate-milestone.md Step 1: passes a slice-claim summary (one bullet per slice, with SUMMARY path) as parentTrace to the three validation reviewers, so they audit slice claims against artifacts. PDD packet (inline; pure prose docs, no code change): - Purpose: review skills actually USE the parentTrace plumbing instead of dispatching reviewers blind to what the parent did. - Consumer: code-review (every slice/PR review), requesting-code-review (every external review request), validate-milestone (every milestone close). - Contract: each skill's dispatch example includes parentTrace; the rule text instructs the LLM to assemble its own tool-call summary. - Evidence: grep confirms `parentTrace` in all three files; npm run copy-resources propagated to dist; typecheck:extensions exits 0. - Non-goals: not changing the verifier prompt assembly (already inherits from composeTaskWithParentTrace's embedded instructions); not changing agent definitions; not auto-capturing the trace (parent agent decides what's relevant). - Invariants: existing dispatch examples preserved with parentTrace added, not replacing the original; no agent type changes. - Assumptions: the parent LLM's context contains the tool-call history it needs to assemble parentTrace; the dispatch tool routes the field through unchanged (verified by parent-trace.test.ts).	2026-05-02 03:44:02 +02:00
Mikael Hugo	fc1ed49d72	test+docs(sf): parent-trace test + dispatching-subagents skill doc Follows up the parent-trace dispatch wiring (bundled into `bc9cf4fef` + `2508822b8`). Adds: - src/resources/extensions/subagent/tests/parent-trace.test.ts — 7 cases covering the composeTaskWithParentTrace helper: undefined/empty/whitespace pass-through, tag wrapping, task-after-trace ordering, content trimming, embedded verifier instructions ("hedge words", "tool errors"). - src/resources/extensions/subagent/index.ts — exports composeTaskWithParentTrace so the test can import it. - skills/dispatching-subagents — new "Parent trace (for verifier/review subagents)" subsection documents the field at TaskItem / ChainItem / batch level, the per-task override, and the chain (step 0 only) and debate (round 1 only) behaviour. PDD packet (inline; small follow-up to the architectural change): - Purpose: parent-trace plumbing has a falsifiable test and is documented in the canonical dispatching-subagents skill so callers know how to use it. - Consumer: the dispatching-subagents skill (loaded by every agent that calls the subagent tool); the test (covers regression). - Contract: 7 test cases pass; SKILL.md contains the documented field at three schema levels with the override and per-mode behaviour notes. - Evidence: - tests/parent-trace.test.ts → 7/7 pass via the SF resolve-ts loader - npm run typecheck:extensions exits 0 - All 35 subagent suite tests pass - Non-goals: not changing the dispatch wiring (already in); not adding parent-trace handling to background jobs (separate slice if needed). - Invariants (safety only — sync helper + pure prose docs): - composeTaskWithParentTrace returns task unchanged when trace is empty. - The original task always appears after the closing tag. - Trimmed content is what gets injected, not the raw padded input. - Assumptions: tests load TS via the resolve-ts.mjs hook (standard SF pattern); skills load SKILL.md from dist via copy-resources.	2026-05-02 03:29:56 +02:00
Mikael Hugo	2508822b8f	refactor(pi-ai): simplify Codex OAuth + minor fixes across pi-ai and sf - openai-codex.ts: replace hand-rolled PKCE flow with simple read of ~/.codex/auth.json written by the real codex CLI after user authentication. Removes ~250 lines of local callback server + browser dance code. - openai-codex-responses.ts: minor residual cleanup - openai-completions.ts: drop remaining `as any` stream_options cast - anthropic-shared.ts: use `unknown` cast on thinkingNoBudget path - pi-coding-agent/extensions/types.ts: minor type addition - db-tools.ts: explicit AgentToolResult return type on execute handlers - requesting-code-review/SKILL.md: prompt wording cleanup - subagent/index.ts: capability registration wiring Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 03:25:39 +02:00
Mikael Hugo	bc9cf4fef3	chore(sf): commit remaining uncommitted improvements - anthropic-shared.ts: replace `as any` cast on thinkingNoBudget path with `as unknown as Record<string, unknown>` for auditability; remove `as any` on server_tool_use block (SDK type is now correct) - openai-completions.ts: drop residual `as any` casts after SDK type update - db-tools.ts: add explicit AgentToolResult return type annotation on execute handlers to resolve implicit-any lint - requesting-code-review/SKILL.md: update review skill prompt - subagent/index.ts: wire subagent capability registration Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 03:22:52 +02:00
Mikael Hugo	2846c296ee	chore(pi-ai): typecheck cleanup, empty-catch comments, OAuth audit notes - package.json: add 'typecheck' script (build:pi + tsc --noEmit) so pi-ai and pi-coding-agent typecheck under the same command surface SF uses. - anthropic-shared.ts: replace 'as any' casts with proper Anthropic SDK types (ServerToolUseBlockParam, WebSearchToolResultBlockParam, CacheControlEphemeral). The cache_control variant is documented inline so the cast is auditable. - openai-completions.ts: drop the 'as any' on stream_options — the type system can verify the assignment now. - openai-codex-responses.ts, package-manager.ts, skills.ts: annotate the three remaining empty catches with one-line WHY comments (best-effort cleanup, malformed ignore files, partial directory traversal). Empty catch with no rationale is an SF012 anti-pattern; with rationale it is a deliberate fallback. - oauth/github-copilot.ts, oauth/openai-codex.ts: add UPSTREAM AUDIT blocks documenting why these hand-rolled OAuth flows stay hand-rolled rather than delegating to @octokit/auth-oauth-device or @openai/codex. AbortSignal coverage and provider-specific surface area are the gating concerns; re-audit triggers are named.	2026-05-02 03:20:25 +02:00
Mikael Hugo	e6a2ec0a8f	fix(sf): guard auto-loop against missing DB and missing basePath Two small defensive fixes in the auto-loop that surfaced when running sf in degraded environments (no .sf/sf.db yet, or unset basePath): - phases.ts: gate planning-flow gate behind isDbAvailable() so a missing or not-yet-initialized DB does not throw inside the gate runner. - run-unit.ts: skip process.chdir when s.basePath is falsy. The original guard compared cwd to an empty string, which always failed on the first unit of a fresh runtime root. Both are conservative — preserve existing behaviour when DB and basePath are present.	2026-05-02 03:20:13 +02:00
Mikael Hugo	082526c0e4	docs(sf): finish PDD v2 propagation into Purpose Gate, requesting/receiving review Tail-end of the PDD v2 work (Assumptions field + safety/liveness split + machine-executable Evidence). Three documents that still referenced v1's 4-field Purpose Gate are updated to the full 8-field PDD packet: - docs/SPEC_FIRST_TDD.md — Purpose Gate now lists all 8 fields with the Assumptions and Failure-boundary additions inline. - skills/requesting-code-review — replaces "Purpose & Consumer" section with "PDD packet (all 8 fields)" restated verbatim from .sf/active/{unit-id}/pdd.md. Falsifier and Scope-defence sections clarified vs Failure-boundary and Non-goals to remove overlap. - skills/receiving-code-review — Purpose Gate criterion updated to demand the full PDD packet with machine-executable Evidence, not just Purpose/Consumer/Value-at-risk. PDD packet (inline): - Purpose: every artefact that references "Purpose Gate" agrees on the same 8-field definition; reviewers and reviewees read the same packet. - Consumer: spec-first-tdd, requesting-code-review, receiving-code-review. - Contract: all three documents list the same 8 fields with the same Assumptions / safety+liveness / machine-executable-Evidence wording. - Evidence: grep confirms PDD packet references in all three; typecheck:extensions exits 0. - Non-goals: no edits to the PDD skill itself (already v2); no edits to other skills referencing v1 Purpose Gate beyond these three (they don't exist). - Invariants: existing review-loop sections preserved; only Purpose-Gate- related sections rewritten. - Assumptions: PDD v2 SKILL.md is the canonical source of field definitions; these three documents are projections of it.	2026-05-02 03:20:06 +02:00
Mikael Hugo	b48e6d5dd7	docs(sf): verdict discipline, subagent prompt audit, read-only researcher, debugging rationalizations, trace format Step 2 + scan-and-improve from the Piebald-AI/claude-code-system-prompts pattern analysis. Five files, prose-only edits, no code changes. - prompts/gate-evaluate.md — Verdict Discipline section: omitted is not a hedge. Each omitted verdict needs a reason; unexplained omitted is treated as failed-to-decide and re-dispatched. - skills/dispatching-subagents — Subagent Prompt Audit: before dispatch, audit for smuggled user-questions, action-class delegation, scope creep, and tool vs prompt mismatch. After return, scan for hedge words, glossed-over tool errors, and self-reports without traces. - skills/researcher — Read-only discipline block: closes the bash redirect / heredoc back-door. Researcher does not write files, DB rows, git, or packages; the report is the only output, and write-requires findings are surfaced for parent dispatch rather than performed in-skill. - skills/systematic-debugging — Recognize Your Own Rationalizations: names the debugging-specific failure modes ("error message obviously says X", "small diff can't be the cause", "test was probably flaky"). Adds Command/ Output trace format requirement to Phase 4 verification. - skills/spec-first-tdd — Adds Command/Output trace format requirement to the Evidence section. PDD packet (inline; prose-only edit, all five additions): - Purpose: harden five SF skills/prompts so loaded text catches rationalizations, closes the read-only back-door, and requires falsifiable verdicts/traces. - Consumer: every gate evaluation, subagent dispatch, research run, debugging session, and TDD slice. - Contract: SKILL/prompt text contains the new sections at predictable anchor points, grep-able by the section headings used. - Evidence: grep-confirmed presence of "Verdict Discipline", "Subagent Prompt Audit", "<read_only_discipline>", "Recognize Your Own Rationalizations", "Trace format" in their respective files; typecheck:extensions exits 0; copy-resources propagated to dist. - Non-goals: no edits to ask-gate.ts, no transport changes (parent-transcript pass-through deferred); no edits to receiving-code-review/requesting-code- review (already strong post-PDD-v2). - Invariants: existing sections preserved; only additions; frontmatter unchanged. - Assumptions: skills loaded from dist via copy-resources; section text is injected verbatim into agent context; SF voice (paraphrased patterns, not copy-pasted from Anthropic's bytes).	2026-05-02 03:15:35 +02:00
Mikael Hugo	ef325d7b49	docs(sf): self-awareness + adversarial probe + trace format in verify/review skills Adds three patterns from Piebald-AI/claude-code-system-prompts (extracted from the public Claude Code npm bundle) to SF's two completion-gate skills: - "You are bad at this" self-awareness sections at the top of finish-and-verify and code-review — names the LLM-specific failure modes (read-don't-run, trust-self-reports, hedge-when-uncertain, fooled-by-AI-slop) instead of the generic "be thorough" framing. - Rationalization-callouts that name the exact excuses the agent reaches for ("probably fine", "tests already pass", "looks correct based on my reading") and invert each with a counter-instruction. - Mandatory adversarial probe before slice-done / Lens 1 APPROVE: at least one boundary / idempotency / concurrency / orphan-reference probe with documented result, even when behaviour was correct. - Command/Output/Result trace format for verification evidence — paraphrase is not evidence; a check without a Command-run block is a skip. - Anti-hedge guard on code-review verdicts: APPROVE_WITH_FIXES is not for "I'm not sure"; findings without traces drop to Medium. PDD packet (inline since prose-only edit, no code): - Purpose: when these skills load, the agent reads its own failure-mode catalogue - Consumer: every slice close (finish-and-verify) and every review (code-review) - Contract: SKILL.md text contains rationalizations + adversarial probe + trace format - Evidence: grep finds ≥3 keyword matches per file; typecheck:extensions exits 0; dist parity - Non-goals: no edits to gate-evaluate.md, dispatching-subagents, ask-gate.ts (deferred)	2026-05-02 03:05:47 +02:00
Mikael Hugo	6ee31e83f4	chore(sf): autonomous sweep — judgment-log/knowledge-compounding/tacit-knowledge tests + PDD v2 research record Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 02:50:44 +02:00
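
Commit fc1ed49d72 above lists three invariants for the composeTaskWithParentTrace helper: an empty trace returns the task unchanged, the task always follows the closing tag, and the injected trace is trimmed. The following is a minimal sketch of a function with those properties, not the repository's implementation. The function name, the `<parent_trace>` tag, and the wording of the verifier note are assumptions made for illustration; the commit message only says that tag wrapping exists and that the embedded instructions mention "hedge words" and "tool errors".

```typescript
// Hypothetical tag name and instruction text; the real ones are not shown in the log.
const OPEN_TAG = "<parent_trace>";
const CLOSE_TAG = "</parent_trace>";
const VERIFIER_NOTE =
  "Audit the trace above: flag hedge words and any tool errors that were glossed over.";

// Illustrative stand-in for composeTaskWithParentTrace (signature assumed).
function composeTaskWithParentTraceSketch(task: string, parentTrace?: string): string {
  // Undefined, empty, and whitespace-only traces all pass the task through unchanged.
  const trimmed = parentTrace?.trim() ?? "";
  if (trimmed.length === 0) return task;

  // The trimmed trace is wrapped in tags; the original task comes after the closing tag.
  return [OPEN_TAG, trimmed, CLOSE_TAG, VERIFIER_NOTE, "", task].join("\n");
}
```

Keeping the helper synchronous and pure, as the commit describes, means the seven listed test cases can be checked with plain string assertions.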


3818 commits
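
Commit f4dd66d4ed in the log above describes wrapping a detached spawn in timeout(1) or gtimeout, with a 1800 s default cap, SIGTERM at the cap, SIGKILL 10 s later, and a fall back to a raw spawn when no wrapper is available. A rough sketch of how such an argument list might be built is shown below. The function name, its signature, and the `sift warmup` command in the usage note are hypothetical; only the two environment variable names and the timing values come from the commit message. The flags used are the GNU coreutils `timeout` ones.

```typescript
interface WrappedCommand {
  command: string;
  args: string[];
}

// Hypothetical helper: returns the command to spawn, wrapped in timeout(1) when possible.
function wrapWithHardTimeout(
  command: string,
  args: string[],
  env: Record<string, string | undefined>,
  timeoutBinary: string | null, // "timeout", "gtimeout", or null if neither is installed
): WrappedCommand {
  // Fall back to the raw command when the cap is disabled or no wrapper binary exists.
  if (env.SF_SIFT_HARD_TIMEOUT_DISABLE === "1" || timeoutBinary === null) {
    return { command, args };
  }

  // Use the override when it parses to a positive number, else the 1800 s default.
  const parsed = Number(env.SF_SIFT_HARD_TIMEOUT_SEC);
  const capSec = Number.isFinite(parsed) && parsed > 0 ? parsed : 1800;

  // timeout sends SIGTERM at the cap by default; --kill-after adds SIGKILL 10 s later.
  return {
    command: timeoutBinary,
    args: ["--kill-after=10s", `${capSec}s`, command, ...args],
  };
}
```

For example, `wrapWithHardTimeout("sift", ["warmup"], {}, "timeout")` yields the command `timeout` with arguments `--kill-after=10s 1800s sift warmup`, which could then be handed to a detached spawn call.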