Commit graph

3801 commits

Author SHA1 Message Date
Mikael Hugo
040bdf4eb8 fix(sf): simplify parallel-merge, remove debug logs from state
- Simplify parallel-merge.ts error handling
- Remove console.log debug statements from state.ts deriveState

💘 Generated with Crush

Assisted-by: GLM-5.1 via Crush <crush@charm.land>
2026-05-02 05:48:13 +02:00
Mikael Hugo
37f1028fe9 test: fix mcp-server imports, regex patterns, and add sqlite fallback in parallel-merge
2026-05-02 05:46:32 +02:00
Mikael Hugo
2be52e28a3 test: convert ci_monitor and linux-ready to vitest, add vectordrive to include
2026-05-02 05:45:40 +02:00
Mikael Hugo
449d0ca878 test: convert remaining standalone tests to vitest, remove debug logs, fix parser fallback
2026-05-02 05:43:32 +02:00
Mikael Hugo
ba5ecfc050 fix: stalled-tool-recovery test wrap in describe/it, minor cleanup
- Wrap bare test blocks in describe/it for vitest compatibility
- Clean up vitest.config.ts

💘 Generated with Crush

Assisted-by: GLM-5.1 via Crush <crush@charm.land>
2026-05-02 05:41:39 +02:00
Mikael Hugo
b6358c1c14 test: commit current vitest fixes
2026-05-02 05:39:38 +02:00
Mikael Hugo
0e769dbf13 test: include vitest test import
2026-05-02 05:38:37 +02:00
Mikael Hugo
df03312fa5 test: stabilize vitest compatibility
2026-05-02 05:36:57 +02:00
Mikael Hugo
7dd59ad70d test: enable 7 more converted vitest tests and fix worktree-nested-git slice size
2026-05-02 05:35:42 +02:00
Mikael Hugo
9ad818d4a0 test: enable 7 converted vitest tests previously in exclude list
2026-05-02 05:32:32 +02:00
Mikael Hugo
0682fbc32a test: remove debug logs, fix loop.ts logging, and enable converted vitest tests
2026-05-02 05:13:14 +02:00
Mikael Hugo
3ddb8c84e0 chore: commit current worktree state
2026-05-02 05:11:03 +02:00
Mikael Hugo
e44237e526 test: final vitest API migration fixes across all packages and extensions
2026-05-02 04:49:34 +02:00
Mikael Hugo
5cf94c296e test: complete vitest mock API fixes for callCount and calls access
2026-05-02 04:47:41 +02:00
Mikael Hugo
1de5d5456a chore: complete vitest migration for remaining packages and API calls
- Convert remaining node:test → vitest imports in packages/* and studio/*
- Fix mock.callCount() → mock.callCount property access for vitest compat
- Fix mock.calls[N].arguments → mock.calls[N] for vitest compat
- Update tsconfig.extensions.json to exclude test files from tsc
- Harden migrate-to-vitest-all.mjs regex for single quotes and optional semicolons
2026-05-02 04:46:11 +02:00
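
The regex hardening mentioned in 1de5d5456a (single quotes and optional semicolons) might look roughly like the sketch below. The actual migrate-to-vitest-all.mjs is not shown in this log, so the pattern and the function name `rewriteNodeTestImport` are assumptions made for illustration only.

```typescript
// Hypothetical sketch, not the real migrate-to-vitest-all.mjs.
// Matches `from "node:test"` or `from 'node:test'`, with or without a
// trailing semicolon. The backreference \1 requires matching quotes.
const NODE_TEST_IMPORT = /from\s+(["'])node:test\1(;?)/g;

function rewriteNodeTestImport(source: string): string {
  // Preserve the original semicolon choice; emit single quotes for vitest.
  return source.replace(
    NODE_TEST_IMPORT,
    (_match: string, _quote: string, semi: string) => `from 'vitest'${semi}`,
  );
}
```

Using a backreference for the closing quote keeps the pattern from touching specifiers that merely start with `node:test`, or lines with mismatched quotes.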
Mikael Hugo
b62f7b20ec fix: convert node:test API calls to vitest equivalents
- t.after() → afterEach() with import injection
- t.before() → beforeEach() with import injection
- t.test() → test() (flatten subtests)
- t.skip() → return with skip comment
- Fix vitest.config.ts poolOptions deprecation for Vitest 4
- Run fix-vitest-api.mjs across 108 affected test files

💘 Generated with Crush

Assisted-by: GLM-5.1 via Crush <crush@charm.land>
2026-05-02 04:42:38 +02:00
Mikael Hugo
01d8f2fad6 fix(pi-ai): drop pre-5.3 codex models from generated registry
Remove gpt-5.1 and gpt-5.2 variants from openai-codex-responses.
Keep gpt-5.3+, gpt-5.4, and the newly-added gpt-5.5.
2026-05-02 04:41:06 +02:00
Mikael Hugo
d883f885e9 test(sf): advisor_allowed_providers dispatch gating
- Add behavioural tests for isProviderAllowedForAdvisor wired into
  selectAndApplyModel for subagent unit types.
- Verify non-subagent units are unaffected by the advisor allowlist.
- Add static source analysis guard confirming the check exists.

Assisted-by: Kimi Code CLI
2026-05-02 04:40:08 +02:00
Mikael Hugo
59aaf3dcf3 chore: migrate test suite from node:test to vitest
Add vitest.config.ts with forks pool, v8 coverage, and package aliases.
Run migrate-to-vitest.mjs to replace `from "node:test"` imports with
`from 'vitest'` across 749 test files, converting mock.fn→vi.fn and
mock.timers→vi fake timers where needed.

💘 Generated with Crush

Assisted-by: GLM-5.1 via Crush <crush@charm.land>
2026-05-02 04:37:33 +02:00
Mikael Hugo
a38e72497f fix(sf): reorder guards after dispatch, plan-gate in guards, search provider fixes
- Move guards phase after dispatch in dev path so unitType/unitId are
  available for plan-gate validation
- Relocate UOK plan-gate from runDispatch into runGuards with
  getSliceTaskCounts first-task-of-slice check
- Rename runLegacyAutoLoop → autoLoop in startAuto call sites
- Add plan quality gate in _deriveStateImpl via getSlicePlanBlockingIssue
- Clear path cache in invalidateStateCache
- Deprioritise minimax in search provider fallback ordering
- Fix native-search Anthropic heuristic to exclude copilot/minimax/kimi
  clones while still matching claude-* models
- Add releaseIfIdle to CodexAppServerClient for clean short-lived process
  exit
- Fix nested codex error message parsing
- Update search provider tests to clear minimax env vars
- Add native parser zero-task fallback in parsePlan

💘 Generated with Crush

Assisted-by: GLM-5.1 via Crush <crush@charm.land>
2026-05-02 04:35:26 +02:00
Mikael Hugo
733a3b0f6e feat(pi-ai): codex provider integration and auto-loop rename fix
- Add codex-app-server-client for Codex app server communication
- Update openai-codex-responses provider integration
- Fix auto.ts to use runLegacyAutoLoop post-UOK-refactor
- Add advisor_allowed_providers preference support
- Fix slice plan blocking issue check in auto-recovery
2026-05-02 04:02:10 +02:00
Mikael Hugo
97bbbb58d1 fix(sf): fix test failures — session guard, runLegacyLoop alias, state quality gate
- run-unit.ts: do NOT clear isSessionSwitchInFlight on timeout; let the
  dangling newSession .finally() clear it via generation check. This fixes
  'runUnit keeps the session-switch guard across a late newSession settlement'.
- auto.ts: use `runLegacyLoop: autoLoop` (not runLegacyAutoLoop) — autoLoop
  already defaults to legacy-direct dispatch contract. Fixes source-inspection
  test that expects the literal text 'runLegacyLoop: autoLoop'.
- state.ts: remove over-strict plan quality check from state derivation so
  minimal plans (no review sections) don't block task dispatch.
- auto-recovery.ts, auto-timers.ts: minor cleanup from agent sweep.
- packages/pi-ai: github-copilot.ts OAuth helper + index.ts export wiring.
- openai-codex.ts: drop stale PKCE residuals after simplification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 03:51:12 +02:00
Mikael Hugo
0266ca3ec8 docs(sf): wire parentTrace into advisory-partner dispatch
Adds a Dispatch Pattern subsection showing the parentTrace shape for
advisory review. For advisory, the trace is the planner's reasoning trail
(alternatives considered, untested assumptions, explicit out-of-scope) —
not tool calls. This lets the advisory reviewer catch the gap between
what the planner thought and what the artefact says, which is exactly
what advisory review exists to catch.
2026-05-02 03:45:37 +02:00
Mikael Hugo
2fd0f15c98 docs(sf): wire parentTrace into code-review, requesting-code-review, validate-milestone
Closes the loop on parent-trace pass-through (subagent dispatch wiring +
helper + test were landed earlier). The dispatch tool supports parentTrace
at TaskItem / ChainItem / batch level; until the canonical review skills
teach the LLM to PASS it, the feature is dead code in practice.

- code-review/SKILL.md Phase 2: shows the 5-lens parallel review swarm
  dispatch with parentTrace at the batch level. Reviewer can audit what the
  implementer actually did, not just the prose summary.
- requesting-code-review/SKILL.md Local Review Loop: shows the
  advocate + challenger-A + challenger-B dispatch with parentTrace and
  adds a hard rule that all three must receive it. Specifically calls out
  that the advocate is the most likely to wave away an objection the
  trace contradicts — passing the trace forces engagement.
- prompts/validate-milestone.md Step 1: passes a slice-claim summary
  (one bullet per slice, with SUMMARY path) as parentTrace to the three
  validation reviewers, so they audit slice claims against artifacts.

PDD packet (inline; pure prose docs, no code change):
- Purpose: review skills actually USE the parentTrace plumbing instead of
  dispatching reviewers blind to what the parent did.
- Consumer: code-review (every slice/PR review), requesting-code-review
  (every external review request), validate-milestone (every milestone close).
- Contract: each skill's dispatch example includes parentTrace; the rule
  text instructs the LLM to assemble its own tool-call summary.
- Evidence: grep confirms `parentTrace` in all three files; npm run
  copy-resources propagated to dist; typecheck:extensions exits 0.
- Non-goals: not changing the verifier prompt assembly (already inherits
  from composeTaskWithParentTrace's embedded instructions); not changing
  agent definitions; not auto-capturing the trace (parent agent decides
  what's relevant).
- Invariants: existing dispatch examples preserved with parentTrace added,
  not replacing the original; no agent type changes.
- Assumptions: the parent LLM's context contains the tool-call history it
  needs to assemble parentTrace; the dispatch tool routes the field
  through unchanged (verified by parent-trace.test.ts).
2026-05-02 03:44:02 +02:00
Mikael Hugo
fc1ed49d72 test+docs(sf): parent-trace test + dispatching-subagents skill doc
Follows up the parent-trace dispatch wiring (bundled into bc9cf4fef +
2508822b8). Adds:

- src/resources/extensions/subagent/tests/parent-trace.test.ts — 7 cases
  covering the composeTaskWithParentTrace helper: undefined/empty/whitespace
  pass-through, tag wrapping, task-after-trace ordering, content trimming,
  embedded verifier instructions ("hedge words", "tool errors").
- src/resources/extensions/subagent/index.ts — exports composeTaskWithParentTrace
  so the test can import it.
- skills/dispatching-subagents — new "Parent trace (for verifier/review
  subagents)" subsection documents the field at TaskItem / ChainItem /
  batch level, the per-task override, and the chain (step 0 only) and
  debate (round 1 only) behaviour.

PDD packet (inline; small follow-up to the architectural change):
- Purpose: parent-trace plumbing has a falsifiable test and is documented in
  the canonical dispatching-subagents skill so callers know how to use it.
- Consumer: the dispatching-subagents skill (loaded by every agent that
  calls the subagent tool); the test (covers regression).
- Contract: 7 test cases pass; SKILL.md contains the documented field at
  three schema levels with the override and per-mode behaviour notes.
- Evidence:
  - tests/parent-trace.test.ts → 7/7 pass via the SF resolve-ts loader
  - npm run typecheck:extensions exits 0
  - All 35 subagent suite tests pass
- Non-goals: not changing the dispatch wiring (already in); not adding
  parent-trace handling to background jobs (separate slice if needed).
- Invariants (safety only — sync helper + pure prose docs):
  - composeTaskWithParentTrace returns task unchanged when trace is empty.
  - The original task always appears after the closing tag.
  - Trimmed content is what gets injected, not the raw padded input.
- Assumptions: tests load TS via the resolve-ts.mjs hook (standard SF
  pattern); skills load SKILL.md from dist via copy-resources.
2026-05-02 03:29:56 +02:00
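
The three invariants listed in fc1ed49d72 can be illustrated with a minimal sketch. Only the helper name `composeTaskWithParentTrace` and the invariants come from the commit message; the `<parent_trace>` tag name and the parameter names are assumptions, and the embedded verifier instructions the tests mention are left out here.

```typescript
// Hypothetical sketch of the documented invariants. The real tag name and
// the embedded verifier instructions are not shown in the log.
function composeTaskWithParentTrace(task: string, parentTrace?: string): string {
  const trimmed = (parentTrace ?? "").trim();

  // Invariant 1: an undefined, empty, or whitespace-only trace returns the
  // task unchanged.
  if (trimmed.length === 0) return task;

  // Invariant 3: the trimmed content is injected, not the padded input.
  // Invariant 2: the original task always appears after the closing tag.
  return ["<parent_trace>", trimmed, "</parent_trace>", "", task].join("\n");
}
```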
Mikael Hugo
2508822b8f refactor(pi-ai): simplify Codex OAuth + minor fixes across pi-ai and sf
- openai-codex.ts: replace hand-rolled PKCE flow with simple read of
  ~/.codex/auth.json written by the real codex CLI after user authentication.
  Removes ~250 lines of local callback server + browser dance code.
- openai-codex-responses.ts: minor residual cleanup
- openai-completions.ts: drop remaining `as any` stream_options cast
- anthropic-shared.ts: use `unknown` cast on thinkingNoBudget path
- pi-coding-agent/extensions/types.ts: minor type addition
- db-tools.ts: explicit AgentToolResult return type on execute handlers
- requesting-code-review/SKILL.md: prompt wording cleanup
- subagent/index.ts: capability registration wiring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 03:25:39 +02:00
Mikael Hugo
bc9cf4fef3 chore(sf): commit remaining uncommitted improvements
- anthropic-shared.ts: replace `as any` cast on thinkingNoBudget path with
  `as unknown as Record<string, unknown>` for auditability; remove `as any`
  on server_tool_use block (SDK type is now correct)
- openai-completions.ts: drop residual `as any` casts after SDK type update
- db-tools.ts: add explicit AgentToolResult return type annotation on execute
  handlers to resolve implicit-any lint
- requesting-code-review/SKILL.md: update review skill prompt
- subagent/index.ts: wire subagent capability registration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 03:22:52 +02:00
Mikael Hugo
2846c296ee chore(pi-ai): typecheck cleanup, empty-catch comments, OAuth audit notes
- package.json: add 'typecheck' script (build:pi + tsc --noEmit) so pi-ai
  and pi-coding-agent typecheck under the same command surface SF uses.
- anthropic-shared.ts: replace 'as any' casts with proper Anthropic SDK
  types (ServerToolUseBlockParam, WebSearchToolResultBlockParam,
  CacheControlEphemeral). The cache_control variant is documented inline
  so the cast is auditable.
- openai-completions.ts: drop the 'as any' on stream_options — the type
  system can verify the assignment now.
- openai-codex-responses.ts, package-manager.ts, skills.ts: annotate the
  three remaining empty catches with one-line WHY comments (best-effort
  cleanup, malformed ignore files, partial directory traversal). Empty
  catch with no rationale is an SF012 anti-pattern; with rationale it is
  a deliberate fallback.
- oauth/github-copilot.ts, oauth/openai-codex.ts: add UPSTREAM AUDIT
  blocks documenting why these hand-rolled OAuth flows stay hand-rolled
  rather than delegating to @octokit/auth-oauth-device or @openai/codex.
  AbortSignal coverage and provider-specific surface area are the gating
  concerns; re-audit triggers are named.
2026-05-02 03:20:25 +02:00
Mikael Hugo
e6a2ec0a8f fix(sf): guard auto-loop against missing DB and missing basePath
Two small defensive fixes in the auto-loop that surfaced when running
sf in degraded environments (no .sf/sf.db yet, or unset basePath):

- phases.ts: guard the planning-flow gate with isDbAvailable() so a missing or
  not-yet-initialized DB does not throw inside the gate runner.
- run-unit.ts: skip process.chdir when s.basePath is falsy. The original
  guard compared cwd to an empty string, which always failed on the first
  unit of a fresh runtime root.

Both are conservative — preserve existing behaviour when DB and basePath
are present.
2026-05-02 03:20:13 +02:00
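
The run-unit.ts fix in e6a2ec0a8f amounts to checking for a falsy basePath before comparing it with the current directory. A minimal sketch, assuming injected `currentDir` and `changeDir` callbacks in place of process.cwd and process.chdir; the function name `enterBasePath` is hypothetical and does not appear in the log.

```typescript
// Hypothetical sketch; `currentDir` and `changeDir` stand in for
// process.cwd and process.chdir so the guard can be exercised in isolation.
function enterBasePath(
  basePath: string | undefined,
  currentDir: () => string,
  changeDir: (dir: string) => void,
): boolean {
  // Skip entirely when basePath is unset or empty, instead of comparing
  // the current directory against an empty string.
  if (!basePath) return false;

  // Nothing to do when already in the target directory.
  if (currentDir() === basePath) return false;

  changeDir(basePath);
  return true;
}
```

The return value only reports whether a directory change happened; behaviour with a present basePath is the same as an unguarded comparison.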
Mikael Hugo
082526c0e4 docs(sf): finish PDD v2 propagation into Purpose Gate, requesting/receiving review
Tail-end of the PDD v2 work (Assumptions field + safety/liveness split +
machine-executable Evidence). Three documents that still referenced v1's
4-field Purpose Gate are updated to the full 8-field PDD packet:

- docs/SPEC_FIRST_TDD.md — Purpose Gate now lists all 8 fields with the
  Assumptions and Failure-boundary additions inline.
- skills/requesting-code-review — replaces "Purpose & Consumer" section with
  "PDD packet (all 8 fields)" restated verbatim from .sf/active/{unit-id}/pdd.md.
  Falsifier and Scope-defence sections clarified vs Failure-boundary and
  Non-goals to remove overlap.
- skills/receiving-code-review — Purpose Gate criterion updated to demand
  the full PDD packet with machine-executable Evidence, not just
  Purpose/Consumer/Value-at-risk.

PDD packet (inline):
- Purpose: every artefact that references "Purpose Gate" agrees on the same
  8-field definition; reviewers and reviewees read the same packet.
- Consumer: spec-first-tdd, requesting-code-review, receiving-code-review.
- Contract: all three documents list the same 8 fields with the same
  Assumptions / safety+liveness / machine-executable-Evidence wording.
- Evidence: grep confirms PDD packet references in all three; typecheck:extensions exits 0.
- Non-goals: no edits to the PDD skill itself (already v2); no edits to other
  skills referencing v1 Purpose Gate beyond these three (they don't exist).
- Invariants: existing review-loop sections preserved; only Purpose-Gate-
  related sections rewritten.
- Assumptions: PDD v2 SKILL.md is the canonical source of field definitions;
  these three documents are projections of it.
2026-05-02 03:20:06 +02:00
Mikael Hugo
b48e6d5dd7 docs(sf): verdict discipline, subagent prompt audit, read-only researcher, debugging rationalizations, trace format
Step 2 + scan-and-improve from the Piebald-AI/claude-code-system-prompts pattern
analysis. Five files, prose-only edits, no code changes.

- prompts/gate-evaluate.md — Verdict Discipline section: omitted is not a hedge.
  Each omitted verdict needs a reason; unexplained omitted is treated as
  failed-to-decide and re-dispatched.
- skills/dispatching-subagents — Subagent Prompt Audit: before dispatch, audit
  for smuggled user-questions, action-class delegation, scope creep, and tool
  vs prompt mismatch. After return, scan for hedge words, glossed-over tool
  errors, and self-reports without traces.
- skills/researcher — Read-only discipline block: closes the bash redirect /
  heredoc back-door. Researcher does not write files, DB rows, git, or
  packages; the report is the only output, and write-requires findings are
  surfaced for parent dispatch rather than performed in-skill.
- skills/systematic-debugging — Recognize Your Own Rationalizations: names
  the debugging-specific failure modes ("error message obviously says X",
  "small diff can't be the cause", "test was probably flaky"). Adds Command/
  Output trace format requirement to Phase 4 verification.
- skills/spec-first-tdd — Adds Command/Output trace format requirement to the
  Evidence section.

PDD packet (inline; prose-only edit, all five additions):
- Purpose: harden five SF skills/prompts so loaded text catches rationalizations,
  closes the read-only back-door, and requires falsifiable verdicts/traces.
- Consumer: every gate evaluation, subagent dispatch, research run,
  debugging session, and TDD slice.
- Contract: SKILL/prompt text contains the new sections at predictable
  anchor points, grep-able by the section headings used.
- Evidence: grep-confirmed presence of "Verdict Discipline", "Subagent Prompt
  Audit", "<read_only_discipline>", "Recognize Your Own Rationalizations",
  "Trace format" in their respective files; typecheck:extensions exits 0;
  copy-resources propagated to dist.
- Non-goals: no edits to ask-gate.ts, no transport changes (parent-transcript
  pass-through deferred); no edits to receiving-code-review/requesting-code-
  review (already strong post-PDD-v2).
- Invariants: existing sections preserved; only additions; frontmatter
  unchanged.
- Assumptions: skills loaded from dist via copy-resources; section text is
  injected verbatim into agent context; SF voice (paraphrased patterns, not
  copy-pasted from Anthropic's bytes).
2026-05-02 03:15:35 +02:00
Mikael Hugo
ef325d7b49 docs(sf): self-awareness + adversarial probe + trace format in verify/review skills
Adds three patterns from Piebald-AI/claude-code-system-prompts (extracted from
the public Claude Code npm bundle) to SF's two completion-gate skills:

- "You are bad at this" self-awareness sections at the top of finish-and-verify
  and code-review — names the LLM-specific failure modes (read-don't-run,
  trust-self-reports, hedge-when-uncertain, fooled-by-AI-slop) instead of the
  generic "be thorough" framing.
- Rationalization-callouts that name the exact excuses the agent reaches for
  ("probably fine", "tests already pass", "looks correct based on my reading")
  and invert each with a counter-instruction.
- Mandatory adversarial probe before slice-done / Lens 1 APPROVE: at least one
  boundary / idempotency / concurrency / orphan-reference probe with documented
  result, even when behaviour was correct.
- Command/Output/Result trace format for verification evidence — paraphrase is
  not evidence; a check without a Command-run block is a skip.
- Anti-hedge guard on code-review verdicts: APPROVE_WITH_FIXES is not for "I'm
  not sure"; findings without traces drop to Medium.

PDD packet (inline since prose-only edit, no code):
- Purpose: when these skills load, the agent reads its own failure-mode catalogue
- Consumer: every slice close (finish-and-verify) and every review (code-review)
- Contract: SKILL.md text contains rationalizations + adversarial probe + trace format
- Evidence: grep finds ≥3 keyword matches per file; typecheck:extensions exits 0; dist parity
- Non-goals: no edits to gate-evaluate.md, dispatching-subagents, ask-gate.ts (deferred)
2026-05-02 03:05:47 +02:00
Mikael Hugo
6ee31e83f4 chore(sf): autonomous sweep — judgment-log/knowledge-compounding/tacit-knowledge tests + PDD v2 research record
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 02:50:44 +02:00
Mikael Hugo
7f41e61381 chore(sf): residual edit in bootstrap/register-extension
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 02:46:25 +02:00
Mikael Hugo
7942ba4bda chore(sf): auto-prompts residual sweep
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:45:55 +02:00
Mikael Hugo
a4a9c70c65 chore(sf): residual edits in auto-post-unit + auto-prompts
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 02:44:45 +02:00
Mikael Hugo
effada2bb4 chore(sf): judgment-log + auto-post-unit + milestone-framing-check cleanup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:43:28 +02:00
Mikael Hugo
070c0eb802 fix(sf): drive typecheck to 0 errors
Pre-existing errors fixed:
- tools/complete-slice.ts:421 widened error-return-type (field?/reason?)
- workflow-manifest.ts:158-159 parseObjectArray for key_risks/proof_strategy
- workflow-logger.ts LogComponent union additions (memory-embeddings et al.)
- project-research-policy.ts lambda param types (ParsedRequirement element)

Typecheck: 0 errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:43:06 +02:00
Mikael Hugo
4238c033fb chore(sf): final minor cleanup — auto-post-unit + milestone-framing-check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:42:26 +02:00
Mikael Hugo
d1be5d9b74 feat(sf): seed .sf/PRINCIPLES.md, TASTE.md, ANTI-GOALS.md (PDD-anchored)
Tacit knowledge files captured in tracked .sf/ artifacts (per ADR-001):
- PRINCIPLES.md: durable design philosophy, with PDD as the canonical
  change method (purpose / consumer / contract / failure boundary /
  evidence / non-goals / invariants — all 7 fields required)
- TASTE.md: what good code looks like in SF — verbose names, domain >
  layer, behavior-is-the-spec, minimum change, idempotent dispatch,
  fail-non-fatal, structured blocker format, PDD discipline
- ANTI-GOALS.md: 25 rule-coded anti-patterns (SF001-SF025) covering bare
  errors, type lies, magic strings, partial migrations, Ralph-loop retry,
  central federation, MCP between first-party services, implementation-
  mirror tests, coding-before-PDD-fields, happy-path-only, etc.

Translated from ACE-coder's STYLEGUIDE.md as the model. Anchored on
purpose-driven-development as the canonical change method. These three
files plus KNOWLEDGE.md plus DECISIONS.md are the tacit-knowledge layer
auto-injected into every agent context (via system-context.ts mtime cache).

Closes the "smart human gap" identified in this session: the difference
between SF behaving like a competent engineer in this codebase vs. a
generic LLM is the accumulated tacit knowledge available to the agent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:41:51 +02:00
Mikael Hugo
8a1f131557 feat(sf): cross-tier escalation policy + ask-gate
Adds explicit Tier 1 / Tier 2 / Tier 3 escalation guidance to every system
prompt. Tier 1 = code lookup (sift, source, .sf/DECISIONS.md). Tier 2 =
external lookup (WebSearch, WebFetch, Context7, MCP servers). Tier 3 = ask
user (in auto/step) or exit-with-structured-blocker (in autonomous).

- bootstrap/system-context.ts: buildEscalationPolicyBlock injected at top
  of SF system-context section, mode-aware via isCanAskUser()
- bootstrap/ask-gate.ts: gateAskUserQuestions() runtime safety net,
  blocks ask_user_questions in autonomous mode at the tool layer with a
  structured rejection that escalates back to Tier 1/2
- tests: 18 escalation-policy + 16 ask-gate, all pass

Implements the user's "solve it like a smart human, not Ralph Wiggum"
philosophy: in autonomous mode the agent must do the research a competent
human would do, and only stop with a blocker when even a human couldn't
proceed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:36:14 +02:00
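
The tool-layer safety net described in 8a1f131557 could be pictured as follows. This is a sketch under stated assumptions, not the real bootstrap/ask-gate.ts: the function name `sketchAskGate`, the `Mode` and `GateResult` types, and the wording of the rejection are all hypothetical. Only the tool name `ask_user_questions`, the three modes, and the idea of a structured rejection pointing back to Tier 1/2 come from the commit message.

```typescript
// Hypothetical sketch of an ask-gate; shapes and messages are assumed.
type Mode = "auto" | "step" | "autonomous";

type GateResult =
  | { allowed: true }
  | { allowed: false; reason: string; nextSteps: string[] };

function sketchAskGate(toolName: string, mode: Mode): GateResult {
  // Only the ask-user tool is gated, and only in autonomous mode.
  if (toolName !== "ask_user_questions" || mode !== "autonomous") {
    return { allowed: true };
  }
  // Structured rejection that escalates back to the lower tiers.
  return {
    allowed: false,
    reason: "ask_user_questions is blocked in autonomous mode",
    nextSteps: [
      "Tier 1: code lookup (source, .sf/DECISIONS.md)",
      "Tier 2: external lookup (web search, documentation)",
    ],
  };
}
```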
Mikael Hugo
ec07eca5bd fix(sf): wire schemas/parsers into project-research-policy, trim deep-project-setup stubs
- project-research-policy.ts: replace throw stubs with real imports from
  schemas/parsers.ts — parseProject and parseRequirements now live
- deep-project-setup-policy.ts: remove redundant inline stubs now that
  schemas/validate.ts is ported
- tests/runtime-root-redirect.test.ts: new test for root redirect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:34:08 +02:00
Mikael Hugo
0efc9cd656 docs(sf): final cluster JSDoc — mcp/preferences/native bridges/sf-db/onboarding
2026-05-02 02:32:15 +02:00
Mikael Hugo
f761d31d1c feat(sf): port schemas/parsers+validate, fix project-research-policy stubs + sweeps
- schemas/parsers.ts: new — Markdown→structured object parsers (ParsedProject,
  ParsedRequirements, ParsedRequirement, ParsedRoadmap, parseProject,
  parseRequirements, parseRoadmap, parseRoadmapMilestone)
- schemas/validate.ts: new — artifact validation against parsed schemas
  (validateProject, validateRequirements, validateArtifact)
- project-research-policy.ts: remove throw stubs, wire real parseProject/
  parseRequirements from schemas/parsers — classifyProjectResearchScope now live
- verification-gate.ts: escalation-policy backoff improvements
- workflow-events.ts + workflow-logger.ts: minor type/log additions
- worktree-health.ts: health check timing
- doctor-runtime-checks.ts: expand checks
- tests/escalation-policy.test.ts: new test for gate escalation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:30:57 +02:00
Mikael Hugo
98da1980fb refactor + docs: SF_RUNTIME_PATTERNS canonical + bootstrap/workflow JSDoc
Dead-code removal:
- state.ts: getDeriveTelemetry, resetDeriveTelemetry (zero refs)
- context-budget.ts: reduceToFit (zero refs)
- auto.ts: getActiveRunDir (zero refs)

SF_RUNTIME_PATTERNS canonical extraction (per TODO audit):
- gitignore.ts: exported SF_RUNTIME_PATTERNS
- git-service.ts: RUNTIME_EXCLUSION_PATHS = SF_RUNTIME_PATTERNS (was 27-line mirror)
- worktree-manager.ts: SKIP_PATHS/SKIP_EXACT/SKIP_PREFIXES derived at module load
- doctor-runtime-checks.ts: criticalPatterns = SF_RUNTIME_PATTERNS
- Cross-file sync obligation now compile-time enforced

Bootstrap + workflow JSDoc sweep: 189 blocks across 17 files.

Typecheck clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:29:46 +02:00
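
The canonical-extraction idea in 98da1980fb is that one exported list feeds every consumer, so the copies cannot drift apart. A minimal sketch, assuming placeholder pattern entries and a trailing-slash convention for directory prefixes; the real contents of SF_RUNTIME_PATTERNS and the real derivation rules are not shown in the log, and `shouldSkip` is a hypothetical helper.

```typescript
// Hypothetical entries; the real list lives in gitignore.ts.
const SF_RUNTIME_PATTERNS = [".sf/sf.db", ".sf/runtime/", ".sf/active/"] as const;

// A consumer that previously kept its own mirror now aliases the source.
const RUNTIME_EXCLUSION_PATHS: readonly string[] = SF_RUNTIME_PATTERNS;

// Derived once at module load. Assumed convention: entries ending in "/"
// are directory prefixes, everything else is an exact path.
const SKIP_PREFIXES: string[] = SF_RUNTIME_PATTERNS.filter((p) => p.endsWith("/"));
const SKIP_EXACT = new Set<string>(SF_RUNTIME_PATTERNS.filter((p) => !p.endsWith("/")));

function shouldSkip(relativePath: string): boolean {
  return (
    SKIP_EXACT.has(relativePath) ||
    SKIP_PREFIXES.some((prefix) => relativePath.startsWith(prefix))
  );
}
```

Adding or removing an entry in the single list then changes every derived structure at once, with no second list to update by hand.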
Mikael Hugo
b8bcd6fdd1 feat(sf): port deep-project-setup-policy + UOK audit event types + sweeps
- deep-project-setup-policy.ts: new — DeepProjectSetupState, getDeepProjectSetupState,
  getNextDeepProjectSetupStage, researchDecisionPath, writeDefaultResearchSkipDecision
- uok/audit.ts: add missing audit event types to match gsd2 (model-policy-block,
  gate-timeout, gate-input-fail, dispatch-blocked)
- hook-emitter.ts: proper emitExtensionEvent wiring with SF's ExtensionAPI
- bootstrap/system-context.ts: deep-project-setup context block injection
- doctor-types.ts + doctor-runtime-checks.ts: expand runtime check types
- milestone-id-reservation.ts: align ghost-milestone reuse logic
- tests/detection.test.ts: fix stale import path
- worktree-resolver.ts: path normalization edge case

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:29:16 +02:00
Mikael Hugo
360208cbaf feat(sf): port commands-memory, component-loader, workflow-oneshot prompt + sweeps
- commands-memory.ts: /sf memory command handlers (add/list/search/delete)
- component-loader.ts: component lifecycle management and validation
- prompts/workflow-oneshot.md: oneshot workflow execution prompt template
- session-forensics.ts, definition-io.ts, sf-db.ts, commands-scaffold-sync,
  worktree-resolver: secondary sweep improvements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:27:42 +02:00
Mikael Hugo
3a3ea29c51 chore(sf): test backfill, parse helpers, parallel session pickups
2026-05-02 02:26:01 +02:00
Mikael Hugo
192fd3e180 feat(sf): port python-resolver, state-transition-matrix
- python-resolver.ts: new — resolves python/python3 executable path
- state-transition-matrix.ts: new — valid auto-mode state machine transitions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:22:47 +02:00
Mikael Hugo
dda9793cd6 feat(sf): port sf-home, memory-embeddings, component-types, workflow-install + sweep
- sf-home.ts: new — resolves ~/.sf/ path and SF home dir helpers (port of gsd-home.ts)
- memory-embeddings.ts: new — embedding helpers for memory similarity search
- component-types.ts: new — Component, ComponentManifest, ComponentHook type defs
- workflow-install.ts: new — workflow installation from local/remote sources
- auto-post-unit.ts: clearEvidenceFromDisk after successful verification
- routing-history.ts: add cost-per-token tracking to routing decisions
- workflow-{manifest,templates}.ts: hardening sweep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 02:22:13 +02:00