Two small defensive fixes in the auto-loop for failures that surfaced when
running sf in degraded environments (no .sf/sf.db yet, or unset basePath):
- phases.ts: put the planning-flow gate behind isDbAvailable() so a missing or
not-yet-initialized DB does not throw inside the gate runner.
- run-unit.ts: skip process.chdir when s.basePath is falsy. The original
guard compared cwd to an empty string, which always failed on the first
unit of a fresh runtime root.
Both are conservative: existing behaviour is preserved when DB and basePath
are present.
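The run-unit.ts guard can be sketched as a pure decision function. The helper name resolveChdirTarget and its signature are illustrative assumptions, not the code in this commit; only the "skip when basePath is falsy" rule comes from the description above.

```typescript
// Hypothetical helper: decides whether the loop should chdir, and where to.
// Returns null when no chdir should happen.
function resolveChdirTarget(
  basePath: string | undefined | null,
  cwd: string,
): string | null {
  // Falsy basePath (undefined, null, "") means the runtime root is not set
  // yet: skip rather than attempt a chdir to an empty path.
  if (!basePath) return null;
  // Already anchored on basePath: nothing to do.
  if (basePath === cwd) return null;
  return basePath;
}

// Possible call site in a Node process (assumed, not taken from the commit):
//   const target = resolveChdirTarget(s.basePath, process.cwd());
//   if (target !== null) process.chdir(target);
```

Keeping the decision separate from the process.chdir call makes the degraded case (empty basePath on the first unit) testable without changing the real working directory.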
Tail-end of the PDD v2 work (Assumptions field + safety/liveness split +
machine-executable Evidence). Three documents that still referenced v1's
4-field Purpose Gate are updated to the full 8-field PDD packet:
- docs/SPEC_FIRST_TDD.md — Purpose Gate now lists all 8 fields with the
Assumptions and Failure-boundary additions inline.
- skills/requesting-code-review — replaces "Purpose & Consumer" section with
"PDD packet (all 8 fields)" restated verbatim from .sf/active/{unit-id}/pdd.md.
Falsifier and Scope-defence sections clarified vs Failure-boundary and
Non-goals to remove overlap.
- skills/receiving-code-review — Purpose Gate criterion updated to demand
the full PDD packet with machine-executable Evidence, not just
Purpose/Consumer/Value-at-risk.
PDD packet (inline):
- Purpose: every artefact that references "Purpose Gate" agrees on the same
8-field definition; reviewers and reviewees read the same packet.
- Consumer: spec-first-tdd, requesting-code-review, receiving-code-review.
- Contract: all three documents list the same 8 fields with the same
Assumptions / safety+liveness / machine-executable-Evidence wording.
- Evidence: grep confirms PDD packet references in all three; typecheck:extensions exits 0.
- Non-goals: no edits to the PDD skill itself (already v2); no edits to other
skills referencing v1 Purpose Gate beyond these three (none exist).
- Invariants: existing review-loop sections preserved; only Purpose-Gate-
related sections rewritten.
- Assumptions: PDD v2 SKILL.md is the canonical source of field definitions;
these three documents are projections of it.
Step 2 + scan-and-improve from the Piebald-AI/claude-code-system-prompts pattern
analysis. Five files, prose-only edits, no code changes.
- prompts/gate-evaluate.md — Verdict Discipline section: omitted is not a hedge.
Each omitted verdict needs a reason; unexplained omitted is treated as
failed-to-decide and re-dispatched.
- skills/dispatching-subagents — Subagent Prompt Audit: before dispatch, audit
for smuggled user-questions, action-class delegation, scope creep, and tool
vs prompt mismatch. After return, scan for hedge words, glossed-over tool
errors, and self-reports without traces.
- skills/researcher — Read-only discipline block: closes the bash redirect /
heredoc back-door. The researcher does not write files, DB rows, git state, or
packages; the report is the only output, and findings that require a write are
surfaced for parent dispatch rather than performed in-skill.
- skills/systematic-debugging — Recognize Your Own Rationalizations: names
the debugging-specific failure modes ("error message obviously says X",
"small diff can't be the cause", "test was probably flaky"). Adds Command/
Output trace format requirement to Phase 4 verification.
- skills/spec-first-tdd — Adds Command/Output trace format requirement to the
Evidence section.
PDD packet (inline; prose-only edit, all five additions):
- Purpose: harden five SF skills/prompts so loaded text catches rationalizations,
closes the read-only back-door, and requires falsifiable verdicts/traces.
- Consumer: every gate evaluation, subagent dispatch, research run,
debugging session, and TDD slice.
- Contract: SKILL/prompt text contains the new sections at predictable
anchor points, grep-able by the section headings used.
- Evidence: grep-confirmed presence of "Verdict Discipline", "Subagent Prompt
Audit", "<read_only_discipline>", "Recognize Your Own Rationalizations",
"Trace format" in their respective files; typecheck:extensions exits 0;
copy-resources propagated to dist.
- Non-goals: no edits to ask-gate.ts, no transport changes (parent-transcript
pass-through deferred); no edits to receiving-code-review/requesting-code-
review (already strong post-PDD-v2).
- Invariants: existing sections preserved; only additions; frontmatter
unchanged.
- Assumptions: skills loaded from dist via copy-resources; section text is
injected verbatim into agent context; SF voice (paraphrased patterns, not
copy-pasted from Anthropic's bytes).
Adds three patterns from Piebald-AI/claude-code-system-prompts (extracted from
the public Claude Code npm bundle) to SF's two completion-gate skills:
- "You are bad at this" self-awareness sections at the top of finish-and-verify
and code-review — names the LLM-specific failure modes (read-don't-run,
trust-self-reports, hedge-when-uncertain, fooled-by-AI-slop) instead of the
generic "be thorough" framing.
- Rationalization-callouts that name the exact excuses the agent reaches for
("probably fine", "tests already pass", "looks correct based on my reading")
and invert each with a counter-instruction.
- Mandatory adversarial probe before slice-done / Lens 1 APPROVE: at least one
boundary / idempotency / concurrency / orphan-reference probe with documented
result, even when behaviour was correct.
- Command/Output/Result trace format for verification evidence — paraphrase is
not evidence; a check without a Command-run block is a skip.
- Anti-hedge guard on code-review verdicts: APPROVE_WITH_FIXES is not for "I'm
not sure"; findings without traces drop to Medium.
PDD packet (inline since prose-only edit, no code):
- Purpose: when these skills load, the agent reads its own failure-mode catalogue
- Consumer: every slice close (finish-and-verify) and every review (code-review)
- Contract: SKILL.md text contains rationalizations + adversarial probe + trace format
- Evidence: grep finds ≥3 keyword matches per file; typecheck:extensions exits 0; dist parity
- Non-goals: no edits to gate-evaluate.md, dispatching-subagents, ask-gate.ts (deferred)
Tacit knowledge files captured in tracked .sf/ artifacts (per ADR-001):
- PRINCIPLES.md: durable design philosophy, with PDD as the canonical
change method (purpose / consumer / contract / failure boundary /
evidence / non-goals / invariants — all 7 fields required)
- TASTE.md: what good code looks like in SF — verbose names, domain >
layer, behavior-is-the-spec, minimum change, idempotent dispatch,
fail-non-fatal, structured blocker format, PDD discipline
- ANTI-GOALS.md: 25 rule-coded anti-patterns (SF001-SF025) covering bare
errors, type lies, magic strings, partial migrations, Ralph-loop retry,
central federation, MCP between first-party services, implementation-
mirror tests, coding-before-PDD-fields, happy-path-only, etc.
Adapted from ACE-coder's STYLEGUIDE.md, which served as the model, and
anchored on purpose-driven-development as the canonical change method. These
three files plus KNOWLEDGE.md plus DECISIONS.md are the tacit-knowledge layer
auto-injected into every agent context (via system-context.ts mtime cache).
Closes the "smart human gap" identified in this session: the difference
between SF behaving like a competent engineer in this codebase vs. a
generic LLM is the accumulated tacit knowledge available to the agent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds explicit Tier 1 / Tier 2 / Tier 3 escalation guidance to every system
prompt. Tier 1 = code lookup (sift, source, .sf/DECISIONS.md). Tier 2 =
external lookup (WebSearch, WebFetch, Context7, MCP servers). Tier 3 = ask
user (in auto/step) or exit-with-structured-blocker (in autonomous).
- bootstrap/system-context.ts: buildEscalationPolicyBlock injected at top
of SF system-context section, mode-aware via isCanAskUser()
- bootstrap/ask-gate.ts: gateAskUserQuestions() runtime safety net,
blocks ask_user_questions in autonomous mode at the tool layer with a
structured rejection that escalates back to Tier 1/2
- tests: 18 escalation-policy + 16 ask-gate, all pass
Implements the user's "solve it like a smart human, not Ralph Wiggum"
philosophy: in autonomous mode the agent must do the research a competent
human would do, and only stop with a blocker when even a human couldn't
proceed.
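A minimal sketch of the Tier 3 gate described above. The three mode names come from this message; the result shape, the function names canAskUser and gateAskUser, and the rejection text are assumptions made for illustration and do not mirror ask-gate.ts.

```typescript
// Modes named in the commit message; everything else here is illustrative.
type SfMode = "auto" | "step" | "autonomous";

type AskGateResult =
  | { allowed: true }
  | { allowed: false; reason: string; retryTiers: number[] };

// Only interactive modes may reach Tier 3 "ask user".
function canAskUser(mode: SfMode): boolean {
  return mode !== "autonomous";
}

// Hypothetical tool-layer gate: in autonomous mode the call is rejected with
// a structured payload that points the agent back at Tier 1 and Tier 2.
function gateAskUser(mode: SfMode): AskGateResult {
  if (canAskUser(mode)) return { allowed: true };
  return {
    allowed: false,
    reason: "ask_user_questions is blocked in autonomous mode",
    retryTiers: [1, 2],
  };
}
```

The prompt-level policy and the tool-level gate make the same decision from the same mode, so a prompt that is ignored still cannot reach the user in autonomous mode.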
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- project-research-policy.ts: replace throw stubs with real imports from
schemas/parsers.ts; parseProject and parseRequirements are now live
- deep-project-setup-policy.ts: remove redundant inline stubs now that
schemas/validate.ts is ported
- tests/runtime-root-redirect.test.ts: new test for root redirect
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- sf-home.ts: new — resolves ~/.sf/ path and SF home dir helpers (port of gsd-home.ts)
- memory-embeddings.ts: new — embedding helpers for memory similarity search
- component-types.ts: new — Component, ComponentManifest, ComponentHook type defs
- workflow-install.ts: new — workflow installation from local/remote sources
- auto-post-unit.ts: clearEvidenceFromDisk after successful verification
- routing-history.ts: add cost-per-token tracking to routing decisions
- workflow-{manifest,templates}.ts: hardening sweep
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Performance fix from audit:
- bootstrap/system-context.ts: cachedReadFile() with mtime-keyed in-process
cache for KNOWLEDGE.md (global + project) and ARCHITECTURE.md. Eliminates
3-4 sync readFileSync calls per agent turn in the common case where these
files haven't changed. Live edits are still picked up via mtime invalidation.
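The mtime-keyed idea can be sketched as follows. This is not the cachedReadFile() from system-context.ts: file access is injected through a hypothetical FileAccess interface so the sketch runs without touching disk, and createCachedReader is an illustrative name.

```typescript
// Injected file access (assumption). In Node these would typically wrap
// fs.statSync(path).mtimeMs and fs.readFileSync(path, "utf8").
interface FileAccess {
  mtimeMs(path: string): number;
  read(path: string): string;
}

interface CacheEntry {
  mtimeMs: number;
  content: string;
}

function createCachedReader(files: FileAccess): (path: string) => string {
  const cache = new Map<string, CacheEntry>();
  return (path: string): string => {
    const mtimeMs = files.mtimeMs(path); // cheap stat on every call
    const hit = cache.get(path);
    if (hit !== undefined && hit.mtimeMs === mtimeMs) {
      return hit.content; // unchanged since last read: skip the read
    }
    const content = files.read(path); // first read, or the file changed
    cache.set(path, { mtimeMs, content });
    return content;
  };
}
```

The stat still happens on every call; the saving is the full file read, which is the expensive part when the file is unchanged.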
Docstring sweep on the notification + detection cluster:
- headless-events.ts: 17 JSDoc blocks (exit codes + every classification fn)
- notification-store.ts, notification-overlay.ts, notification-widget.ts,
notifications.ts: ~17 blocks
- detection.ts, codebase-generator.ts: ~5 blocks
Typecheck clean. 3/3 perf tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Last batch from the parallel swarm session: docstring tweaks,
verification-gate doc additions, workflow-reconcile and worktree-command
follow-ups, doctor-environment cleanup. Typecheck clean.
Most of the session work landed in earlier commits (8be8f4774, 3045538cb,
038938f2a, ed85252fc, 4f4b584e5, etc.); this commit is the residual
working-tree state after all swarms reported.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Touches auto.ts, auto/loop.ts, preferences.ts, safety/git-checkpoint.ts,
token-counter.ts, tools/complete-slice.ts, verification-gate.ts,
workflow-logger.ts, workflow-migration.ts, plus new
tests/record-promoter.test.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- src/headless-events.ts: add case "reload" → EXIT_RELOAD (12).
EXIT_RELOAD sentinel was defined but unused — "reload" status fell
through to EXIT_ERROR (1).
- src/resources/extensions/sf/notification-store.ts:109: use <= for
dedup window so a second identical notification at exactly
DEDUP_WINDOW_MS still gets suppressed (was off-by-one at boundary).
- src/resources/extensions/sf/definition-loader.ts: pending docstring
tweaks from autonomous sweep.
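Both behavioural fixes above are small enough to sketch. EXIT_ERROR (1) and EXIT_RELOAD (12) are the values given in this message; the function names and the reduced switch are assumptions, and the real classifier handles more statuses than are shown here.

```typescript
const EXIT_ERROR = 1;
const EXIT_RELOAD = 12;

// Reduced sketch: only the "reload" case from this commit is shown. Unlisted
// statuses collapse to EXIT_ERROR here purely to keep the example short.
function exitCodeForStatus(status: string): number {
  switch (status) {
    case "reload":
      return EXIT_RELOAD;
    default:
      return EXIT_ERROR;
  }
}

// Inclusive dedup window: a repeat arriving at exactly windowMs after the
// previous notification still counts as a duplicate (<=, not <).
function isWithinDedupWindow(
  previousAt: number,
  now: number,
  windowMs: number,
): boolean {
  return now - previousAt <= windowMs;
}
```

With a strict < comparison, a repeat arriving exactly windowMs after the first would be treated as new, which is the boundary case the notification-store.ts change closes.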
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- new worktree-root.ts / worktree-session-state.ts: track and restore
original project root after /worktree merge or /worktree return
- new tools/skip-slice.ts: cascade skip to tasks in the slice so milestone
completion isn't blocked by pending tasks (#4375)
- auto/run-unit.ts: anchor cwd to basePath before newSession() captures it
(GAP-10) — prevents tool runtime / system prompt from rooting on drifted
cwd from async_bash, background jobs, or prior unit cleanup
- safety/git-checkpoint.ts: harden HEAD-rev-parse against execFileSync
errors, surface stderr properly
- broad JSDoc / docstring pass across the rest of the SF extension surface
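The skip cascade can be illustrated with a small sketch. The status names, the Task and Slice shapes, and both function names are assumptions for illustration; tools/skip-slice.ts may model this differently.

```typescript
// Hypothetical status vocabulary and record shapes.
type WorkStatus = "pending" | "in_progress" | "complete" | "skipped";

interface Task {
  id: string;
  status: WorkStatus;
}

interface Slice {
  id: string;
  status: WorkStatus;
  tasks: Task[];
}

// Skipping a slice also skips every task that is not already complete, so no
// task inside a skipped slice is left pending.
function skipSliceWithCascade(slice: Slice): Slice {
  return {
    ...slice,
    status: "skipped",
    tasks: slice.tasks.map((task) =>
      task.status === "complete" ? task : { ...task, status: "skipped" as const },
    ),
  };
}

// A milestone stays blocked while any task is still open.
function hasOpenTasks(slices: Slice[]): boolean {
  return slices.some((slice) =>
    slice.tasks.some(
      (task) => task.status === "pending" || task.status === "in_progress",
    ),
  );
}
```

Without the cascade, the slice itself would read as skipped while its pending tasks kept the milestone-completion check failing.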
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace ~700 LOC of hand-rolled OAuth and onboarding with cli-core's own
getOauthClient + setupUser. The provider now reads ~/.gemini/oauth_creds.json
itself (via cli-core), refreshes tokens, and discovers the Code Assist
project + tier server-side — exactly like the real gemini CLI does.
- provider/google-gemini-cli.ts: drop apiKey={token,projectId} JSON
plumbing; getCodeAssistServer() uses cli-core for everything
- delete utils/oauth/google-gemini-cli.ts (457 LOC: hand-rolled login,
PKCE, callback server, discoverProject, onboardUser, tier handling)
- delete utils/oauth/google-oauth-utils.ts (201 LOC: only consumed by
the deleted gemini-cli helper)
- oauth/index.ts: remove gemini-cli from BUILT_IN_OAUTH_PROVIDERS
registry; google-gemini-cli is no longer SF-managed
- auth-storage.ts: update 3 error messages to direct users to the real
gemini CLI for authentication instead of the removed /login command
Login UX: users authenticate with the real gemini CLI; we just consume
~/.gemini/oauth_creds.json. Whole-provider disable goes through manual
settings.json edit (per-model toggle still works in interactive UI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`/sf autonomous full` (or `--full`) now plumbs through to
AutoSession.fullAutonomy. The flag is to be consumed at milestone-complete to
skip the human-review pause, auto-merge, and chain to the next milestone. Git
revert is the safety net (see ADR-019/021 conversation on autonomy and
reversibility).
Plumbing path:
- commands/handlers/auto.ts: parses `full` / `--full` modifier, threads
fullAutonomy through launchAuto options
- commands/catalog.ts: completion entries for `full` and `--full`
- auto.ts: startAuto and startAutoDetached accept fullAutonomy in options;
startAuto pins it on the session up-front so resume paths preserve it
- auto/session.ts: AutoSession.fullAutonomy field with full docstring
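The modifier parsing at the head of the plumbing path might look like the sketch below. parseAutonomousArgs and the AutoLaunchOptions shape are hypothetical; only the `full` / `--full` spellings and the fullAutonomy name come from this message.

```typescript
// Hypothetical option shape threaded from the command handler to launchAuto.
interface AutoLaunchOptions {
  fullAutonomy: boolean;
  rest: string[]; // remaining tokens, left for other modifiers to consume
}

function parseAutonomousArgs(args: string): AutoLaunchOptions {
  const tokens = args
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0);
  const isFull = (token: string): boolean =>
    token === "full" || token === "--full";
  return {
    fullAutonomy: tokens.some(isFull),
    rest: tokens.filter((token) => !isFull(token)),
  };
}
```

Resolving the flag once and pinning it on the session matches the stated reason for setting it up-front in startAuto: resume paths read the session rather than re-parsing the command.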
Behavior change is staged: the milestone-complete consumer that auto-merges
and chains is intentionally not in this commit (parallel session is active
in auto-post-unit.ts and auto/loop.ts; will land in a follow-up).
Also adds JSDoc to the functions on the touched path:
- handleAutoCommand (full command-family doc)
- launchAuto (headless vs detached routing)
- startAutoDetached (fire-and-forget rationale, why it diverges from startAuto)
- AutoSession.fullAutonomy (full inline doc)
Typecheck clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
loop.ts:
- saveStuckState on main dev path (was only on custom-engine path — P1 fix)
- Add pid to stuck-state JSON to prevent test pollution across process runs
- Use atomicWriteSync in saveCustomVerifyRetryCounts for crash-safety
- Add enforceMinRequestInterval + call before both runUnitPhaseViaContract sites
- Update s.lastRequestTimestamp from requestDispatchedAt on each unit
session.ts:
- Add lastRequestTimestamp and lastUnitAgentEndMessages fields
phases.ts:
- Add consecutiveSessionTimeouts + exponential-backoff auto-resume (up to 3x)
for session-creation timeouts before pausing for manual review
- Add loadEvidenceFromDisk after resetEvidence to rehydrate evidence on restart
- Add USER_DRIVEN_DEEP_UNITS + isAwaitingUserInput guard to skip artifact
verification when a deep-planning unit is paused awaiting user input
- Store s.lastUnitAgentEndMessages after each unit run
- Add requestDispatchedAt to runUnitPhase return type
evidence-collector.ts: add loadEvidenceFromDisk export
auto-post-unit.ts: add USER_DRIVEN_DEEP_UNITS set + re-export isAwaitingUserInput
user-input-boundary.ts: port from gsd2 (isAwaitingUserInput + approval helpers)
run-unit.ts: capture requestDispatchedAt at API dispatch time
kernel.ts: remove redundant !legacyFallback guard (enabled already encodes it)
tests/uok-kernel-path.test.ts: add SF_UOK_AUDIT_ENVELOPE env var assertions
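Two of the timing rules above, sketched as pure functions. The retry cap of 3 comes from this message; the base delay, the doubling factor, and both function names are assumptions, and the real enforceMinRequestInterval presumably also performs the wait.

```typescript
// Delay before the next auto-resume after a session-creation timeout, or null
// once the budget is spent and the loop should pause for manual review.
// Doubling per attempt is an assumed backoff curve.
function nextAutoResumeDelayMs(
  consecutiveSessionTimeouts: number,
  baseDelayMs: number,
  maxAutoResumes: number = 3,
): number | null {
  if (consecutiveSessionTimeouts < 1) return 0; // no timeout yet: no delay
  if (consecutiveSessionTimeouts > maxAutoResumes) return null;
  return baseDelayMs * 2 ** (consecutiveSessionTimeouts - 1);
}

// How long to wait before dispatching so that two requests are at least
// minIntervalMs apart. A null timestamp means nothing was dispatched yet.
function remainingIntervalMs(
  lastRequestTimestamp: number | null,
  now: number,
  minIntervalMs: number,
): number {
  if (lastRequestTimestamp === null) return 0;
  return Math.max(0, minIntervalMs - (now - lastRequestTimestamp));
}
```

Taking the timestamp at API dispatch time rather than at unit start matters here: the interval is meant to space out requests, not units.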
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test was checking for a literal single-line ternary in auto-post-unit.ts,
but the formatter naturally renders the same ternary multi-line. The semantic
content is identical; the test was failing on whitespace alone.
Normalize runs of whitespace before substring-matching so the assertion
survives prettier/biome formatting changes.
After this fix: 39/39 uok tests pass.
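The normalization amounts to collapsing whitespace on both sides before the substring check. A sketch, with hypothetical helper names and a made-up ternary standing in for the real one in auto-post-unit.ts:

```typescript
// Collapse every run of whitespace (spaces, tabs, newlines) to one space.
function collapseWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Substring match that ignores how the formatter wrapped the code.
function includesIgnoringLayout(source: string, snippet: string): boolean {
  return collapseWhitespace(source).includes(collapseWhitespace(snippet));
}

// Example: the same ternary, rendered multi-line by a formatter.
const formattedSource = [
  "const label = isReady",
  "    ? 'ready'",
  "    : 'waiting';",
].join("\n");

const singleLineSnippet = "const label = isReady ? 'ready' : 'waiting';";
const matchesAfterNormalizing = includesIgnoringLayout(
  formattedSource,
  singleLineSnippet,
);
```

This only removes layout sensitivity; a formatter change that alters tokens (for example quote style) would still fail the assertion.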
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- loop.ts: add DispatchContract type, AutoLoopOptions, resolveDispatchNodeKind,
runUnitPhaseViaContract — kernel path routes unit execution through
ExecutionGraphScheduler; legacy path passes through directly
- loop.ts: export runUokKernelLoop (contract=uok-scheduler) and
runLegacyAutoLoop (contract=legacy-direct)
- auto-loop.ts: re-export both new loop functions
- auto.ts: use runUokKernelLoop/runLegacyAutoLoop at both call sites
- phases.ts: use uokFlags.planningFlow for plan gate (was bypassing
legacyFallback via raw pref read)
- auto-dispatch.ts: use hasFinalizedMilestoneContext for execution-entry
context check (picks up SF_PROJECT_ROOT artifact fallback)
- tests: port uok-writer, uok-parity-report, uok-loop-adapter-writer,
uok-kernel-path test files from gsd2 — all 8 tests pass
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Follow-up to commit 39e2dc70c. Two small improvements that surfaced when
the parallel Phase D subagent finished and inspected the worktree:
- commands-scaffold-sync.ts:
- Tighten ScaffoldKeeperFn to match Phase D's actual dispatcher signature
(basePath, ctx) => Promise<number>. Define a local minimal
ScaffoldKeeperCtxShape for the lazy loader so we don't form a hard
import dependency on scaffold-keeper.ts.
- Remove duplicated "Upgradable" line from the report table; keep only
"Pending", since ADR-021 §10 names that as the user-facing label.
- tests/scaffold-keeper.test.ts: better-typed notify stub; covers Phase E
arg-parser helpers (parseScaffoldSyncArgs, matchesOnly, applyOnlyFilter).
Typecheck clean. 49/49 scaffold tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase D: scaffold-keeper background agent
- scaffold-keeper.ts: dispatchScaffoldKeeperIfNeeded fires async after milestone
completion and on stopAuto cleanup. It detects editing-drift items, writes
<file>.proposed artifacts (template-only stub for now; a later change wires in
the records-keeper skill subagent for code-as-fact merging), and emits a
structured approval_request notification with a stable dedupe_key so repeated
runs don't spam the user.
- Wired into auto-post-unit.ts and auto.ts:stopAuto via fire-and-forget so
the auto loop is never blocked by scaffold work.
- Failure modes non-fatal: try/catch around the dispatch, errors logged via
logWarning("scaffold").
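One way a stable dedupe_key could suppress repeat notifications is sketched below. The key format, buildDedupeKey, and the NotificationLog class are all assumptions; the message above only states that the key is stable across runs.

```typescript
// Hypothetical key derivation: sort the drifted paths so the same set of
// files always yields the same key, regardless of detection order.
function buildDedupeKey(driftedFiles: string[]): string {
  const sorted = [...driftedFiles].sort();
  return `scaffold-keeper:${sorted.join("|")}`;
}

// Minimal in-memory stand-in for a notification store.
class NotificationLog {
  private readonly seenKeys = new Set<string>();

  // Returns true when the notification is emitted, false when suppressed.
  emit(dedupeKey: string): boolean {
    if (this.seenKeys.has(dedupeKey)) return false;
    this.seenKeys.add(dedupeKey);
    return true;
  }
}
```

Because the key depends only on which files drifted, a second run over the same drift produces the same key and is dropped, while a newly drifted file changes the key and notifies again.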
Phase E: /sf scaffold sync command (escape hatch)
- commands-scaffold-sync.ts: parseScaffoldSyncArgs + handleScaffoldSync.
- Flags:
    --dry-run          report what would change, no writes
    --include-editing  run scaffold-keeper synchronously for editing-drift items
    --only=<glob>      scope to a path glob (suffix/prefix match)
- Wired into the SF command system via commands-bootstrap.ts, commands/catalog.ts,
and commands/handlers/ops.ts following the existing /sf <verb> pattern.
- Reuses ensureAgenticDocsScaffold from Phase C — doesn't reimplement sync logic.
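The three flags can be parsed with a few lines. The sketch below deliberately uses different names from the real parseScaffoldSyncArgs and matchesOnly, since the result shape and the exact reading of "suffix/prefix match" are assumptions here.

```typescript
// Hypothetical parsed-argument shape.
interface ScaffoldSyncArgsSketch {
  dryRun: boolean;
  includeEditing: boolean;
  only: string | null;
}

function parseScaffoldSyncArgsSketch(raw: string): ScaffoldSyncArgsSketch {
  const parsed: ScaffoldSyncArgsSketch = {
    dryRun: false,
    includeEditing: false,
    only: null,
  };
  for (const token of raw.trim().split(/\s+/)) {
    if (token === "--dry-run") parsed.dryRun = true;
    else if (token === "--include-editing") parsed.includeEditing = true;
    else if (token.startsWith("--only=")) {
      parsed.only = token.slice("--only=".length);
    }
  }
  return parsed;
}

// Assumed reading of "suffix/prefix match": "*.md" matches by suffix,
// "docs/*" matches by prefix, anything else must match exactly.
function matchesOnlySketch(path: string, only: string | null): boolean {
  if (only === null) return true; // no filter: everything is in scope
  if (only.startsWith("*")) return path.endsWith(only.slice(1));
  if (only.endsWith("*")) return path.startsWith(only.slice(0, -1));
  return path === only;
}
```

Unknown tokens are ignored in this sketch; whether the real parser rejects them is not stated in the message.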
Doctor finding (checkScaffoldFreshness) refined to reference the new command.
Tests: 8 new cases in scaffold-keeper.test.ts. All 49 scaffold tests green.
Together with Phases A-C, this completes ADR-021. Documents are now versioned,
upgrades are automatic for the safe cases, and editing-drift surfaces through
.proposed artifacts and structured notifications. The scaffold-keeper agent
body is currently a template-only stub; replacing it with a real records-keeper
subagent dispatch is a follow-up that the architecture now enables.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase C (automatic silent sync) had no dedicated tests when committed.
Added 8 cases covering:
- ensureAgenticDocsScaffold on empty dir creates files with markers
- old-version pending marker silently re-renders to current
- editing-drift file left untouched
- legacy unmarked file that matches the archive is promoted to pending
- migrateLegacyScaffold idempotency
Total scaffold test count: 41 (was 33).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>