Add skills system (feature-gap-analysis, code-review, advisory-partner, pm-planning, codebase-analysis, architecture-planning) and fix dispatch_model revert

- Add 6 new skills under src/resources/extensions/sf/skills/
- Revert broken dispatch_model extension from auto-prompts.ts — the subagent
  tool has no model-override param; skills stay as pure text injection
- Fix discuss-headless.md: advisory-partner section now correctly describes
  that independent review runs via gate-evaluate/validate-milestone (Q3/Q4,
  MV01-MV04) with the validation model, not inline self-review
- Include pm-planning, codebase-analysis, architecture-planning, and
  feature-gap-analysis skill activations in discuss-headless Active Skills

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Mikael Hugo 2026-04-19 04:51:29 +02:00
parent 9724cb437a
commit 6fc286a888
6 changed files with 156 additions and 17 deletions

View file

@@ -595,8 +595,8 @@ function resolvePreferredSkillNames(
* to prevent prompt injection via crafted directory names. */
const SAFE_SKILL_NAME = /^[a-z0-9][a-z0-9-]*$/;
function formatSkillActivationBlock(skillNames: string[]): string {
const safe = skillNames.filter(name => SAFE_SKILL_NAME.test(name));
function formatSkillActivationBlock(names: string[]): string {
const safe = names.filter(name => SAFE_SKILL_NAME.test(name));
if (safe.length === 0) return "";
// Use explicit parameter syntax so LLMs pass { skill: "..." } instead of { name: "..." }.
// The function-call-like syntax `Skill('name')` led LLMs to infer a positional
@@ -658,6 +658,7 @@ export function buildSkillActivationBlock(params: {
const ordered = [...matched]
.filter(name => installedNames.has(name) && !avoided.has(name))
.sort();
return formatSkillActivationBlock(ordered);
}

View file

@@ -547,8 +547,11 @@ export async function bootstrapAutoSession(
return releaseLockAndReturn();
}
const { showWorkflowEntry } = await import("./guided-flow.js");
await showWorkflowEntry(ctx, pi, base, { step: requestedStepMode });
// Auto mode: autonomously map the codebase and create milestones
// without waiting for user answers. Uses discuss-headless prompt.
ctx.ui.notify("No milestones found. Bootstrapping from codebase analysis.", "info");
const { dispatchHeadlessBootstrap } = await import("./guided-flow.js");
await dispatchHeadlessBootstrap(ctx, pi, base);
invalidateAllCaches();
const postState = await deriveState(base);
@@ -573,7 +576,7 @@ export async function bootstrapAutoSession(
state = postState;
} else {
ctx.ui.notify(
"Discussion completed but no milestone context was written. Run /sf to try the discussion again, or /sf auto after creating the milestone manually.",
"Headless bootstrap completed but no milestone context was written. Retrying.",
"warning",
);
return releaseLockAndReturn();
@@ -589,8 +592,9 @@
const contextFile = resolveMilestoneFile(base, mid, "CONTEXT");
const hasContext = !!(contextFile && (await loadFile(contextFile)));
if (!hasContext) {
const { showWorkflowEntry } = await import("./guided-flow.js");
await showWorkflowEntry(ctx, pi, base, { step: requestedStepMode });
ctx.ui.notify(`Milestone ${mid} has no context. Bootstrapping from codebase analysis.`, "info");
const { dispatchHeadlessBootstrap } = await import("./guided-flow.js");
await dispatchHeadlessBootstrap(ctx, pi, base);
invalidateAllCaches();
const postState = await deriveState(base);

View file

@@ -608,6 +608,67 @@ export async function showHeadlessMilestoneCreation(
}
/**
* Headless auto-mode bootstrap: dispatch the discuss-headless unit without
* triggering pendingAutoStartMap (caller is already inside bootstrapAutoSession).
*
* Seeds the prompt by auto-reading README.md / package.json / go.mod from the
* project root so the model can map the codebase and plan autonomously.
*/
export async function dispatchHeadlessBootstrap(
ctx: ExtensionCommandContext,
pi: ExtensionAPI,
basePath: string,
): Promise<string> {
clearReservedMilestoneIds();
bootstrapProject(basePath);
const existingIds = findMilestoneIds(basePath);
const prefs = loadEffectiveSFPreferences();
const nextId = nextMilestoneIdReserved(existingIds, prefs?.preferences?.unique_milestone_ids ?? false);
const milestoneDir = join(sfRoot(basePath), "milestones", nextId, "slices");
mkdirSync(milestoneDir, { recursive: true });
// Build seed context from project root files
const seedParts: string[] = [
"This is an autonomous headless session. No specification document was provided.",
"",
];
const rootFiles = ["README.md", "README.rst", "package.json", "go.mod", "Cargo.toml", "pyproject.toml"];
for (const fname of rootFiles) {
try {
const fpath = join(basePath, fname);
if (existsSync(fpath)) {
const content = readFileSync(fpath, "utf-8").slice(0, 4000);
seedParts.push(`### ${fname}\n\n${content}`);
}
} catch { /* non-fatal */ }
}
seedParts.push(
"",
[
"Autonomously analyze this codebase to plan what needs to be built or improved.",
"",
"Investigation approach:",
"1. Scout the codebase deeply — use rg, find, ast-grep, and file reads to understand structure, patterns, and tech stack",
"2. Run existing tests (go test, cargo test, npm test, etc.) to measure current quality",
"3. Web search for industry best practices for this type of software — testing strategies, architecture patterns, operational requirements",
"4. Research any libraries, frameworks, or external services involved — get current API docs and constraints",
"5. Identify gaps: missing tests, incomplete features, error handling, observability, security, documentation",
"",
"Goal: define milestones that represent the highest-value work to make this software production-ready, well-tested, and complete.",
"Use all available models and research tools. Treat your findings as the specification.",
].join("\n"),
);
const seedContext = seedParts.join("\n");
const prompt = buildHeadlessDiscussPrompt(nextId, seedContext, basePath);
// Do NOT set pendingAutoStartMap — caller (bootstrapAutoSession) handles the loop
await dispatchWorkflow(pi, prompt, "sf-run", ctx, "plan-milestone");
return nextId;
}
// ─── Discuss Flow ─────────────────────────────────────────────────────────────
/**

View file

@@ -2,6 +2,40 @@
You are creating a SF milestone from a provided specification document. This is a **headless** (non-interactive) flow — do NOT ask the user any questions. Wherever the interactive flow would ask the user, make your best-judgment call and document it as an assumption.
## Active Skills
Apply these skills throughout this session:
### `codebase-analysis`
Run the full four-phase codebase analysis before planning any milestones: (1) orientation map, (2) ultra-granular critical path analysis, (3) technical debt inventory with priority scores, (4) test coverage gaps. Write `.sf/CODEBASE-ANALYSIS.md` with findings. This is the evidence base for all planning decisions.
### `architecture-planning`
Map the architecture (C4 Level 1-2) before designing milestones. Identify deep vs shallow modules, coupling problems, boundary violations. Every significant architectural decision made during planning gets an ADR in `docs/adr/`. Update `.sf/DECISIONS.md` via `sf_decision_save` for architectural decisions.
### `pm-planning`
Apply the `pm-planning` skill throughout this session. Key frameworks to use:
- **Rumelt (Diagnosis → Policies → Actions)**: Diagnose the core challenge first. Don't plan without a diagnosis.
- **Struggling Moments**: Scan for `TODO/FIXME/HACK`, bare `catch {}`, missing tests, hardcoded values — these are real requirements encoded in code.
- **Working Backwards**: For each milestone, define what is demonstrably true when it's done before planning implementation.
- **Opportunity-Solution Tree**: Map problem space (what's broken/missing) before solution space (how to fix it). Never jump to solutions.
- **JTBD**: Identify functional/emotional jobs — *who* benefits and what anxiety is removed. Low-priority gaps are those that address no real job.
- **RICE + Cannonballs**: Score opportunities. Prioritize cannonballs (critical path gaps, missing tests on core flows, no observability) over lead bullets (style, minor optimizations).
- **Build the scooter**: Each milestone must be a complete, demonstrable, functional smaller thing — not a horizontal infrastructure layer with no standalone value.
- **Appetite-based scoping**: Fix the milestone size budget first, vary scope to fit. Cut aggressively. Never extend scope — split into a second milestone instead.
- **Anti-shiny-object**: Don't plan new tech, rewrites, or AI features unless there's a diagnosed struggling moment demanding it.
These are the frameworks used by PMs at Stripe, Figma, Duolingo, Airbnb. Apply them at the codebase level.
### `advisory-partner`
Independent advisory review of plans and decisions runs automatically through the SF gate system — Q3 (abuse surface) and Q4 (broken promises) fire before each slice via `gate-evaluate`, and MV01-MV04 fire after milestone execution via `validate-milestone`. Both run with the `validation` model for genuine second-opinion review. You do not self-apply this skill inline. Consult the `advisory-partner` skill content if you want to understand the review framework being applied by those gates.
### `feature-gap-analysis`
When a vision or spec is provided, run feature-gap-analysis before planning milestones:
- Extract a flat feature map from the vision (one capability per row)
- Scan the codebase for each capability: Implemented / Partial / Missing / Unclear
- Produce a prioritized gap list (HIGH / MEDIUM / LOW) that feeds directly into milestone selection
## Provided Specification
{{seedContext}}
@@ -31,19 +65,35 @@ Decide the approach based on the actual scope:
## Mandatory Investigation
Do a mandatory investigation pass before making any decisions. This is not optional.
Do a mandatory investigation pass before making any decisions. This is not optional. Apply PM thinking throughout.
1. **Scout the codebase** — `ls`, `find`, `rg`, or `scout` for broad unfamiliar areas. Understand what already exists, what patterns are established, what constraints current code imposes.
2. **Check library docs** — `resolve_library` / `get_library_docs` for any tech mentioned in the spec. Get current facts about capabilities, constraints, API shapes, version-specific behavior.
3. **Web search** — `search-the-web` if the domain is unfamiliar, if you need current best practices, or if the spec references external services/APIs you need facts about. Use `fetch_page` for full content when snippets aren't enough.
### Step 1: Diagnose (Rumelt)
Before anything else, form a diagnosis: What is the core challenge? What is broken, risky, or missing? Look for:
- **Struggling moments in code**: `TODO`, `FIXME`, `HACK`, `XXX` comments; bare `catch {}` / `panic` / `log.Fatal`; missing test files; hardcoded magic values; half-implemented flows
- **Run tests**: `go test ./...`, `cargo test`, `npm test`, `pytest` — failing tests are requirements
- **Measure coverage**: find untested critical paths
- **Scan for dead code, stubs, and commented-out features** — abandoned attempts are signals
- Use `rg`, `find`, `ast-grep`, `ls -la` for broad codebase mapping
**Web search budget:** allocate searches carefully across the investigation and focused-research passes:
- Prefer `resolve_library` / `get_library_docs` over `search-the-web` for library documentation.
- Prefer `search_and_read` for one-shot topic research.
- Target 2-3 web searches in this investigation pass. Save remaining budget for focused research.
- Do NOT repeat the same or similar queries.
### Step 2: Check library and ecosystem facts
- `resolve_library` / `get_library_docs` for any tech in the codebase
- Understand version constraints, API capabilities, deprecation status
- This tells you what's achievable without introducing new dependencies
The goal: your decisions should reflect what's actually true in the codebase and ecosystem, not what you assume.
### Step 3: Web search for product/domain standards
- Search for industry best practices for this type of software
- What does production-grade look like in this domain? (e.g., "Go microservice production readiness", "DR platform testing strategy")
- What security, observability, reliability expectations apply?
- Target 2-3 searches; prefer `search_and_read` over multiple queries
### Step 4: Opportunity-Solution Tree
Map what you found into the OST structure before writing milestones:
- List 3-5 **opportunities** (problems/gaps, not solutions)
- For each, list 2-3 **potential solutions**
- Score by RICE: Reach × Impact × Confidence / Effort
- Identify cannonballs vs lead bullets
**The goal**: your diagnosis and opportunity map should be grounded in evidence from the actual codebase, not assumptions.
## Autonomous Decision-Making
@@ -128,6 +178,17 @@ For multi-milestone projects, requirements should span the full vision. Requirem
**Print the requirements in chat before writing the roadmap.** Print a markdown table with columns: ID, Title, Status, Owner, Source. Group by status (Active, Deferred, Out of Scope).
## PM Strategy Memory
Before writing milestone artifacts, write or update `.sf/PM-STRATEGY.md` with your analysis so far:
- **Diagnosis**: the core challenge (Rumelt)
- **Opportunity Map**: table of top opportunities with RICE scores and cannonball/lead-bullet classification
- **Jobs Analysis**: whose functional/emotional jobs this product serves
- **Guiding Policies**: principles governing decisions (e.g. "tests before features")
- **What Was Deferred**: explicitly out-of-scope items and why
This file is the project's product strategy memory. Future agents read it to understand what's been decided strategically. **Write it even if brief — a short entry is better than none.**
## Scope Assessment
Confirm the size estimate from your reflection still holds. Investigation and research often reveal hidden complexity or simplify things. If the scope grew or shrank significantly, adjust the milestone and slice counts accordingly.

View file

@@ -1,5 +1,7 @@
Discuss milestone {{milestoneId}} ("{{milestoneTitle}}"). Identify gray areas, ask the user about them, and write `{{milestoneId}}-CONTEXT.md` in the milestone directory with the decisions. Use the **Context** output template below. If a `SF Skill Preferences` block is present in system context, use it to decide which skills to load and follow; do not override required artifact rules.
Apply `pm-planning` skill thinking throughout: use Working Backwards to anchor on what "done" looks like before asking questions; use JTBD to ask whose job this serves; use Opportunity-Solution Tree framing to explore problem space before solution space; use RICE + cannonball/lead-bullet to help prioritize if scope is unclear; use scooter-not-axle to keep milestone scope end-to-end demonstrable.
**Structured questions available: {{structuredQuestionsAvailable}}**
{{inlinedTemplates}}

View file

@@ -10,6 +10,14 @@ All relevant context has been preloaded below — start working immediately with
{{inlinedContext}}
## Active Skills
Apply `codebase-analysis` and `architecture-planning` skills during research:
- If `.sf/CODEBASE-ANALYSIS.md` doesn't exist yet, run the Phase 1 orientation and Phase 3 debt scan for the areas this milestone touches
- Apply the 5-axis code review (correctness, readability, architecture, security, performance) to relevant code
- Map the architecture at C4 Level 2 for the components this milestone changes
- Document any architectural decisions you make as ADRs in `docs/adr/`
## Your Role in the Pipeline
You are the first deep look at this milestone. A **roadmap planner** reads your output to decide how to slice the work — what to build first, how to order by risk, what boundaries to draw between slices. Then individual slice researchers and planners dive deeper into each slice. Your research sets the strategic direction for all of them.
@@ -44,4 +52,6 @@ Then research the codebase and relevant technologies. Narrate key findings and s
**You MUST call `sf_summary_save` with the research content before finishing.**
After saving research, update `.sf/PM-STRATEGY.md` — append new findings to the Opportunity Map and Guiding Policies sections. If the file doesn't exist yet, create it. This is the project's persistent PM memory — research findings that shaped planning decisions belong here.
When done, say: "Milestone {{milestoneId}} researched."