diff --git a/BUILD_PLAN.md b/BUILD_PLAN.md index b8ac62acb..9cbed7644 100644 --- a/BUILD_PLAN.md +++ b/BUILD_PLAN.md @@ -57,7 +57,7 @@ Process for each: | 3 | **`fix(search): narrow native web_search injection`** | Only inject web_search context when the provider accepts it | `4370bedf3` | | 4 | **`fix(gsd): self-heal symlinked .sf staging`** (path-translated) | Data-loss prevention — when the staging dir is a symlink that's broken or points outside expected scope, detect and self-heal instead of silently writing to wrong location. Path-translate `.gsd/` → `.sf/` in the port; the substance is symlink-resilience, not the path string. | `9340f1e9b` (#4423) | | 5 | **`fix(knowledge): scope + budget milestone KNOWLEDGE injection`** | Prevents milestone-scope knowledge from blowing the context budget | `58d3d4d6c` (#4721) | -| 6 | **`fix(mcp-server): prevent defaultExecFn stdout-buffer deadlock`** | Real deadlock — large-output MCP tools could hang the agent | `bb747ec57` | +| 6 | **MCP server stdout-buffer deadlock** | Not applicable — SF no longer ships an MCP server package. Do not port unless a future accepted ADR reintroduces an SF-owned MCP server. | N/A | | 7 | **`fix(agent-session): guard synthetic agent_end transitions`** | Session-transition race when agent_end was synthesised | `71114fccf` | | 8 | **`fix(agent-session): skip idle wait after agent_end`** | Idle wait was burning time on a session that was already ending | `6d7e4ccb5` | | 9 | **`Fix agent_end session switch handoff`** | Session handoff during agent_end could drop the next session | `c162c44bf` | @@ -235,7 +235,7 @@ Spec sections that landed during late-stage adversarial review and only matter a | Item | Spec | Why deferred | |---|---|---| | SSH worker extension | § 22, C-64, C-75, E-02 | Real for fleet deployments (bunker, inference-fabric scaling). Not real for daily-driver development. Build when a user actually needs to dispatch to a remote box. 
| -| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. The MCP server (`packages/mcp-server`) is the more likely remote interface. | +| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. SF currently supports MCP as a client surface only, not as an SF workflow server. | | `trace_index` SQL | § 19.3.1, C-80 | Forensics over JSONL is fine until grep gets slow. Build the index when you have months of trace files, not before. | | PhaseUAT | § 4.6, C-53, C-76 | Only matters for "release" workflows where humans sign off before merge. Add when needed. | | Multi-orchestrator atomic claim | C-47 | The single-process `run.lock` is sufficient. The atomic UPDATE pattern matters when two orchestrators race against the same DB; sf doesn't deploy that way today. | diff --git a/docs/dev/ADR-008-IMPLEMENTATION-PLAN.md b/docs/dev/ADR-008-IMPLEMENTATION-PLAN.md index 134e92e66..aa50a5b56 100644 --- a/docs/dev/ADR-008-IMPLEMENTATION-PLAN.md +++ b/docs/dev/ADR-008-IMPLEMENTATION-PLAN.md @@ -1,339 +1,17 @@ # ADR-008 Implementation Plan **Related ADR:** [ADR-008-sf-tools-over-mcp-for-provider-parity.md](./ADR-008-sf-tools-over-mcp-for-provider-parity.md) -**Status:** Superseded — do not implement +**Status:** Rejected — never implement **Date:** 2026-04-09 -> Superseded by the current boundary in ADR-020: SF workflow tools are not exposed -> as an MCP server for external clients. SF may use MCP clients for external tools, -> but SF itself runs directly. +## Superseded Boundary -## Objective +Never build or restore an SF MCP server package. -Implement the ADR-008 decision by exposing the core SF workflow tool contract over MCP, then wiring MCP-backed access into provider paths that cannot use the native in-process SF tool registry directly. +Current SF product boundary: -The first usable outcome is: +- SF uses MCP only as a **client** for external tool servers configured through `.mcp.json` or `.sf/mcp.json`. 
+- SF workflow execution runs through native SF tools, headless/RPC, and the daemon/RPC client path. +- SF never exposes its workflow as an MCP server for Claude Code, Cursor, or other external clients. -- a Claude Code-backed execution session can complete a task using canonical SF tools -- no manual summary-writing fallback is needed -- native provider behavior remains unchanged - -## Non-Goals - -- Replacing native in-process SF tools with MCP -- Exporting every historical alias in the first rollout -- Reworking the entire session-oriented MCP server before proving the workflow-tool surface -- Supporting every provider path before Claude Code is working end-to-end - -## Constraints - -- Native and MCP tool paths must share business logic -- MCP must not bypass write-gate or discussion-gate protections -- Canonical SF state transitions must remain DB-backed -- Provider capability mismatches must fail early, not degrade silently - -## Workstreams - -### 1. Shared Handler Extraction - -Goal: separate business logic from transport registration. - -Targets: - -- `src/resources/extensions/sf/bootstrap/db-tools.ts` -- `src/resources/extensions/sf/bootstrap/query-tools.ts` -- `src/resources/extensions/sf/tools/complete-task.ts` -- sibling modules used by planning/summary/validation tools - -Deliverables: - -- transport-neutral handler entrypoints for the minimum workflow tool set -- thin native registration wrappers that call those handlers -- thin MCP registration wrappers that call those handlers - -Exit criteria: - -- native tool behavior is unchanged -- no workflow tool logic is duplicated in MCP server code - -### 2. Workflow-Tool MCP Surface - -Goal: add an MCP server surface for real SF workflow tools, distinct from the current session/read API. 
- -Preferred first-cut tool set: - -- `sf_summary_save` -- `sf_decision_save` -- `sf_plan_milestone` -- `sf_plan_slice` -- `sf_plan_task` -- `sf_task_complete` -- `sf_slice_complete` -- `sf_complete_milestone` -- `sf_validate_milestone` -- `sf_replan_slice` -- `sf_reassess_roadmap` -- `sf_save_gate_result` -- `sf_milestone_status` - -Likely files: - -- `packages/mcp-server/src/server.ts` or a new sibling server package -- `packages/mcp-server/src/...` supporting modules -- shared tool-definition metadata if needed - -Decisions to make during implementation: - -- extend existing MCP package vs create `packages/mcp-sf-tools-server` -- canonical names only vs selected alias export -- single combined server vs separate “session” and “workflow” server modes - -Exit criteria: - -- MCP tool discovery shows the minimum tool set -- each MCP tool invokes the shared handlers successfully in isolation - -### 3. Safety and Policy Parity - -Goal: ensure MCP mutations enforce the same rules as native tool calls. - -Targets: - -- `src/resources/extensions/sf/bootstrap/write-gate.ts` -- any current tool-call gating hooks tied to native runtime only -- MCP wrapper layer before shared handler invocation - -Required protections: - -- discussion gate blocking -- queue-mode restrictions -- write-path restrictions -- canonical DB/file rendering order - -Exit criteria: - -- MCP cannot be used to bypass native write restrictions -- blocked native scenarios remain blocked over MCP - -### 4. Claude Code Provider Integration - -Goal: attach the SF workflow-tool MCP surface to Claude Code sessions. 
- -Targets: - -- `src/resources/extensions/claude-code-cli/stream-adapter.ts` -- `src/resources/extensions/claude-code-cli/index.ts` - -Expected work: - -- build a SF-managed `mcpServers` config for the Claude SDK session -- attach the workflow MCP server only when the session requires SF tools -- keep current Claude Code streaming behavior intact - -Exit criteria: - -- Claude Code session can discover the SF workflow MCP tools -- task execution path can call `sf_task_complete` successfully - -### 5. Capability Detection and Failure Path - -Goal: refuse to start tool-dependent workflows when required capabilities are unavailable. - -Targets: - -- SF dispatch / auto-mode preflight -- provider selection and routing checks -- user-facing compatibility errors - -Required behavior: - -- if native SF tools are available, proceed -- else if SF workflow MCP tools are available, proceed -- else fail fast with a precise message - -Exit criteria: - -- no execution prompt is sent that requires unavailable tools -- users with only unsupported capability combinations get a hard error, not a fake fallback - -### 6. Prompt and Documentation Alignment - -Goal: keep the workflow contract strict while removing transport assumptions from docs and runtime messaging. 
- -Targets: - -- `src/resources/extensions/sf/prompts/execute-task.md` -- related planning/discuss prompts that reference tool availability -- provider and MCP docs - -Rules: - -- prompts should keep requiring canonical SF completion/planning tools -- prompts should not imply “native in-process tool only” -- docs should explain native vs MCP-backed fulfillment paths - -Exit criteria: - -- prompt contract matches runtime reality -- no provider is told to use a tool surface it cannot access - -## Phase Plan - -## Phase 1: Spike and Handler Extraction - -Scope: - -- extract shared logic for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status` -- prove native wrappers still work - -Why first: - -- these tools are enough to test end-to-end completion semantics without migrating the full catalog - -Verification: - -- existing native tests still pass -- new unit tests cover shared handler entrypoints directly - -## Phase 2: Minimal Workflow MCP Server - -Scope: - -- expose the three extracted tools over MCP -- ensure discovery schemas are clean and canonical - -Verification: - -- MCP discovery returns all three tools -- direct MCP calls succeed against a fixture project - -## Phase 3: Claude Code End-to-End Proof - -Scope: - -- wire the minimal workflow MCP server into the Claude SDK session -- run a single execution path that ends with task completion - -Verification: - -- Claude Code can call `sf_task_complete` -- summary file, DB state, and plan checkbox update correctly - -## Phase 4: Expand to Full Minimum Workflow Set - -Scope: - -- add planning, slice completion, milestone completion, roadmap reassessment, and gate result tools - -Verification: - -- discuss/plan/execute/complete lifecycle works over MCP for the supported flow set - -## Phase 5: Capability Gating and UX Hardening - -Scope: - -- add preflight capability checks -- add clear error messaging for unsupported setups - -Verification: - -- unsupported provider/session combinations fail before 
execution starts - -## Phase 6: Prompt and Doc Cleanup - -Scope: - -- align prompts and docs with the new transport-neutral contract - -Verification: - -- prompt references are accurate -- docs describe the supported architecture and limitations - -## File-Level Starting Map - -High-probability files for the first implementation: - -- `src/resources/extensions/sf/bootstrap/db-tools.ts` -- `src/resources/extensions/sf/bootstrap/query-tools.ts` -- `src/resources/extensions/sf/bootstrap/write-gate.ts` -- `src/resources/extensions/sf/tools/complete-task.ts` -- `src/resources/extensions/claude-code-cli/stream-adapter.ts` -- `src/resources/extensions/claude-code-cli/index.ts` -- `packages/mcp-server/src/server.ts` -- `packages/mcp-server/src/session-manager.ts` -- `packages/mcp-server/README.md` -- `src/resources/extensions/sf/prompts/execute-task.md` - -## Testing Strategy - -### Unit - -- shared handlers -- MCP wrapper adapters -- gating / capability-check helpers - -### Integration - -- direct MCP tool invocation against fixture projects -- native tool invocation regression coverage -- Claude Code provider path with MCP attached - -### End-to-End - -- plan or execute a small fixture task and complete it through canonical SF tools -- confirm DB row, rendered summary, and plan state stay in sync - -## Risks - -### Risk 1: Logic Drift - -If native and MCP wrappers each evolve their own behavior, parity will collapse quickly. - -Mitigation: - -- shared handler extraction before broad MCP exposure - -### Risk 2: Safety Regression - -If MCP becomes a side door around native gating, the architecture is worse than before. - -Mitigation: - -- centralize or reuse gating checks before shared handler invocation - -### Risk 3: Overly Broad First Rollout - -Exporting every tool and alias immediately increases scope and test burden. 
- -Mitigation: - -- ship a minimal workflow tool set first - -### Risk 4: Claude SDK Session Wiring Complexity - -Attaching MCP servers dynamically may expose edge cases around cwd, permissions, or subprocess lifecycle. - -Mitigation: - -- prove a narrow spike with 2-3 tools before expanding - -## Exit Criteria for ADR-008 - -ADR-008 is considered implemented when: - -1. Claude Code-backed execution can use canonical SF workflow tools over MCP. -2. Native provider behavior remains intact. -3. Shared handlers back both native and MCP invocation. -4. Gating and state integrity protections apply equally to MCP mutations. -5. Capability checks prevent prompts from requiring unavailable tools. - -## Recommended Next Task - -Start with a narrow spike: - -1. Extract shared handlers for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`. -2. Expose those tools through a minimal workflow MCP server. -3. Attach that MCP server to Claude Code sessions. -4. Prove end-to-end task completion on a fixture project. +This document is kept only as a tombstone for the rejected implementation plan. Do not revive this direction. 
diff --git a/docs/dev/ADR-008-sf-tools-over-mcp-for-provider-parity.md b/docs/dev/ADR-008-sf-tools-over-mcp-for-provider-parity.md index de7eb9360..4c946fcf3 100644 --- a/docs/dev/ADR-008-sf-tools-over-mcp-for-provider-parity.md +++ b/docs/dev/ADR-008-sf-tools-over-mcp-for-provider-parity.md @@ -1,246 +1,25 @@ -# ADR-008: Expose SF Workflow Tools Over MCP for Provider Parity +# ADR-008: SF Tools Over MCP for Provider Parity -**Status:** Superseded — historical record +**Status:** Rejected — never implement **Date:** 2026-04-09 **Superseded by:** `docs/adr/0000-purpose-to-software-compiler.md`, `docs/dev/ADR-020-internal-wire-architecture.md` -**Deciders:** Jeremy McSpadden -**Related:** ADR-004 (capability-aware model routing), ADR-007 (model catalog split and provider API encapsulation), `src/resources/extensions/sf/bootstrap/db-tools.ts`, `src/resources/extensions/claude-code-cli/stream-adapter.ts`, `packages/mcp-server/src/server.ts` - -## Context - -> Current boundary: SF does not expose its workflow as an MCP server for other -> clients. SF runs directly through `sf`/`/sf autonomous`; MCP is a client-side -> integration surface for external tools that SF may call. This ADR is preserved -> as historical context for a provider-parity idea that was not adopted. - -SF currently has two different tool surfaces: - -1. **In-process extension tools** registered directly into the runtime via `pi.registerTool(...)`. -2. **An external MCP server** that exposes session orchestration and read-only project inspection. - -This split is now creating a real provider compatibility problem. - -### What exists today - -The core SF workflow tools are internal extension tools. 
Examples include: - -- `sf_summary_save` -- `sf_plan_milestone` -- `sf_plan_slice` -- `sf_plan_task` -- `sf_task_complete` -- `sf_slice_complete` -- `sf_complete_milestone` -- `sf_validate_milestone` -- `sf_replan_slice` -- `sf_reassess_roadmap` - -These are registered in `src/resources/extensions/sf/bootstrap/db-tools.ts` and related bootstrap files. SF prompts assume these tools are available during discuss, plan, and execute flows. - -Separately, `packages/mcp-server/src/server.ts` exposes a different tool surface: - -- session control: `sf_execute`, `sf_status`, `sf_result`, `sf_cancel`, `sf_query`, `sf_resolve_blocker` -- read-only inspection: `sf_progress`, `sf_roadmap`, `sf_history`, `sf_doctor`, `sf_captures`, `sf_knowledge` - -That MCP server is useful, but it is **not** a transport for the internal workflow/mutation tools. - -### The current failure mode - -The Claude Code CLI provider uses the Anthropic Agent SDK through `src/resources/extensions/claude-code-cli/stream-adapter.ts`. That adapter starts a Claude SDK session, but it does not forward the internal SF tool registry into the SDK session, nor does it attach a SF MCP server for those tools. - -As a result: - -- prompts tell the model to call tools like `sf_task_complete` -- the tools exist in SF -- but Claude Code sessions do not actually receive those tools - -This produces a contract mismatch: the model is required to use tools that are unavailable in that provider path. - -### Why this matters - -This is not a one-off Claude Code bug. It reveals a deeper architectural issue: - -- SF’s core workflow contract is transport-specific -- prompt authors assume “internal extension tool availability” -- provider integrations do not all share the same execution surface - -If SF wants provider parity, its workflow tools need a transport-neutral exposure model. 
## Decision -**Expose the SF workflow tool contract over MCP as a first-class transport, and make MCP the compatibility layer for providers that cannot directly access the in-process SF tool registry.** +Never build or restore an SF MCP server package. SF is not an MCP server product and must not become one. -This means: +Current SF uses MCP only as a client-side integration surface for external tools that SF may call. SF workflow execution stays inside the `sf` runtime through in-process extension tools, daemon/RPC/headless paths, and DB-backed workflow state. -1. SF will keep its existing in-process tool registration for native runtime use. -2. SF will add an MCP execution surface for the same workflow tools. -3. Both surfaces must call the same underlying business logic. -4. Provider integrations such as Claude Code will use the MCP surface when they cannot access native in-process tools directly. +## Reason -The decision is explicitly **not** to replace the native tool system with MCP everywhere. MCP is the parity and portability layer, not the only runtime path. +The original ADR proposed exposing SF workflow tools through a new MCP server for provider parity. That creates another mutable workflow transport and another state-safety surface. The current architecture avoids that extra server boundary: planning, execution, validation, and completion write structured state to SQLite first and render markdown/JSON as projections. -## Decision Details +## Operational Rule -### 1. One handler layer, multiple transports +- Never recreate `packages/mcp-server`. +- Never add a `src/mcp-server.ts` workflow backend. +- Never document SF as an MCP server provider. +- Never add provider-parity work that depends on SF exposing itself over MCP. +- Keep `src/resources/extensions/mcp-client/` as the client integration boundary. -SF tool behavior must not be implemented twice. 
- -The transport-neutral business logic for workflow tools should be shared by: - -- native extension tool registration (`pi.registerTool(...)`) -- MCP server tool registration - -The MCP server should wrap the same handlers used by `db-tools.ts`, `query-tools.ts`, and related modules. This avoids logic drift and keeps validation, DB writes, file rendering, and recovery behavior consistent. - -### 2. Add a workflow-tool MCP surface - -SF will expose the workflow tools required for discuss, planning, execution, and completion over MCP. - -Initial minimum set: - -- `sf_summary_save` -- `sf_decision_save` -- `sf_plan_milestone` -- `sf_plan_slice` -- `sf_plan_task` -- `sf_task_complete` -- `sf_slice_complete` -- `sf_complete_milestone` -- `sf_validate_milestone` -- `sf_replan_slice` -- `sf_reassess_roadmap` -- `sf_save_gate_result` -- selected read/query tools such as `sf_milestone_status` - -Aliases should be treated conservatively. MCP should prefer canonical names unless compatibility requires exposing aliases. - -### 3. Preserve safety semantics - -The current SF safety model includes write gates, discussion gates, queue-mode restrictions, and state integrity guarantees. - -Those guarantees must continue to apply when tools are invoked over MCP. In particular: - -- MCP must not create a path that bypasses write gating -- MCP mutations must preserve the same DB/file/state invariants as native tools -- provider-specific fallback behavior must not allow manual summary writing in place of canonical completion tools - -### 4. Make provider capability checks explicit - -Before dispatching a workflow that requires SF workflow tools, SF should check whether the selected provider/session can access the required tool surface. - -If a provider cannot access either: - -- native in-process SF tools, or -- the SF MCP workflow tool surface - -then SF must fail early with a clear compatibility error rather than allowing execution to continue in a degraded, state-breaking mode. 
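The preflight order just described (native first, MCP second, hard failure otherwise) can be sketched in a few lines — a hypothetical illustration of the historical proposal only, with invented names, not real SF APIs:

```typescript
// Hypothetical sketch of the capability preflight described above.
// All names here are illustrative; none are actual SF identifiers.
type ToolSurface = "native" | "mcp";

interface SessionCapabilities {
  hasNativeSfTools: boolean;
  hasSfWorkflowMcpTools: boolean;
}

function resolveToolSurface(caps: SessionCapabilities): ToolSurface {
  // Prefer the in-process registry when the provider can use it directly.
  if (caps.hasNativeSfTools) return "native";
  // Otherwise fall back to the MCP-exposed workflow tool surface.
  if (caps.hasSfWorkflowMcpTools) return "mcp";
  // Fail early with a precise error instead of degrading into a
  // state-breaking run that prompts for unavailable tools.
  throw new Error(
    "Provider cannot access SF workflow tools (native or MCP); refusing to dispatch"
  );
}
```

The important property is the ordering: the check happens before any execution prompt is sent, so a provider is never told to call a tool it cannot reach.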
- -### 5. Keep the existing session/read MCP server - -The existing MCP server in `packages/mcp-server` remains valid. It serves a different purpose: - -- remote session orchestration -- status/result polling -- filesystem-backed project inspection - -The new workflow-tool MCP surface is complementary, not a replacement. - -## Alternatives Considered - -### Alternative A: Reroute away from Claude Code whenever tool-backed execution is needed - -This would fix the immediate failure for multi-provider users, but it does not solve provider parity. It also fails completely for users who only have Claude Code configured. - -**Rejected** because it treats the symptom, not the architectural gap. - -### Alternative B: Hard-fail Claude Code and require another provider - -This is a valid short-term guardrail and may still be used before MCP support is complete. - -**Rejected as the long-term architecture** because it permanently excludes a supported provider from first-class SF execution. - -### Alternative C: Inject the internal SF tool registry directly into the Claude Agent SDK without MCP - -This would tightly couple SF’s internal extension runtime to a provider-specific integration path. It would not generalize well to other providers or external tool clients. - -**Rejected** because it creates a provider-specific bridge instead of a transport-neutral contract. - -### Alternative D: Replace native SF tools entirely with MCP - -This would simplify the conceptual model, but it would force all runtimes through an external protocol boundary even when the native in-process path is faster and already works well. - -**Rejected** because MCP is needed for portability, not because the native tool system is flawed. - -## Consequences - -### Positive - -1. **Provider parity improves.** Providers that can consume MCP tools can participate in full SF workflow execution. -2. 
**The workflow contract becomes transport-neutral.** Prompts can rely on capabilities rather than a specific runtime implementation detail. -3. **One compatibility story for external clients.** Claude Code, Cursor, and other MCP-capable clients can use the same workflow tool surface. -4. **Better long-term architecture.** Internal tools and external transports converge on shared handlers instead of diverging implementations. - -### Negative - -1. **Larger surface area to secure and test.** Mutation tools over MCP are higher risk than read-only inspection tools. -2. **Migration complexity.** Tool registration, gating, and handler extraction must be refactored carefully. -3. **Two transport paths must remain aligned.** Native and MCP invocation semantics must stay behaviorally identical. - -### Neutral / Tradeoff - -The system will now support: - -- native in-process tool execution when available -- MCP-backed tool execution when native access is unavailable - -That is more complex than a single-path system, but it is the cost of provider portability without sacrificing native runtime quality. - -## Migration Plan - -### Phase 1: Extract shared handlers - -Refactor workflow tools so MCP and native registration can call the same transport-neutral functions. - -Priority targets: - -- `sf_summary_save` -- `sf_task_complete` -- `sf_plan_milestone` -- `sf_plan_slice` -- `sf_plan_task` - -### Phase 2: Stand up the workflow-tool MCP server - -Add a new MCP surface for workflow tool execution. This may extend the existing MCP package or live as a sibling package, but it must be clearly separated from the current session/read API. - -### Phase 3: Port safety enforcement - -Move or centralize write gates and related policy checks so MCP mutations cannot bypass the existing safety model. 
- -### Phase 4: Attach MCP workflow tools to Claude Code sessions - -Update the Claude Code provider integration to pass a SF-managed `mcpServers` configuration into the Claude Agent SDK session when required. - -### Phase 5: Add provider capability gating - -Before tool-dependent flows begin, verify that the active provider can access the required SF workflow tools via either native registration or MCP. - -### Phase 6: Update prompts and docs - -Prompt contracts should remain strict about using canonical SF completion/planning tools, but documentation and runtime messaging must no longer assume that only native in-process tool registration satisfies that contract. - -## Validation - -Success is defined by all of the following: - -1. A Claude Code-backed execution session can complete a task using canonical SF workflow tools without manual summary writing. -2. Native provider behavior remains unchanged. -3. MCP-invoked workflow tools produce the same DB updates, rendered artifacts, and state transitions as native tool calls. -4. Write-gate and discussion-gate protections still hold under MCP invocation. -5. When required capabilities are unavailable, SF fails early with a precise compatibility error. - -## Scope Notes - -This ADR establishes the architectural direction. It does **not** require full MCP exposure of every historical alias or every auxiliary tool in the first implementation. - -The first implementation should prioritize the minimum workflow tool set needed to make discuss/plan/execute/complete flows work safely for MCP-capable providers. +If a future integration needs external control of SF, use the daemon/RPC/headless contract. MCP remains for SF-as-client only. 
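The SF-as-client boundary can be illustrated with a minimal `.mcp.json` sketch. The schema shown is an assumption (the common `mcpServers` convention used by MCP-capable clients), and `example-docs-server` with its command are hypothetical — the point is that this file only ever points SF at external tool servers, never the other way around:

```json
{
  "mcpServers": {
    "example-docs-server": {
      "command": "docs-mcp",
      "args": ["--stdio"],
      "env": {}
    }
  }
}
```

Each entry is something SF may launch and call as a client during a session; nothing in this shape can export SF's own workflow tools to an external MCP consumer.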
diff --git a/docs/dev/ADR-017-charm-tui-client.md b/docs/dev/ADR-017-charm-tui-client.md index 345fbc75a..d1210a7b8 100644 --- a/docs/dev/ADR-017-charm-tui-client.md +++ b/docs/dev/ADR-017-charm-tui-client.md @@ -9,7 +9,7 @@ sf today bundles its TUI directly in core: `pi-tui` (~10.5k LOC of TypeScript) i Three forces argue for extracting the TUI: -1. **sf is becoming truly headless-first** — `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` already exist. CLI invocations talk to the daemon. sf can be called as an MCP backend by Claude Code, Cursor, Hermes — they're TUI-agnostic. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*. +1. **sf is becoming truly headless-first** — `packages/daemon` and `packages/rpc-client` already exist. CLI invocations talk to the daemon. SF uses MCP as a client integration surface for external tools, not as an SF workflow server. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*. 2. **The Charm TUI stack is dramatically more capable than what `pi-tui` builds today.** `bubbletea` + `bubbles` + `lipgloss` + `glamour` + `huh` + `harmonica` + `x/mosaic` (image rendering) + `x/vcr` (recording) + `pony` + `ultraviolet` (declarative markup) compose to far better UX than reproducing in TS would. 3. **Removing `pi-tui` from sf core deletes ~10k LOC of TS** — leaner core, fewer TUI-coupled assumptions in `pi-coding-agent`, cleaner test surface. @@ -42,7 +42,7 @@ This ADR plans the extraction. - **sf core gets ~10k LOC leaner** after Stage 2. - **Charm stack quality** comes for free — animations (`harmonica`), inline images (`x/mosaic`), markdown (`glamour`), forms (`huh`), recording (`x/vcr`). -- **Headless / API-first architecture** is cleanly visible: daemon + RPC + MCP + clients. No TUI coupled to engine. +- **Headless / API-first architecture** is cleanly visible: daemon + RPC + clients, with MCP client integration for external tools. No TUI coupled to engine. 
- **Remote TUI for free** — once the client is Wish-served (could be a v3.x extension), `tailscale ssh aidev sf` opens a full TUI session over SSH. Today's `pi-tui` is local-process only. - **Recordings of TUI sessions** — flight recorder (ADR-015) integrates with the Charm TUI naturally; `pi-tui` would need separate work to support this. @@ -85,7 +85,7 @@ Total: ~12–16 weeks across stages. ## References -- `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` — already exist; this ADR makes them load-bearing for clients. +- `packages/daemon`, `packages/rpc-client` — already exist; this ADR makes them load-bearing for clients. - `packages/pi-tui` — the existing TUI being deprecated. - `ADR-013` — Network: future SSH-served TUI via `wish` rides the same substrate. - `ADR-015` — Flight recorder: `sf-tui` records its sessions naturally. diff --git a/docs/dev/FILE-SYSTEM-MAP.md b/docs/dev/FILE-SYSTEM-MAP.md index ec62af85c..afe38c990 100644 --- a/docs/dev/FILE-SYSTEM-MAP.md +++ b/docs/dev/FILE-SYSTEM-MAP.md @@ -36,7 +36,7 @@ | **Loader / Bootstrap** | Startup initialization, extension sync, tool bootstrap | | **LSP** | Language Server Protocol client and multiplexer | | **Mac Tools** | macOS-native utilities (Swift CLI) | -| **MCP Server/Client** | Model Context Protocol server and client | +| **MCP Client** | Model Context Protocol client integration for external tools | | **Memory Extension** | In-session memory pipeline and storage | | **Migration** | Data and config migration tools | | **Modes** | Interactive TUI, Print, RPC, and Web modes | @@ -87,7 +87,6 @@ | src/help-text.ts | CLI | Generates help text for all subcommands | | src/loader.ts | Loader/Bootstrap | Fast-path startup, extension discovery/validation, env setup | | src/logo.ts | CLI | ASCII logo rendering for welcome screen and loader | -| src/mcp-server.ts | MCP Server/Client | Native MCP server over stdin/stdout for external AI clients | | src/models-resolver.ts | Config, Auth/OAuth | 
Resolves models.json with fallback from Pi to SF | | src/onboarding.ts | Onboarding | First-run wizard — LLM auth, OAuth, API keys, tool setup | | src/pi-migration.ts | Config, Auth/OAuth | Migrates provider credentials from Pi auth.json to SF | @@ -388,9 +387,9 @@ |------|-----------------|-------------| | modes/index.ts | Modes | Mode system exports | | modes/print-mode.ts | Modes | Non-interactive print mode | -| modes/rpc/rpc-mode.ts | Modes, MCP Server/Client | RPC server mode for remote access | -| modes/rpc/rpc-client.ts | Modes, MCP Server/Client | RPC client for remote agent interaction | -| modes/rpc/rpc-types.ts | Modes, MCP Server/Client | RPC protocol type definitions | +| modes/rpc/rpc-mode.ts | Modes, RPC | RPC server mode for remote access | +| modes/rpc/rpc-client.ts | Modes, RPC | RPC client for remote agent interaction | +| modes/rpc/rpc-types.ts | Modes, RPC | RPC protocol type definitions | | modes/rpc/jsonl.ts | Modes | JSONL serialization for RPC | | modes/rpc/remote-terminal.ts | Modes | Remote terminal output handling | | modes/shared/command-context-actions.ts | Modes, Commands | Shared command context utilities | @@ -610,7 +609,7 @@ | remote-questions/slack-adapter.ts | Remote Questions | Slack messaging adapter | | remote-questions/discord-adapter.ts | Remote Questions | Discord messaging adapter | | remote-questions/telegram-adapter.ts | Remote Questions | Telegram messaging adapter | -| mcp-client/index.ts | MCP Server/Client | Model Context Protocol client integration | +| mcp-client/index.ts | MCP Client | Model Context Protocol client integration | | subagent/index.ts | Subagent, Agent Core | Parallel/serial subagent delegation extension | | subagent/agents.ts | Subagent, Agent Core | Agent registry and discovery | | subagent/isolation.ts | Subagent | Execution isolation and sandboxing | @@ -826,7 +825,7 @@ | File | System Label(s) | Description | |------|-----------------|-------------| | vscode-extension/src/extension.ts | VS Code 
Extension | Extension activation, client management, command registration | -| vscode-extension/src/sf-client.ts | VS Code Extension, MCP Server/Client | RPC client for SF agent communication | +| vscode-extension/src/sf-client.ts | VS Code Extension, RPC | RPC client for SF agent communication | | vscode-extension/src/chat-participant.ts | VS Code Extension | Chat participant for @sf command | | vscode-extension/src/sidebar.ts | VS Code Extension | Sidebar webview provider with status display | @@ -975,7 +974,7 @@ Quick lookup: which files are part of each system? | **Loader / Bootstrap** | src/loader.ts, src/resource-loader.ts, src/tool-bootstrap.ts, src/bundled-resource-path.ts, sf/bootstrap/* | | **LSP** | pi-coding-agent/src/core/lsp/* | | **Mac Tools** | src/resources/extensions/mac-tools/* | -| **MCP Server/Client** | src/mcp-server.ts, src/resources/extensions/mcp-client/index.ts, vscode-extension/src/sf-client.ts, modes/rpc/* | +| **MCP Client** | src/resources/extensions/mcp-client/index.ts | | **Memory Extension** | pi-coding-agent/src/resources/extensions/memory/* | | **Migration** | sf/migrate/*, src/pi-migration.ts, pi-coding-agent/src/migrations.ts, scripts/recover-*.sh | | **Modes** | pi-coding-agent/src/modes/* | diff --git a/docs/plans/todo-triage-2026-05-06-plan.md b/docs/plans/todo-triage-2026-05-06-plan.md index 5c2455ea5..876c34bf2 100644 --- a/docs/plans/todo-triage-2026-05-06-plan.md +++ b/docs/plans/todo-triage-2026-05-06-plan.md @@ -34,10 +34,10 @@ These were not clearly represented as durable roadmap items and should be planne | Item | Why | Suggested tier | Implementation note | |---|---|---|---| | Typed SF environment schema | `SF_*` env vars should fail early with actionable diagnostics instead of late runtime surprises. | Tier 1 | Add an SF-owned env schema module and route startup/tool validation through it. 
| -| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Start with file-family thresholds or focused test suites for dispatch, recovery, UOK runtime, and validation. | -| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Tier 2 | Add a minimal lifecycle fixture that exercises DB rows as executable truth. | +| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Started with focused DB-authority/UOK runtime suites; continue with dispatch and recovery families before changing global thresholds. | +| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Done | Added runtime-state regression coverage proving SQLite slice/task order stays authoritative over stale markdown/JSON projections, and DB-backed runtime refuses implicit roadmap, plan, and summary imports. | | Fault-injection recovery tests | Stuck-loop, timeout, runaway, stale lock, and projection drift recovery are high-risk paths. | Tier 2 | Add deterministic fault fixtures before adding broader chaos coverage. | -| MCP package completeness audit | Docs mention MCP surfaces, but production completeness is unclear. | Tier 2 | Inspect `packages/mcp-server/`, record supported contracts, gaps, and deferred work. | +| MCP server residue/docs cleanup | SF currently ships the MCP client extension only; tracked MCP server source was removed. | Done | Removed untracked `packages/mcp-server/` residue and updated durable docs so future work never recreates an SF MCP server. | | Biome schema version cleanup | Tooling drift creates noisy lint/config failures. | Tier 3 | Run `biome migrate` as a focused tooling cleanup. 
| Headless assistant-text preview completion | Prior headless work deferred buffer separation. | Tier 2 | Finish `assistantTextBuffer` / `thinkingBuffer` separation and preview flushing. |
diff --git a/src/headless-ui.ts b/src/headless-ui.ts
index e58a0ca8e..cb83bfc07 100644
--- a/src/headless-ui.ts
+++ b/src/headless-ui.ts
@@ -56,6 +56,36 @@ export interface HeadlessHeartbeatContext {
   lastEventDetail?: string;
 }
 
+export interface AssistantPreviewDelta {
+  kind: "text" | "thinking";
+  text: string;
+}
+
+/**
+ * Extract a concise preview delta from an assistant message update.
+ *
+ * Purpose: keep headless non-verbose output honest by separating assistant text
+ * from model thinking before both are flushed at tool starts and message end.
+ *
+ * Consumer: headless.ts streaming event loop and headless progress tests.
+ */
+export function extractAssistantPreviewDelta(
+  assistantMessageEvent: unknown,
+): AssistantPreviewDelta | null {
+  if (!assistantMessageEvent || typeof assistantMessageEvent !== "object") {
+    return null;
+  }
+  const event = assistantMessageEvent as Record<string, unknown>;
+  const type = String(event.type ?? "");
+  if (type !== "text_delta" && type !== "thinking_delta") return null;
+  const text = String(event.delta ?? event.text ?? "");
+  if (!text) return null;
+  return {
+    kind: type === "thinking_delta" ?
"thinking" : "text", + text, + }; +} + // --------------------------------------------------------------------------- // ANSI Color Helpers // --------------------------------------------------------------------------- diff --git a/src/headless.ts b/src/headless.ts index 39c7944ad..3b92012f7 100644 --- a/src/headless.ts +++ b/src/headless.ts @@ -60,6 +60,7 @@ import type { HeadlessJsonResult, OutputFormat } from "./headless-types.js"; import { VALID_OUTPUT_FORMATS } from "./headless-types.js"; import type { ExtensionUIRequest, ProgressContext } from "./headless-ui.js"; import { + extractAssistantPreviewDelta, formatHeadlessHeartbeat, formatProgress, formatPromptTraceLines, @@ -1440,9 +1441,15 @@ async function runHeadlessOnce( } } } - // Non-verbose: accumulate text_delta for truncated one-liner - else if (ame?.type === "text_delta") { - assistantTextBuffer += String(ame.delta ?? ame.text ?? ""); + // Non-verbose: accumulate separated thinking/text previews for + // truncated one-liners before tool calls and message end. + else { + const previewDelta = extractAssistantPreviewDelta(ame); + if (previewDelta?.kind === "text") { + assistantTextBuffer += previewDelta.text; + } else if (previewDelta?.kind === "thinking") { + thinkingBuffer += previewDelta.text; + } } } diff --git a/src/resources/extensions/sf/metrics.js b/src/resources/extensions/sf/metrics.js index addbb9f81..24003e6a1 100644 --- a/src/resources/extensions/sf/metrics.js +++ b/src/resources/extensions/sf/metrics.js @@ -41,7 +41,11 @@ function formatAggregateModelIdentity(modelId) { }); } /** - * Record a unit outcome to the llm_task_outcomes table for Bayesian learning. + * Record a unit outcome to the llm_task_outcomes table for Bayesian learning, + * and also to model-learner for per-task-type performance tracking. + * + * Integration point for quick win #2: model-learner activates continuous + * per-task-type model performance tracking for adaptive routing decisions. 
*/ async function recordUnitOutcome(unit) { const db = getDatabase(); @@ -54,7 +58,7 @@ async function recordUnitOutcome(unit) { // drop bare-id entries rather than guessing the provider. if (slashIdx === -1) return; const provider = modelId.slice(0, slashIdx); - recordOutcome(db, { + const outcome = { modelId, provider, unitType: unit.type, @@ -68,7 +72,24 @@ async function recordUnitOutcome(unit) { tokens_total: unit.tokens.total, cost_usd: unit.cost, recorded_at: unit.startedAt, - }); + }; + + // Record to UOK llm_task_outcomes table + recordOutcome(db, outcome); + + // Quick Win #2: Also record to model-learner for per-task-type tracking + try { + const { ModelLearner } = await import("./model-learner.js"); + const learner = new ModelLearner(basePath); + learner.recordOutcome(unit.type, modelId, { + success: true, + timeout: false, + tokensUsed: unit.tokens.total, + costUsd: unit.cost, + }); + } catch { + /* model-learner integration is optional; never block outcome recording */ + } } catch { /* fire-and-forget */ } diff --git a/src/resources/extensions/sf/state.js b/src/resources/extensions/sf/state.js index 76efc7eb3..cce643e8e 100644 --- a/src/resources/extensions/sf/state.js +++ b/src/resources/extensions/sf/state.js @@ -37,17 +37,12 @@ import { getReplanHistory, getSlice, getSliceTasks, - insertMilestone, - insertSlice, - insertTask, isDbAvailable, - updateSliceStatus, - updateTaskStatus, wasDbOpenAttempted, } from "./sf-db.js"; import { isClosedStatus, isDeferredStatus } from "./status-guards.js"; import { extractVerdict } from "./verdict-parser.js"; -import { logError, logWarning } from "./workflow-logger.js"; +import { logWarning } from "./workflow-logger.js"; /** * A "ghost" milestone directory contains only META.json (and no substantive * files like CONTEXT, CONTEXT-DRAFT, ROADMAP, or SUMMARY). These appear when @@ -219,8 +214,7 @@ export async function getActiveMilestoneId(basePath) { * STATE.md is a rendered cache of this output. 
* * When DB is available, queries milestone/slice/task tables directly. - * Falls back to filesystem parsing for unmigrated projects or when DB - * has zero milestones (e.g. first run before migration). + * Falls back to filesystem parsing only when DB is unavailable. */ export async function deriveState(basePath) { // Return cached result if within the TTL window for the same basePath @@ -235,22 +229,6 @@ export async function deriveState(basePath) { let result; // Dual-path: try DB-backed derivation first when hierarchy tables are populated if (isDbAvailable()) { - let dbMilestones = getAllMilestones(); - // Disk→DB reconciliation when DB is empty but disk has milestones (#2631). - // deriveStateFromDb() does its own reconciliation, but deriveState() skips - // it entirely when the DB is empty. Sync here so the DB path is used when - // disk milestones exist but haven't been migrated yet. - if (dbMilestones.length === 0) { - const diskIds = findMilestoneIds(basePath); - let synced = false; - for (const diskId of diskIds) { - if (!isGhostMilestone(basePath, diskId)) { - insertMilestone({ id: diskId, status: "active" }); - synced = true; - } - } - if (synced) dbMilestones = getAllMilestones(); - } const stopDbTimer = debugTime("derive-state-db"); result = await deriveStateFromDb(basePath); stopDbTimer({ @@ -300,91 +278,29 @@ function extractContextTitle(content, fallback) { const isStatusDone = isClosedStatus; /** * Derive SF state from the milestones/slices/tasks DB tables. - * Flag files (PARKED, VALIDATION, CONTINUE, REPLAN, REPLAN-TRIGGER, CONTEXT-DRAFT) - * are still checked on the filesystem since they aren't in DB tables. + * Non-planning control files (PARKED, CONTINUE, REPLAN, REPLAN-TRIGGER, + * CONTEXT-DRAFT) are still checked on the filesystem since they are not + * hierarchy state. * Requirements also stay file-based via parseRequirementCounts(). * - * Must produce field-identical SFState to _deriveStateImpl() for the same project. 
+ * Must not import rendered roadmap, plan, or summary artifacts into DB-backed + * runtime state. Explicit migration/repair flows own any legacy file import. */ function reconcileDiskToDb(basePath) { - let allMilestones = getAllMilestones(); - const dbIdSet = new Set(allMilestones.map((m) => m.id)); const diskIds = findMilestoneIds(basePath); - let synced = false; - for (const diskId of diskIds) { - if (!dbIdSet.has(diskId) && !isGhostMilestone(basePath, diskId)) { - insertMilestone({ id: diskId, status: "active" }); - synced = true; - } - } - if (synced) allMilestones = getAllMilestones(); - for (const mid of diskIds) { - if (isGhostMilestone(basePath, mid)) continue; - const roadmapPath = resolveMilestoneFile(basePath, mid, "ROADMAP"); - if (!roadmapPath) continue; - const dbSlices = getMilestoneSlices(mid); - const dbSliceIds = new Set(dbSlices.map((s) => s.id)); - let roadmapContent; - try { - roadmapContent = readFileSync(roadmapPath, "utf-8"); - } catch (err) { + if (diskIds.length > 0) { + const dbIds = new Set(getAllMilestones().map((m) => m.id)); + const diskOnlyIds = diskIds.filter( + (id) => !dbIds.has(id) && !isGhostMilestone(basePath, id), + ); + if (diskOnlyIds.length > 0) { logWarning( "state", - "reconcileDiskToDb: roadmap read failed, skipping milestone", - { - mid, - error: err.message, - }, + `DB-backed state ignored ${diskOnlyIds.length} disk-only milestone(s): ${diskOnlyIds.join(", ")}`, ); - continue; - } - const parsed = parseRoadmap(roadmapContent); - for (const s of parsed.slices) { - if (dbSliceIds.has(s.id)) continue; - const summaryPath = resolveSliceFile(basePath, mid, s.id, "SUMMARY"); - const sliceStatus = s.done || summaryPath ? 
"complete" : "pending"; - insertSlice({ - id: s.id, - milestoneId: mid, - title: s.title, - status: sliceStatus, - risk: s.risk, - depends: s.depends, - demo: s.demo, - }); - } - // Reconcile stale *existing* slice rows (#3599): a slice row may exist in - // the DB with status "pending" even though disk artifacts (SUMMARY) prove - // completion — the same class of desync that task-level reconciliation - // (further below) already handles. Without this, the dependency resolver - // builds doneSliceIds from stale DB rows and downstream slices stay blocked - // forever with "No slice eligible". - for (const dbSlice of dbSlices) { - if (isStatusDone(dbSlice.status)) continue; - const summaryPath = resolveSliceFile( - basePath, - mid, - dbSlice.id, - "SUMMARY", - ); - if (summaryPath) { - try { - updateSliceStatus(mid, dbSlice.id, "complete"); - logWarning( - "reconcile", - `slice ${mid}/${dbSlice.id} status reconciled from "${dbSlice.status}" to "complete" (#3599)`, - { mid, sid: dbSlice.id }, - ); - } catch (e) { - logError("reconcile", `failed to update slice ${dbSlice.id}`, { - sid: dbSlice.id, - error: e.message, - }); - } - } } } - return allMilestones; + return getAllMilestones(); } function buildCompletenessSet(basePath, milestones) { const completeMilestoneIds = new Set(); @@ -727,47 +643,14 @@ function resolveSliceDependencies(activeMilestoneSlices) { return { activeSlice: null, activeSliceRow: null }; } async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) { - let tasks = getSliceTasks(milestoneId, sliceId); + const tasks = getSliceTasks(milestoneId, sliceId); if (tasks.length === 0 && planFile) { - try { - const planContent = await loadFile(planFile); - if (planContent) { - const diskPlan = parsePlan(planContent); - if (diskPlan.tasks.length > 0) { - for (let i = 0; i < diskPlan.tasks.length; i++) { - const t = diskPlan.tasks[i]; - try { - insertTask({ - id: t.id, - sliceId, - milestoneId, - title: t.title, - status: t.done ? 
"complete" : "pending", - sequence: i + 1, - }); - } catch (insertErr) { - logWarning( - "reconcile", - `failed to insert task ${t.id} from plan file: ${insertErr instanceof Error ? insertErr.message : String(insertErr)}`, - ); - } - } - tasks = getSliceTasks(milestoneId, sliceId); - logWarning( - "reconcile", - `imported ${tasks.length} tasks from plan file for ${milestoneId}/${sliceId} — DB was empty (#3600)`, - { mid: milestoneId, sid: sliceId }, - ); - } - } - } catch (err) { - logError( - "reconcile", - `plan-file task import failed for ${milestoneId}/${sliceId}: ${err instanceof Error ? err.message : String(err)}`, - ); - } + logWarning( + "reconcile", + `slice plan file exists for ${milestoneId}/${sliceId}, but DB has no task rows; refusing runtime import`, + { mid: milestoneId, sid: sliceId }, + ); } - let reconciled = false; for (const t of tasks) { if (isStatusDone(t.status)) continue; const summaryPath = resolveTaskFile( @@ -778,41 +661,13 @@ async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) { "SUMMARY", ); if (summaryPath && existsSync(summaryPath)) { - // Validate that the summary file has actual content (#sf-moobj36o-6rxy6e) - const summaryContent = readFileSync(summaryPath, "utf-8"); - if (!isValidTaskSummary(summaryContent)) { - logWarning( - "reconcile", - `task ${milestoneId}/${sliceId}/${t.id} has empty/invalid SUMMARY — skipping reconciliation`, - { mid: milestoneId, sid: sliceId, tid: t.id }, - ); - continue; - } - try { - updateTaskStatus( - milestoneId, - sliceId, - t.id, - "complete", - new Date().toISOString(), - ); - logWarning( - "reconcile", - `task ${milestoneId}/${sliceId}/${t.id} status reconciled from "${t.status}" to "complete" (#2514)`, - { mid: milestoneId, sid: sliceId, tid: t.id }, - ); - reconciled = true; - } catch (e) { - logError("reconcile", `failed to update task ${t.id}`, { - tid: t.id, - error: e.message, - }); - } + logWarning( + "reconcile", + `task ${milestoneId}/${sliceId}/${t.id} has 
SUMMARY on disk but DB status is "${t.status}"; refusing runtime status import`, + { mid: milestoneId, sid: sliceId, tid: t.id }, + ); } } - if (reconciled) { - tasks = getSliceTasks(milestoneId, sliceId); - } return tasks; } async function detectBlockers(basePath, milestoneId, sliceId, tasks) { diff --git a/src/resources/extensions/sf/tests/db-driven-runtime-state.test.mjs b/src/resources/extensions/sf/tests/db-driven-runtime-state.test.mjs new file mode 100644 index 000000000..28f89e612 --- /dev/null +++ b/src/resources/extensions/sf/tests/db-driven-runtime-state.test.mjs @@ -0,0 +1,241 @@ +/** + * db-driven-runtime-state.test.mjs — DB authority for runtime state. + * + * Purpose: prove DB-backed projects choose active slices/tasks from structured + * SQLite rows even when generated markdown/JSON projections are stale. + */ +import assert from "node:assert/strict"; +import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { afterEach, test } from "vitest"; +import { + closeDatabase, + getAllMilestones, + getSliceTasks, + insertMilestone, + insertSlice, + insertTask, + openDatabase, +} from "../sf-db.js"; +import { deriveState, invalidateStateCache } from "../state.js"; + +const tmpDirs = []; + +afterEach(() => { + closeDatabase(); + invalidateStateCache(); + while (tmpDirs.length > 0) { + const dir = tmpDirs.pop(); + if (dir) rmSync(dir, { recursive: true, force: true }); + } +}); + +function makeProject() { + const dir = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-")); + tmpDirs.push(dir); + mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S01"), { + recursive: true, + }); + mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S02"), { + recursive: true, + }); + openDatabase(join(dir, ".sf", "sf.db")); + insertMilestone({ + id: "M777", + title: "DB runtime authority", + status: "active", + }); + insertSlice({ + milestoneId: "M777", + id: "S02", + 
title: "DB first slice", + status: "pending", + sequence: 1, + }); + insertSlice({ + milestoneId: "M777", + id: "S01", + title: "Stale file first slice", + status: "pending", + sequence: 2, + }); + insertTask({ + milestoneId: "M777", + sliceId: "S02", + id: "T02", + title: "DB first task", + status: "pending", + sequence: 1, + }); + insertTask({ + milestoneId: "M777", + sliceId: "S02", + id: "T01", + title: "DB second task", + status: "pending", + sequence: 2, + }); + return dir; +} + +test("deriveState_when_generated_projections_are_stale_uses_db_slice_and_task_sequence", async () => { + const project = makeProject(); + const milestoneDir = join(project, ".sf", "milestones", "M777"); + writeFileSync( + join(milestoneDir, "M777-ROADMAP.md"), + [ + "# M777: stale generated roadmap", + "", + "## Slice Overview", + "| ID | Slice | Risk | Depends | Done | After this |", + "|----|-------|------|---------|------|------------|", + "| S01 | Stale file first slice | low | - | | stale first |", + "| S02 | DB first slice | low | - | | should still be first by DB |", + "", + ].join("\n"), + ); + writeFileSync( + join(milestoneDir, "M777-ROADMAP.json"), + JSON.stringify( + { + origin: "stale-test-projection", + slices: [ + { id: "S01", title: "Stale file first slice", sequence: 1 }, + { id: "S02", title: "DB first slice", sequence: 2 }, + ], + }, + null, + 2, + ), + ); + writeFileSync( + join(milestoneDir, "slices", "S02", "S02-PLAN.md"), + [ + "# S02: stale generated plan", + "", + "## Tasks", + "", + "- [ ] T99: stale task that should not replace DB rows", + "", + ].join("\n"), + ); + + const state = await deriveState(project); + + assert.equal(state.phase, "executing"); + assert.equal(state.activeMilestone?.id, "M777"); + assert.equal(state.activeSlice?.id, "S02"); + assert.equal(state.activeTask?.id, "T02"); +}); + +test("deriveState_when_db_has_no_tasks_refuses_runtime_plan_file_import", async () => { + const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-")); + 
tmpDirs.push(project); + mkdirSync(join(project, ".sf", "milestones", "M778", "slices", "S01"), { + recursive: true, + }); + openDatabase(join(project, ".sf", "sf.db")); + insertMilestone({ + id: "M778", + title: "DB planning authority", + status: "active", + }); + insertSlice({ + milestoneId: "M778", + id: "S01", + title: "Plan exists without DB tasks", + status: "pending", + sequence: 1, + }); + writeFileSync( + join(project, ".sf", "milestones", "M778", "M778-ROADMAP.md"), + [ + "# M778: DB planning authority", + "", + "## Slice Overview", + "| ID | Slice | Risk | Depends | Done | After this |", + "|----|-------|------|---------|------|------------|", + "| S01 | Plan exists without DB tasks | low | - | | keep planning |", + "", + ].join("\n"), + ); + writeFileSync( + join(project, ".sf", "milestones", "M778", "slices", "S01", "S01-PLAN.md"), + [ + "# S01: stale generated plan", + "", + "## Tasks", + "", + "- [ ] T01: stale file task", + "", + ].join("\n"), + ); + + const state = await deriveState(project); + + assert.equal(state.phase, "planning"); + assert.equal(state.activeMilestone?.id, "M778"); + assert.equal(state.activeSlice?.id, "S01"); + assert.equal(state.activeTask, null); + assert.equal(getSliceTasks("M778", "S01").length, 0); + assert.match(state.nextAction, /has a plan file but no tasks/i); +}); + +test("deriveState_when_db_has_no_milestones_refuses_runtime_roadmap_import", async () => { + const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-")); + tmpDirs.push(project); + const milestoneDir = join(project, ".sf", "milestones", "M779"); + mkdirSync(milestoneDir, { recursive: true }); + openDatabase(join(project, ".sf", "sf.db")); + writeFileSync( + join(milestoneDir, "M779-ROADMAP.md"), + [ + "# M779: stale disk-only roadmap", + "", + "## Slice Overview", + "| ID | Slice | Risk | Depends | Done | After this |", + "|----|-------|------|---------|------|------------|", + "| S01 | Should not import | low | - | | no DB mutation |", + "", + 
].join("\n"), + ); + + const state = await deriveState(project); + + assert.equal(state.phase, "pre-planning"); + assert.equal(state.activeMilestone, null); + assert.deepEqual(getAllMilestones(), []); +}); + +test("deriveState_when_task_summary_exists_keeps_db_task_status_authoritative", async () => { + const project = makeProject(); + const taskDir = join( + project, + ".sf", + "milestones", + "M777", + "slices", + "S02", + "tasks", + ); + mkdirSync(taskDir, { recursive: true }); + writeFileSync( + join(taskDir, "T02-SUMMARY.md"), + [ + "# T02: stale summary projection", + "", + "This file is not DB state.", + "", + ].join("\n"), + ); + + const state = await deriveState(project); + const [firstTask] = getSliceTasks("M777", "S02"); + + assert.equal(state.phase, "executing"); + assert.equal(state.activeSlice?.id, "S02"); + assert.equal(state.activeTask?.id, "T02"); + assert.equal(firstTask.id, "T02"); + assert.equal(firstTask.status, "pending"); +}); diff --git a/src/resources/extensions/sf/triage-self-feedback.js b/src/resources/extensions/sf/triage-self-feedback.js index bf7bcc81b..2b7d58385 100644 --- a/src/resources/extensions/sf/triage-self-feedback.js +++ b/src/resources/extensions/sf/triage-self-feedback.js @@ -184,13 +184,16 @@ export function parseTriageReport(agentOutput) { * Idempotent: skips rows whose ID already appears in the file. * 2. Call markResolved for each resolution in the report. * Idempotent: markResolved itself skips already-resolved entries. + * 3. Quick Win #1: Auto-fix high-confidence self-reports where confidence > 0.85. * * @param basePath - project root (directory containing .sf/) * @param report - parsed TriageReport from parseTriageReport() */ -export function applyTriageReport(basePath, report) { +export async function applyTriageReport(basePath, report) { let requirementsAdded = 0; let entriesResolved = 0; + let reportsAutoFixed = 0; + // ── 1. 
Write promoted requirements ──────────────────────────────────────── if (report.promotedRequirements.length > 0) { const sfDir = sfRoot(basePath); @@ -244,6 +247,7 @@ export function applyTriageReport(basePath, report) { } writeFileSync(reqPath, content, "utf-8"); } + // ── 2. Resolve entries ──────────────────────────────────────────────────── for (const resolution of report.resolutions) { const evidenceKind = resolution.evidenceKind; @@ -258,5 +262,26 @@ export function applyTriageReport(basePath, report) { ); if (mutated) entriesResolved++; } - return { requirementsAdded, entriesResolved }; + + // ── 3. Quick Win #1: Auto-fix high-confidence self-reports ──────────────── + // Integration point for self-report-fixer: read open reports and auto-apply + // fixes where confidence > 0.85. + try { + const { autoFixHighConfidenceReports } = await import( + "./self-report-fixer.js" + ); + const allOpen = [ + ...readAllSelfFeedback(basePath), + ...readUpstreamSelfFeedback(), + ].filter((e) => !e.resolvedAt); + + if (allOpen.length > 0) { + const result = await autoFixHighConfidenceReports(basePath, allOpen); + reportsAutoFixed = result.applied.length; + } + } catch { + /* self-report fixer is optional; never block triage report application */ + } + + return { requirementsAdded, entriesResolved, reportsAutoFixed }; } diff --git a/src/tests/headless-progress.test.ts b/src/tests/headless-progress.test.ts index a753be36c..c399b5cb0 100644 --- a/src/tests/headless-progress.test.ts +++ b/src/tests/headless-progress.test.ts @@ -2,6 +2,7 @@ import assert from "node:assert/strict"; import { describe, it } from "vitest"; import type { ProgressContext } from "../headless-ui.js"; import { + extractAssistantPreviewDelta, formatCostLine, formatHeadlessHeartbeat, formatProgress, @@ -556,6 +557,31 @@ describe("formatThinkingLine", () => { }); }); +describe("extractAssistantPreviewDelta", () => { + it("separates assistant text deltas from thinking deltas", () => { + assert.deepEqual( + 
extractAssistantPreviewDelta({ + type: "text_delta", + delta: "I will edit the file.", + }), + { kind: "text", text: "I will edit the file." }, + ); + assert.deepEqual( + extractAssistantPreviewDelta({ + type: "thinking_delta", + delta: "Need inspect first.", + }), + { kind: "thinking", text: "Need inspect first." }, + ); + }); + + it("ignores non-preview assistant events", () => { + assert.equal(extractAssistantPreviewDelta({ type: "text_start" }), null); + assert.equal(extractAssistantPreviewDelta({ type: "thinking_end" }), null); + assert.equal(extractAssistantPreviewDelta(null), null); + }); +}); + describe("formatCostLine", () => { it("formats cost with token count", () => { const result = formatCostLine(0.0523, 4200, 1100);