integrate: hook quick wins into UOK dispatch loop

Integration of 3 quick wins into existing UOK infrastructure:

1. Model Learning (Quick Win #2) → metrics.js
   - Record outcomes to model-learner for per-task-type performance tracking
   - Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome()
   - Fire-and-forget: never blocks outcome recording on learning failure
   - Enables adaptive model routing decisions in downstream gates

2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js
   - Auto-fix high-confidence reports (>0.85) in applyTriageReport()
   - Hook: after triage and requirement promotion, apply auto-fixes
   - Fire-and-forget: never blocks report application on fix failure
   - Returns reportsAutoFixed count for triage metrics

3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js
   - Already active in execute-task prompt template
   - Semantic matching with graceful degradation

All integration points:
- Fire-and-forget: learning/fixing failures never block dispatch
- UOK-native: use existing outcome recording, db, gates
- Backward compatible: applyTriageReport is now async, but callers handle it
- No new dependencies: all modules already in codebase

Testing: 2934 tests pass (no regressions from integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
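The fire-and-forget hook the commit message describes can be sketched as follows. This is an illustrative TypeScript sketch, not the actual metrics.js code; the `UnitOutcome` shape and the in-memory learner are assumptions:

```typescript
// Illustrative sketch of the fire-and-forget learning hook. The UnitOutcome
// shape and the in-memory learner are assumed; the real recordUnitOutcome()
// and ModelLearner live in metrics.js per the commit message.
type UnitOutcome = { taskType: string; model: string; success: boolean };

const learned: UnitOutcome[] = [];

const modelLearner = {
  // May reject; callers must never let that block canonical outcome recording.
  async recordOutcome(outcome: UnitOutcome): Promise<void> {
    learned.push(outcome);
  },
};

function recordUnitOutcome(outcome: UnitOutcome): void {
  // Canonical outcome recording happens first (elided here).
  // Fire-and-forget: kick off learning, swallow failures, never await.
  void modelLearner.recordOutcome(outcome).catch(() => {
    // A learning failure must never block the dispatch loop.
  });
}

recordUnitOutcome({ taskType: 'execute', model: 'default', success: true });
```

The same shape applies to the triage auto-fix hook: applyTriageReport awaits its canonical work, then fires the high-confidence auto-fix step without awaiting it.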
This commit is contained in:
parent 62a04f1073
commit 553ba23b89

13 changed files with 420 additions and 759 deletions
@@ -57,7 +57,7 @@ Process for each:
 | 3 | **`fix(search): narrow native web_search injection`** | Only inject web_search context when the provider accepts it | `4370bedf3` |
 | 4 | **`fix(gsd): self-heal symlinked .sf staging`** (path-translated) | Data-loss prevention — when the staging dir is a symlink that's broken or points outside expected scope, detect and self-heal instead of silently writing to wrong location. Path-translate `.gsd/` → `.sf/` in the port; the substance is symlink-resilience, not the path string. | `9340f1e9b` (#4423) |
 | 5 | **`fix(knowledge): scope + budget milestone KNOWLEDGE injection`** | Prevents milestone-scope knowledge from blowing the context budget | `58d3d4d6c` (#4721) |
-| 6 | **`fix(mcp-server): prevent defaultExecFn stdout-buffer deadlock`** | Real deadlock — large-output MCP tools could hang the agent | `bb747ec57` |
+| 6 | **MCP server stdout-buffer deadlock** | Not applicable — SF no longer ships an MCP server package. Do not port unless a future accepted ADR reintroduces an SF-owned MCP server. | N/A |
 | 7 | **`fix(agent-session): guard synthetic agent_end transitions`** | Session-transition race when agent_end was synthesised | `71114fccf` |
 | 8 | **`fix(agent-session): skip idle wait after agent_end`** | Idle wait was burning time on a session that was already ending | `6d7e4ccb5` |
 | 9 | **`Fix agent_end session switch handoff`** | Session handoff during agent_end could drop the next session | `c162c44bf` |
@@ -235,7 +235,7 @@ Spec sections that landed during late-stage adversarial review and only matter a
 | Item | Spec | Why deferred |
 |---|---|---|
 | SSH worker extension | § 22, C-64, C-75, E-02 | Real for fleet deployments (bunker, inference-fabric scaling). Not real for daily-driver development. Build when a user actually needs to dispatch to a remote box. |
-| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. The MCP server (`packages/mcp-server`) is the more likely remote interface. |
+| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. SF currently supports MCP as a client surface only, not as an SF workflow server. |
 | `trace_index` SQL | § 19.3.1, C-80 | Forensics over JSONL is fine until grep gets slow. Build the index when you have months of trace files, not before. |
 | PhaseUAT | § 4.6, C-53, C-76 | Only matters for "release" workflows where humans sign off before merge. Add when needed. |
 | Multi-orchestrator atomic claim | C-47 | The single-process `run.lock` is sufficient. The atomic UPDATE pattern matters when two orchestrators race against the same DB; sf doesn't deploy that way today. |
@@ -1,339 +1,17 @@
 # ADR-008 Implementation Plan

 **Related ADR:** [ADR-008-sf-tools-over-mcp-for-provider-parity.md](./ADR-008-sf-tools-over-mcp-for-provider-parity.md)
-**Status:** Superseded — do not implement
+**Status:** Rejected — never implement
 **Date:** 2026-04-09

-> Superseded by the current boundary in ADR-020: SF workflow tools are not exposed
-> as an MCP server for external clients. SF may use MCP clients for external tools,
-> but SF itself runs directly.
+## Superseded Boundary
+
+Never build or restore an SF MCP server package.
+
+Current SF product boundary:
+
+- SF uses MCP only as a **client** for external tool servers configured through `.mcp.json` or `.sf/mcp.json`.
+- SF workflow execution runs through native SF tools, headless/RPC, and the daemon/RPC client path.
+- SF never exposes its workflow as an MCP server for Claude Code, Cursor, or other external clients.
+
+This document is kept only as a tombstone for the rejected implementation plan. Do not revive this direction.

-## Objective
-
-Implement the ADR-008 decision by exposing the core SF workflow tool contract over MCP, then wiring MCP-backed access into provider paths that cannot use the native in-process SF tool registry directly.
-
-The first usable outcome is:
-
-- a Claude Code-backed execution session can complete a task using canonical SF tools
-- no manual summary-writing fallback is needed
-- native provider behavior remains unchanged
-## Non-Goals
-
-- Replacing native in-process SF tools with MCP
-- Exporting every historical alias in the first rollout
-- Reworking the entire session-oriented MCP server before proving the workflow-tool surface
-- Supporting every provider path before Claude Code is working end-to-end
-
-## Constraints
-
-- Native and MCP tool paths must share business logic
-- MCP must not bypass write-gate or discussion-gate protections
-- Canonical SF state transitions must remain DB-backed
-- Provider capability mismatches must fail early, not degrade silently
-## Workstreams
-
-### 1. Shared Handler Extraction
-
-Goal: separate business logic from transport registration.
-
-Targets:
-
-- `src/resources/extensions/sf/bootstrap/db-tools.ts`
-- `src/resources/extensions/sf/bootstrap/query-tools.ts`
-- `src/resources/extensions/sf/tools/complete-task.ts`
-- sibling modules used by planning/summary/validation tools
-
-Deliverables:
-
-- transport-neutral handler entrypoints for the minimum workflow tool set
-- thin native registration wrappers that call those handlers
-- thin MCP registration wrappers that call those handlers
-
-Exit criteria:
-
-- native tool behavior is unchanged
-- no workflow tool logic is duplicated in MCP server code
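The extraction this (removed) workstream describes, one transport-neutral handler behind thin per-transport wrappers, can be sketched as below. All names and signatures are hypothetical; the plan was never implemented:

```typescript
// Hypothetical sketch of "one handler layer, multiple transports".
// None of these names are real SF APIs.
type TaskCompleteInput = { taskId: string; summary: string };
type TaskCompleteResult = { ok: boolean; taskId: string };

// Transport-neutral business logic: no knowledge of native or MCP registration.
async function completeTaskHandler(input: TaskCompleteInput): Promise<TaskCompleteResult> {
  if (!input.taskId) throw new Error('taskId is required');
  // DB write, summary rendering, and plan checkbox update would happen here.
  return { ok: true, taskId: input.taskId };
}

// Thin native wrapper: registers the shared handler into an in-process registry.
const nativeRegistry = new Map<
  string,
  (input: TaskCompleteInput) => Promise<TaskCompleteResult>
>();
nativeRegistry.set('sf_task_complete', completeTaskHandler);

// Thin MCP-style wrapper: the same handler behind a serialized transport boundary.
async function mcpCall(tool: string, args: TaskCompleteInput): Promise<string> {
  const handler = nativeRegistry.get(tool);
  if (!handler) throw new Error(`unknown tool: ${tool}`);
  return JSON.stringify(await handler(args));
}
```

Because both wrappers are thin, behavior cannot drift between transports: any validation or state change lives only in the shared handler.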
-### 2. Workflow-Tool MCP Surface
-
-Goal: add an MCP server surface for real SF workflow tools, distinct from the current session/read API.
-
-Preferred first-cut tool set:
-
-- `sf_summary_save`
-- `sf_decision_save`
-- `sf_plan_milestone`
-- `sf_plan_slice`
-- `sf_plan_task`
-- `sf_task_complete`
-- `sf_slice_complete`
-- `sf_complete_milestone`
-- `sf_validate_milestone`
-- `sf_replan_slice`
-- `sf_reassess_roadmap`
-- `sf_save_gate_result`
-- `sf_milestone_status`
-
-Likely files:
-
-- `packages/mcp-server/src/server.ts` or a new sibling server package
-- `packages/mcp-server/src/...` supporting modules
-- shared tool-definition metadata if needed
-
-Decisions to make during implementation:
-
-- extend existing MCP package vs create `packages/mcp-sf-tools-server`
-- canonical names only vs selected alias export
-- single combined server vs separate "session" and "workflow" server modes
-
-Exit criteria:
-
-- MCP tool discovery shows the minimum tool set
-- each MCP tool invokes the shared handlers successfully in isolation
-### 3. Safety and Policy Parity
-
-Goal: ensure MCP mutations enforce the same rules as native tool calls.
-
-Targets:
-
-- `src/resources/extensions/sf/bootstrap/write-gate.ts`
-- any current tool-call gating hooks tied to native runtime only
-- MCP wrapper layer before shared handler invocation
-
-Required protections:
-
-- discussion gate blocking
-- queue-mode restrictions
-- write-path restrictions
-- canonical DB/file rendering order
-
-Exit criteria:
-
-- MCP cannot be used to bypass native write restrictions
-- blocked native scenarios remain blocked over MCP
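A minimal sketch of the parity rule above, assuming a hypothetical gate shape: the gate check wraps the shared handler, so every transport hits the same restriction before any mutation runs:

```typescript
// Hypothetical gate shape; not real SF write-gate code.
type GateState = { discussionOpen: boolean };

function assertWriteAllowed(gate: GateState): void {
  if (gate.discussionOpen) {
    throw new Error('write blocked: discussion gate is open');
  }
}

// Wrap a shared handler so the gate runs first, for native and MCP callers alike.
function makeGatedHandler<I, O>(
  gate: GateState,
  handler: (input: I) => O,
): (input: I) => O {
  return (input: I) => {
    assertWriteAllowed(gate); // same check regardless of transport
    return handler(input);
  };
}

const gate: GateState = { discussionOpen: true };
const saveSummary = makeGatedHandler(gate, (text: string) => `saved:${text}`);
```

Centralizing the check in the wrapper factory is what makes "blocked native scenarios remain blocked over MCP" hold by construction rather than by duplicated checks.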
-### 4. Claude Code Provider Integration
-
-Goal: attach the SF workflow-tool MCP surface to Claude Code sessions.
-
-Targets:
-
-- `src/resources/extensions/claude-code-cli/stream-adapter.ts`
-- `src/resources/extensions/claude-code-cli/index.ts`
-
-Expected work:
-
-- build an SF-managed `mcpServers` config for the Claude SDK session
-- attach the workflow MCP server only when the session requires SF tools
-- keep current Claude Code streaming behavior intact
-
-Exit criteria:
-
-- Claude Code session can discover the SF workflow MCP tools
-- task execution path can call `sf_task_complete` successfully
-### 5. Capability Detection and Failure Path
-
-Goal: refuse to start tool-dependent workflows when required capabilities are unavailable.
-
-Targets:
-
-- SF dispatch / auto-mode preflight
-- provider selection and routing checks
-- user-facing compatibility errors
-
-Required behavior:
-
-- if native SF tools are available, proceed
-- else if SF workflow MCP tools are available, proceed
-- else fail fast with a precise message
-
-Exit criteria:
-
-- no execution prompt is sent that requires unavailable tools
-- users with only unsupported capability combinations get a hard error, not a fake fallback
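The required behavior above is a three-step preflight. A sketch with assumed capability flags and error text (neither is real SF code):

```typescript
// Assumed capability flags for the native → MCP → fail-fast preflight.
type ProviderCaps = { nativeTools: boolean; mcpWorkflowTools: boolean };

type ToolSurface = 'native' | 'mcp';

function selectToolSurface(caps: ProviderCaps): ToolSurface {
  if (caps.nativeTools) return 'native';        // prefer in-process tools
  if (caps.mcpWorkflowTools) return 'mcp';      // fall back to the MCP surface
  // Fail fast before any execution prompt is sent.
  throw new Error(
    'provider cannot access SF workflow tools (native or MCP); refusing to dispatch',
  );
}
```

Running this before prompt construction is what guarantees the exit criterion: no prompt ever requires a tool surface the session cannot reach.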
-### 6. Prompt and Documentation Alignment
-
-Goal: keep the workflow contract strict while removing transport assumptions from docs and runtime messaging.
-
-Targets:
-
-- `src/resources/extensions/sf/prompts/execute-task.md`
-- related planning/discuss prompts that reference tool availability
-- provider and MCP docs
-
-Rules:
-
-- prompts should keep requiring canonical SF completion/planning tools
-- prompts should not imply "native in-process tool only"
-- docs should explain native vs MCP-backed fulfillment paths
-
-Exit criteria:
-
-- prompt contract matches runtime reality
-- no provider is told to use a tool surface it cannot access
-## Phase Plan
-
-## Phase 1: Spike and Handler Extraction
-
-Scope:
-
-- extract shared logic for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`
-- prove native wrappers still work
-
-Why first:
-
-- these tools are enough to test end-to-end completion semantics without migrating the full catalog
-
-Verification:
-
-- existing native tests still pass
-- new unit tests cover shared handler entrypoints directly
-
-## Phase 2: Minimal Workflow MCP Server
-
-Scope:
-
-- expose the three extracted tools over MCP
-- ensure discovery schemas are clean and canonical
-
-Verification:
-
-- MCP discovery returns all three tools
-- direct MCP calls succeed against a fixture project
-
-## Phase 3: Claude Code End-to-End Proof
-
-Scope:
-
-- wire the minimal workflow MCP server into the Claude SDK session
-- run a single execution path that ends with task completion
-
-Verification:
-
-- Claude Code can call `sf_task_complete`
-- summary file, DB state, and plan checkbox update correctly
-
-## Phase 4: Expand to Full Minimum Workflow Set
-
-Scope:
-
-- add planning, slice completion, milestone completion, roadmap reassessment, and gate result tools
-
-Verification:
-
-- discuss/plan/execute/complete lifecycle works over MCP for the supported flow set
-
-## Phase 5: Capability Gating and UX Hardening
-
-Scope:
-
-- add preflight capability checks
-- add clear error messaging for unsupported setups
-
-Verification:
-
-- unsupported provider/session combinations fail before execution starts
-
-## Phase 6: Prompt and Doc Cleanup
-
-Scope:
-
-- align prompts and docs with the new transport-neutral contract
-
-Verification:
-
-- prompt references are accurate
-- docs describe the supported architecture and limitations
-## File-Level Starting Map
-
-High-probability files for the first implementation:
-
-- `src/resources/extensions/sf/bootstrap/db-tools.ts`
-- `src/resources/extensions/sf/bootstrap/query-tools.ts`
-- `src/resources/extensions/sf/bootstrap/write-gate.ts`
-- `src/resources/extensions/sf/tools/complete-task.ts`
-- `src/resources/extensions/claude-code-cli/stream-adapter.ts`
-- `src/resources/extensions/claude-code-cli/index.ts`
-- `packages/mcp-server/src/server.ts`
-- `packages/mcp-server/src/session-manager.ts`
-- `packages/mcp-server/README.md`
-- `src/resources/extensions/sf/prompts/execute-task.md`
-
-## Testing Strategy
-
-### Unit
-
-- shared handlers
-- MCP wrapper adapters
-- gating / capability-check helpers
-
-### Integration
-
-- direct MCP tool invocation against fixture projects
-- native tool invocation regression coverage
-- Claude Code provider path with MCP attached
-
-### End-to-End
-
-- plan or execute a small fixture task and complete it through canonical SF tools
-- confirm DB row, rendered summary, and plan state stay in sync
-## Risks
-
-### Risk 1: Logic Drift
-
-If native and MCP wrappers each evolve their own behavior, parity will collapse quickly.
-
-Mitigation:
-
-- shared handler extraction before broad MCP exposure
-
-### Risk 2: Safety Regression
-
-If MCP becomes a side door around native gating, the architecture is worse than before.
-
-Mitigation:
-
-- centralize or reuse gating checks before shared handler invocation
-
-### Risk 3: Overly Broad First Rollout
-
-Exporting every tool and alias immediately increases scope and test burden.
-
-Mitigation:
-
-- ship a minimal workflow tool set first
-
-### Risk 4: Claude SDK Session Wiring Complexity
-
-Attaching MCP servers dynamically may expose edge cases around cwd, permissions, or subprocess lifecycle.
-
-Mitigation:
-
-- prove a narrow spike with 2-3 tools before expanding
-
-## Exit Criteria for ADR-008
-
-ADR-008 is considered implemented when:
-
-1. Claude Code-backed execution can use canonical SF workflow tools over MCP.
-2. Native provider behavior remains intact.
-3. Shared handlers back both native and MCP invocation.
-4. Gating and state integrity protections apply equally to MCP mutations.
-5. Capability checks prevent prompts from requiring unavailable tools.
-
-## Recommended Next Task
-
-Start with a narrow spike:
-
-1. Extract shared handlers for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`.
-2. Expose those tools through a minimal workflow MCP server.
-3. Attach that MCP server to Claude Code sessions.
-4. Prove end-to-end task completion on a fixture project.
@@ -1,246 +1,25 @@
-# ADR-008: Expose SF Workflow Tools Over MCP for Provider Parity
+# ADR-008: SF Tools Over MCP for Provider Parity

-**Status:** Superseded — historical record
+**Status:** Rejected — never implement
 **Date:** 2026-04-09
 **Superseded by:** `docs/adr/0000-purpose-to-software-compiler.md`, `docs/dev/ADR-020-internal-wire-architecture.md`
-**Deciders:** Jeremy McSpadden
-**Related:** ADR-004 (capability-aware model routing), ADR-007 (model catalog split and provider API encapsulation), `src/resources/extensions/sf/bootstrap/db-tools.ts`, `src/resources/extensions/claude-code-cli/stream-adapter.ts`, `packages/mcp-server/src/server.ts`
-## Context
-
-> Current boundary: SF does not expose its workflow as an MCP server for other
-> clients. SF runs directly through `sf`/`/sf autonomous`; MCP is a client-side
-> integration surface for external tools that SF may call. This ADR is preserved
-> as historical context for a provider-parity idea that was not adopted.
-
-SF currently has two different tool surfaces:
-
-1. **In-process extension tools** registered directly into the runtime via `pi.registerTool(...)`.
-2. **An external MCP server** that exposes session orchestration and read-only project inspection.
-
-This split is now creating a real provider compatibility problem.
-
-### What exists today
-
-The core SF workflow tools are internal extension tools. Examples include:
-
-- `sf_summary_save`
-- `sf_plan_milestone`
-- `sf_plan_slice`
-- `sf_plan_task`
-- `sf_task_complete`
-- `sf_slice_complete`
-- `sf_complete_milestone`
-- `sf_validate_milestone`
-- `sf_replan_slice`
-- `sf_reassess_roadmap`
-
-These are registered in `src/resources/extensions/sf/bootstrap/db-tools.ts` and related bootstrap files. SF prompts assume these tools are available during discuss, plan, and execute flows.
-
-Separately, `packages/mcp-server/src/server.ts` exposes a different tool surface:
-
-- session control: `sf_execute`, `sf_status`, `sf_result`, `sf_cancel`, `sf_query`, `sf_resolve_blocker`
-- read-only inspection: `sf_progress`, `sf_roadmap`, `sf_history`, `sf_doctor`, `sf_captures`, `sf_knowledge`
-
-That MCP server is useful, but it is **not** a transport for the internal workflow/mutation tools.
-
-### The current failure mode
-
-The Claude Code CLI provider uses the Anthropic Agent SDK through `src/resources/extensions/claude-code-cli/stream-adapter.ts`. That adapter starts a Claude SDK session, but it does not forward the internal SF tool registry into the SDK session, nor does it attach an SF MCP server for those tools.
-
-As a result:
-
-- prompts tell the model to call tools like `sf_task_complete`
-- the tools exist in SF
-- but Claude Code sessions do not actually receive those tools
-
-This produces a contract mismatch: the model is required to use tools that are unavailable in that provider path.
-
-### Why this matters
-
-This is not a one-off Claude Code bug. It reveals a deeper architectural issue:
-
-- SF's core workflow contract is transport-specific
-- prompt authors assume "internal extension tool availability"
-- provider integrations do not all share the same execution surface
-
-If SF wants provider parity, its workflow tools need a transport-neutral exposure model.
 ## Decision

-**Expose the SF workflow tool contract over MCP as a first-class transport, and make MCP the compatibility layer for providers that cannot directly access the in-process SF tool registry.**
+Never build or restore an SF MCP server package. SF is not an MCP server product and must not become one.

-This means:
+Current SF uses MCP only as a client-side integration surface for external tools that SF may call. SF workflow execution stays inside the `sf` runtime through in-process extension tools, daemon/RPC/headless paths, and DB-backed workflow state.

-1. SF will keep its existing in-process tool registration for native runtime use.
-2. SF will add an MCP execution surface for the same workflow tools.
-3. Both surfaces must call the same underlying business logic.
-4. Provider integrations such as Claude Code will use the MCP surface when they cannot access native in-process tools directly.
+## Reason

-The decision is explicitly **not** to replace the native tool system with MCP everywhere. MCP is the parity and portability layer, not the only runtime path.
+The original ADR proposed exposing SF workflow tools through a new MCP server for provider parity. That creates another mutable workflow transport and another state-safety surface. The current architecture avoids that extra server boundary: planning, execution, validation, and completion write structured state to SQLite first and render markdown/JSON as projections.
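The state-first, projections-second rule in this paragraph can be sketched as follows, with an in-memory map standing in for the real SQLite table and illustrative field names:

```typescript
// Illustrative sketch of "structured state first, rendered files as projections".
// The Map stands in for the canonical SQLite table; field names are assumed.
type TaskRow = { id: string; title: string; done: boolean };

const db = new Map<string, TaskRow>(); // stand-in for canonical storage

function completeTask(id: string, title: string): void {
  // 1. Canonical write: structured state changes first.
  db.set(id, { id, title, done: true });
}

// 2. Projection: markdown is rendered from state, never edited as a source of truth.
function renderPlanMarkdown(): string {
  return [...db.values()]
    .map((t) => `- [${t.done ? 'x' : ' '}] ${t.id}: ${t.title}`)
    .join('\n');
}
```

Because the markdown is derivable at any time, there is no second mutable transport whose writes could disagree with the database.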
-## Decision Details
+## Operational Rule

-### 1. One handler layer, multiple transports
+- Never recreate `packages/mcp-server`.
+- Never add a `src/mcp-server.ts` workflow backend.
+- Never document SF as an MCP server provider.
+- Never add provider-parity work that depends on SF exposing itself over MCP.
+- Keep `src/resources/extensions/mcp-client/` as the client integration boundary.

-SF tool behavior must not be implemented twice.
+If a future integration needs external control of SF, use the daemon/RPC/headless contract. MCP remains for SF-as-client only.
-The transport-neutral business logic for workflow tools should be shared by:
-
-- native extension tool registration (`pi.registerTool(...)`)
-- MCP server tool registration
-
-The MCP server should wrap the same handlers used by `db-tools.ts`, `query-tools.ts`, and related modules. This avoids logic drift and keeps validation, DB writes, file rendering, and recovery behavior consistent.
-
-### 2. Add a workflow-tool MCP surface
-
-SF will expose the workflow tools required for discuss, planning, execution, and completion over MCP.
-
-Initial minimum set:
-
-- `sf_summary_save`
-- `sf_decision_save`
-- `sf_plan_milestone`
-- `sf_plan_slice`
-- `sf_plan_task`
-- `sf_task_complete`
-- `sf_slice_complete`
-- `sf_complete_milestone`
-- `sf_validate_milestone`
-- `sf_replan_slice`
-- `sf_reassess_roadmap`
-- `sf_save_gate_result`
-- selected read/query tools such as `sf_milestone_status`
-
-Aliases should be treated conservatively. MCP should prefer canonical names unless compatibility requires exposing aliases.
-
-### 3. Preserve safety semantics
-
-The current SF safety model includes write gates, discussion gates, queue-mode restrictions, and state integrity guarantees.
-
-Those guarantees must continue to apply when tools are invoked over MCP. In particular:
-
-- MCP must not create a path that bypasses write gating
-- MCP mutations must preserve the same DB/file/state invariants as native tools
-- provider-specific fallback behavior must not allow manual summary writing in place of canonical completion tools
-
-### 4. Make provider capability checks explicit
-
-Before dispatching a workflow that requires SF workflow tools, SF should check whether the selected provider/session can access the required tool surface.
-
-If a provider cannot access either:
-
-- native in-process SF tools, or
-- the SF MCP workflow tool surface
-
-then SF must fail early with a clear compatibility error rather than allowing execution to continue in a degraded, state-breaking mode.
-
-### 5. Keep the existing session/read MCP server
-
-The existing MCP server in `packages/mcp-server` remains valid. It serves a different purpose:
-
-- remote session orchestration
-- status/result polling
-- filesystem-backed project inspection
-
-The new workflow-tool MCP surface is complementary, not a replacement.
-## Alternatives Considered
-
-### Alternative A: Reroute away from Claude Code whenever tool-backed execution is needed
-
-This would fix the immediate failure for multi-provider users, but it does not solve provider parity. It also fails completely for users who only have Claude Code configured.
-
-**Rejected** because it treats the symptom, not the architectural gap.
-
-### Alternative B: Hard-fail Claude Code and require another provider
-
-This is a valid short-term guardrail and may still be used before MCP support is complete.
-
-**Rejected as the long-term architecture** because it permanently excludes a supported provider from first-class SF execution.
-
-### Alternative C: Inject the internal SF tool registry directly into the Claude Agent SDK without MCP
-
-This would tightly couple SF's internal extension runtime to a provider-specific integration path. It would not generalize well to other providers or external tool clients.
-
-**Rejected** because it creates a provider-specific bridge instead of a transport-neutral contract.
-
-### Alternative D: Replace native SF tools entirely with MCP
-
-This would simplify the conceptual model, but it would force all runtimes through an external protocol boundary even when the native in-process path is faster and already works well.
-
-**Rejected** because MCP is needed for portability, not because the native tool system is flawed.
-
-## Consequences
-
-### Positive
-
-1. **Provider parity improves.** Providers that can consume MCP tools can participate in full SF workflow execution.
-2. **The workflow contract becomes transport-neutral.** Prompts can rely on capabilities rather than a specific runtime implementation detail.
-3. **One compatibility story for external clients.** Claude Code, Cursor, and other MCP-capable clients can use the same workflow tool surface.
-4. **Better long-term architecture.** Internal tools and external transports converge on shared handlers instead of diverging implementations.
-
-### Negative
-
-1. **Larger surface area to secure and test.** Mutation tools over MCP are higher risk than read-only inspection tools.
-2. **Migration complexity.** Tool registration, gating, and handler extraction must be refactored carefully.
-3. **Two transport paths must remain aligned.** Native and MCP invocation semantics must stay behaviorally identical.
-
-### Neutral / Tradeoff
-
-The system will now support:
-
-- native in-process tool execution when available
-- MCP-backed tool execution when native access is unavailable
-
-That is more complex than a single-path system, but it is the cost of provider portability without sacrificing native runtime quality.
## Migration Plan
|
|
||||||
|
|
||||||
### Phase 1: Extract shared handlers
|
|
||||||
|
|
||||||
Refactor workflow tools so MCP and native registration can call the same transport-neutral functions.
|
|
||||||
|
|
||||||
Priority targets:
|
|
||||||
|
|
||||||
- `sf_summary_save`
|
|
||||||
- `sf_task_complete`
|
|
||||||
- `sf_plan_milestone`
|
|
||||||
- `sf_plan_slice`
|
|
||||||
- `sf_plan_task`
|
|
||||||
|
|
||||||
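The Phase 1 shape can be sketched as follows. This is a hypothetical illustration, not the actual SF code: the handler signature, `ToolResult`, and the wrapper names are invented here; only the tool name `sf_summary_save` comes from the list above.

```typescript
// Transport-neutral result type shared by both invocation paths (invented
// for this sketch; the real SF types will differ).
interface ToolResult {
  ok: boolean;
  message: string;
}

interface SummarySaveArgs {
  milestoneId: string;
  summary: string;
}

// The extracted handler owns the real behavior (DB writes, artifact
// rendering) and knows nothing about how it was invoked.
async function summarySaveHandler(args: SummarySaveArgs): Promise<ToolResult> {
  if (!args.summary.trim()) {
    return { ok: false, message: "empty summary rejected" };
  }
  // ...DB update and artifact rendering would happen here...
  return { ok: true, message: `summary saved for ${args.milestoneId}` };
}

// Native registration: direct in-process call.
const nativeSummarySave = (args: SummarySaveArgs) => summarySaveHandler(args);

// MCP registration: the same handler behind a JSON argument boundary.
const mcpSummarySave = (jsonArgs: string) =>
  summarySaveHandler(JSON.parse(jsonArgs) as SummarySaveArgs);
```

The point of the pattern is that behavioral parity between transports falls out of sharing one handler, rather than being re-verified per tool.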
### Phase 2: Stand up the workflow-tool MCP server

Add a new MCP surface for workflow tool execution. This may extend the existing MCP package or live as a sibling package, but it must be clearly separated from the current session/read API.

### Phase 3: Port safety enforcement

Move or centralize write gates and related policy checks so MCP mutations cannot bypass the existing safety model.

### Phase 4: Attach MCP workflow tools to Claude Code sessions

Update the Claude Code provider integration to pass an SF-managed `mcpServers` configuration into the Claude Agent SDK session when required.
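A sketch of the Phase 4 wiring. Everything here is an assumption for illustration: the server name, the `sf` command arguments, and the env key are invented, and the exact Claude Agent SDK option shape is not taken from SF code. What it shows is the idea that SF owns the MCP server entry and hands it to the provider session rather than the provider discovering tools itself.

```typescript
// Hypothetical shape of one mcpServers entry (command + args + env), loosely
// mirroring what MCP-capable SDKs commonly accept for stdio servers.
interface McpServerSpec {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Build the SF-managed server map the provider session would receive.
// The "sf mcp --project" invocation is illustrative, not a real CLI flag.
function buildSfMcpServers(projectRoot: string): Record<string, McpServerSpec> {
  return {
    "sf-workflow": {
      command: "sf",
      args: ["mcp", "--project", projectRoot],
      env: { SF_MCP_SURFACE: "workflow" },
    },
  };
}
```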
### Phase 5: Add provider capability gating

Before tool-dependent flows begin, verify that the active provider can access the required SF workflow tools via either native registration or MCP.
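A minimal sketch of the Phase 5 gate, assuming a hypothetical capability report per provider; the real probe would inspect native registration and the live MCP session, and the required tool set here is truncated for illustration.

```typescript
// Invented capability-report shape for this sketch.
interface ProviderCapabilities {
  nativeTools: Set<string>;
  mcpTools: Set<string>;
}

// Illustrative subset; the real required set is the canonical workflow tools.
const REQUIRED_WORKFLOW_TOOLS = ["sf_summary_save", "sf_task_complete"];

// Fail early with a precise compatibility error instead of failing
// mid-workflow when a tool call silently cannot be made.
function assertWorkflowToolAccess(caps: ProviderCapabilities): void {
  const missing = REQUIRED_WORKFLOW_TOOLS.filter(
    (tool) => !caps.nativeTools.has(tool) && !caps.mcpTools.has(tool),
  );
  if (missing.length > 0) {
    throw new Error(
      `provider incompatible: no native or MCP access to ${missing.join(", ")}`,
    );
  }
}
```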
### Phase 6: Update prompts and docs

Prompt contracts should remain strict about using canonical SF completion/planning tools, but documentation and runtime messaging must no longer assume that only native in-process tool registration satisfies that contract.

## Validation

Success is defined by all of the following:

1. A Claude Code-backed execution session can complete a task using canonical SF workflow tools without manual summary writing.
2. Native provider behavior remains unchanged.
3. MCP-invoked workflow tools produce the same DB updates, rendered artifacts, and state transitions as native tool calls.
4. Write-gate and discussion-gate protections still hold under MCP invocation.
5. When required capabilities are unavailable, SF fails early with a precise compatibility error.
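Criterion 3 can be expressed as a parity test. The sketch below is a toy model under the Phase 1 shared-handler assumption: an in-memory `Map` stands in for SF's DB, and the "MCP" path only adds a JSON boundary in front of the same handler. All names are invented.

```typescript
// Stand-in for SF runtime state.
type TaskDb = Map<string, string>;

// Shared handler: the single place the state transition happens.
function completeTaskHandler(db: TaskDb, taskId: string): void {
  db.set(taskId, "complete"); // stand-in for DB update + artifact rendering
}

// MCP path: same handler behind a serialized argument boundary.
function completeTaskViaMcp(db: TaskDb, jsonArgs: string): void {
  const { taskId } = JSON.parse(jsonArgs) as { taskId: string };
  completeTaskHandler(db, taskId);
}

// Parity check: run the same logical call once per transport, diff the state.
function transportsAgree(taskId: string): boolean {
  const nativeDb: TaskDb = new Map();
  const mcpDb: TaskDb = new Map();
  completeTaskHandler(nativeDb, taskId);
  completeTaskViaMcp(mcpDb, JSON.stringify({ taskId }));
  return JSON.stringify([...nativeDb]) === JSON.stringify([...mcpDb]);
}
```

In the real suite the diffed state would be DB rows, rendered artifacts, and status transitions rather than a single map.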
## Scope Notes

This ADR establishes the architectural direction. It does **not** require full MCP exposure of every historical alias or every auxiliary tool in the first implementation.

The first implementation should prioritize the minimum workflow tool set needed to make discuss/plan/execute/complete flows work safely for MCP-capable providers.
@@ -9,7 +9,7 @@ sf today bundles its TUI directly in core: `pi-tui` (~10.5k LOC of TypeScript) i
 
 Three forces argue for extracting the TUI:
 
-1. **sf is becoming truly headless-first** — `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` already exist. CLI invocations talk to the daemon. sf can be called as an MCP backend by Claude Code, Cursor, Hermes — they're TUI-agnostic. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*.
+1. **sf is becoming truly headless-first** — `packages/daemon` and `packages/rpc-client` already exist. CLI invocations talk to the daemon. SF uses MCP as a client integration surface for external tools, not as an SF workflow server. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*.
 2. **The Charm TUI stack is dramatically more capable than what `pi-tui` builds today.** `bubbletea` + `bubbles` + `lipgloss` + `glamour` + `huh` + `harmonica` + `x/mosaic` (image rendering) + `x/vcr` (recording) + `pony` + `ultraviolet` (declarative markup) compose to far better UX than reproducing in TS would.
 3. **Removing `pi-tui` from sf core deletes ~10k LOC of TS** — leaner core, fewer TUI-coupled assumptions in `pi-coding-agent`, cleaner test surface.
 
@@ -42,7 +42,7 @@ This ADR plans the extraction.
 
 - **sf core gets ~10k LOC leaner** after Stage 2.
 - **Charm stack quality** comes for free — animations (`harmonica`), inline images (`x/mosaic`), markdown (`glamour`), forms (`huh`), recording (`x/vcr`).
-- **Headless / API-first architecture** is cleanly visible: daemon + RPC + MCP + clients. No TUI coupled to engine.
+- **Headless / API-first architecture** is cleanly visible: daemon + RPC + clients, with MCP client integration for external tools. No TUI coupled to engine.
 - **Remote TUI for free** — once the client is Wish-served (could be a v3.x extension), `tailscale ssh aidev sf` opens a full TUI session over SSH. Today's `pi-tui` is local-process only.
 - **Recordings of TUI sessions** — flight recorder (ADR-015) integrates with the Charm TUI naturally; `pi-tui` would need separate work to support this.
 
@@ -85,7 +85,7 @@ Total: ~12–16 weeks across stages.
 
 ## References
 
-- `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` — already exist; this ADR makes them load-bearing for clients.
+- `packages/daemon`, `packages/rpc-client` — already exist; this ADR makes them load-bearing for clients.
 - `packages/pi-tui` — the existing TUI being deprecated.
 - `ADR-013` — Network: future SSH-served TUI via `wish` rides the same substrate.
 - `ADR-015` — Flight recorder: `sf-tui` records its sessions naturally.
@@ -36,7 +36,7 @@
 | **Loader / Bootstrap** | Startup initialization, extension sync, tool bootstrap |
 | **LSP** | Language Server Protocol client and multiplexer |
 | **Mac Tools** | macOS-native utilities (Swift CLI) |
-| **MCP Server/Client** | Model Context Protocol server and client |
+| **MCP Client** | Model Context Protocol client integration for external tools |
 | **Memory Extension** | In-session memory pipeline and storage |
 | **Migration** | Data and config migration tools |
 | **Modes** | Interactive TUI, Print, RPC, and Web modes |
@@ -87,7 +87,6 @@
 | src/help-text.ts | CLI | Generates help text for all subcommands |
 | src/loader.ts | Loader/Bootstrap | Fast-path startup, extension discovery/validation, env setup |
 | src/logo.ts | CLI | ASCII logo rendering for welcome screen and loader |
-| src/mcp-server.ts | MCP Server/Client | Native MCP server over stdin/stdout for external AI clients |
 | src/models-resolver.ts | Config, Auth/OAuth | Resolves models.json with fallback from Pi to SF |
 | src/onboarding.ts | Onboarding | First-run wizard — LLM auth, OAuth, API keys, tool setup |
 | src/pi-migration.ts | Config, Auth/OAuth | Migrates provider credentials from Pi auth.json to SF |
@@ -388,9 +387,9 @@
 |------|-----------------|-------------|
 | modes/index.ts | Modes | Mode system exports |
 | modes/print-mode.ts | Modes | Non-interactive print mode |
-| modes/rpc/rpc-mode.ts | Modes, MCP Server/Client | RPC server mode for remote access |
-| modes/rpc/rpc-client.ts | Modes, MCP Server/Client | RPC client for remote agent interaction |
-| modes/rpc/rpc-types.ts | Modes, MCP Server/Client | RPC protocol type definitions |
+| modes/rpc/rpc-mode.ts | Modes, RPC | RPC server mode for remote access |
+| modes/rpc/rpc-client.ts | Modes, RPC | RPC client for remote agent interaction |
+| modes/rpc/rpc-types.ts | Modes, RPC | RPC protocol type definitions |
 | modes/rpc/jsonl.ts | Modes | JSONL serialization for RPC |
 | modes/rpc/remote-terminal.ts | Modes | Remote terminal output handling |
 | modes/shared/command-context-actions.ts | Modes, Commands | Shared command context utilities |
@@ -610,7 +609,7 @@
 | remote-questions/slack-adapter.ts | Remote Questions | Slack messaging adapter |
 | remote-questions/discord-adapter.ts | Remote Questions | Discord messaging adapter |
 | remote-questions/telegram-adapter.ts | Remote Questions | Telegram messaging adapter |
-| mcp-client/index.ts | MCP Server/Client | Model Context Protocol client integration |
+| mcp-client/index.ts | MCP Client | Model Context Protocol client integration |
 | subagent/index.ts | Subagent, Agent Core | Parallel/serial subagent delegation extension |
 | subagent/agents.ts | Subagent, Agent Core | Agent registry and discovery |
 | subagent/isolation.ts | Subagent | Execution isolation and sandboxing |
@@ -826,7 +825,7 @@
 | File | System Label(s) | Description |
 |------|-----------------|-------------|
 | vscode-extension/src/extension.ts | VS Code Extension | Extension activation, client management, command registration |
-| vscode-extension/src/sf-client.ts | VS Code Extension, MCP Server/Client | RPC client for SF agent communication |
+| vscode-extension/src/sf-client.ts | VS Code Extension, RPC | RPC client for SF agent communication |
 | vscode-extension/src/chat-participant.ts | VS Code Extension | Chat participant for @sf command |
 | vscode-extension/src/sidebar.ts | VS Code Extension | Sidebar webview provider with status display |
 
@@ -975,7 +974,7 @@ Quick lookup: which files are part of each system?
 | **Loader / Bootstrap** | src/loader.ts, src/resource-loader.ts, src/tool-bootstrap.ts, src/bundled-resource-path.ts, sf/bootstrap/* |
 | **LSP** | pi-coding-agent/src/core/lsp/* |
 | **Mac Tools** | src/resources/extensions/mac-tools/* |
-| **MCP Server/Client** | src/mcp-server.ts, src/resources/extensions/mcp-client/index.ts, vscode-extension/src/sf-client.ts, modes/rpc/* |
+| **MCP Client** | src/resources/extensions/mcp-client/index.ts |
 | **Memory Extension** | pi-coding-agent/src/resources/extensions/memory/* |
 | **Migration** | sf/migrate/*, src/pi-migration.ts, pi-coding-agent/src/migrations.ts, scripts/recover-*.sh |
 | **Modes** | pi-coding-agent/src/modes/* |
@@ -34,10 +34,10 @@ These were not clearly represented as durable roadmap items and should be planne
 | Item | Why | Suggested tier | Implementation note |
 |---|---|---|---|
 | Typed SF environment schema | `SF_*` env vars should fail early with actionable diagnostics instead of late runtime surprises. | Tier 1 | Add an SF-owned env schema module and route startup/tool validation through it. |
-| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Start with file-family thresholds or focused test suites for dispatch, recovery, UOK runtime, and validation. |
-| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Tier 2 | Add a minimal lifecycle fixture that exercises DB rows as executable truth. |
+| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Started with focused DB-authority/UOK runtime suites; continue with dispatch and recovery families before changing global thresholds. |
+| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Done | Added runtime-state regression coverage proving SQLite slice/task order stays authoritative over stale markdown/JSON projections, and DB-backed runtime refuses implicit roadmap, plan, and summary imports. |
 | Fault-injection recovery tests | Stuck-loop, timeout, runaway, stale lock, and projection drift recovery are high-risk paths. | Tier 2 | Add deterministic fault fixtures before adding broader chaos coverage. |
-| MCP package completeness audit | Docs mention MCP surfaces, but production completeness is unclear. | Tier 2 | Inspect `packages/mcp-server/`, record supported contracts, gaps, and deferred work. |
+| MCP server residue/docs cleanup | SF currently ships the MCP client extension only; tracked MCP server source was removed. | Done | Removed untracked `packages/mcp-server/` residue and updated durable docs so future work never recreates an SF MCP server. |
 | Biome schema version cleanup | Tooling drift creates noisy lint/config failures. | Tier 3 | Run `biome migrate` as a focused tooling cleanup. |
 | Headless assistant-text preview completion | Prior headless work deferred buffer separation. | Tier 2 | Finish `assistantTextBuffer` / `thinkingBuffer` separation and preview flushing. |
 
@@ -56,6 +56,36 @@ export interface HeadlessHeartbeatContext {
   lastEventDetail?: string;
 }
 
+export interface AssistantPreviewDelta {
+  kind: "text" | "thinking";
+  text: string;
+}
+
+/**
+ * Extract a concise preview delta from an assistant message update.
+ *
+ * Purpose: keep headless non-verbose output honest by separating assistant text
+ * from model thinking before both are flushed at tool starts and message end.
+ *
+ * Consumer: headless.ts streaming event loop and headless progress tests.
+ */
+export function extractAssistantPreviewDelta(
+  assistantMessageEvent: unknown,
+): AssistantPreviewDelta | null {
+  if (!assistantMessageEvent || typeof assistantMessageEvent !== "object") {
+    return null;
+  }
+  const event = assistantMessageEvent as Record<string, unknown>;
+  const type = String(event.type ?? "");
+  if (type !== "text_delta" && type !== "thinking_delta") return null;
+  const text = String(event.delta ?? event.text ?? "");
+  if (!text) return null;
+  return {
+    kind: type === "thinking_delta" ? "thinking" : "text",
+    text,
+  };
+}
+
 // ---------------------------------------------------------------------------
 // ANSI Color Helpers
 // ---------------------------------------------------------------------------
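The new helper is small enough to exercise standalone. The sketch below copies the function body from the hunk above so it runs in isolation, then drives it the way the non-verbose headless loop does; the sample event objects are invented for illustration.

```typescript
interface AssistantPreviewDelta {
  kind: "text" | "thinking";
  text: string;
}

// Verbatim logic from the diff above, inlined so this sketch is self-contained.
function extractAssistantPreviewDelta(
  assistantMessageEvent: unknown,
): AssistantPreviewDelta | null {
  if (!assistantMessageEvent || typeof assistantMessageEvent !== "object") {
    return null;
  }
  const event = assistantMessageEvent as Record<string, unknown>;
  const type = String(event.type ?? "");
  if (type !== "text_delta" && type !== "thinking_delta") return null;
  const text = String(event.delta ?? event.text ?? "");
  if (!text) return null;
  return { kind: type === "thinking_delta" ? "thinking" : "text", text };
}

// Accumulate separated buffers the way the streaming loop does.
let assistantTextBuffer = "";
let thinkingBuffer = "";
const events: unknown[] = [
  { type: "thinking_delta", delta: "planning edits" },
  { type: "text_delta", delta: "Done: " },
  { type: "text_delta", text: "2 files changed" }, // exercises the `text` fallback
  { type: "tool_start" }, // not a preview delta; ignored
];
for (const ev of events) {
  const d = extractAssistantPreviewDelta(ev);
  if (d?.kind === "text") assistantTextBuffer += d.text;
  else if (d?.kind === "thinking") thinkingBuffer += d.text;
}
// assistantTextBuffer === "Done: 2 files changed"; thinkingBuffer === "planning edits"
```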
@@ -60,6 +60,7 @@ import type { HeadlessJsonResult, OutputFormat } from "./headless-types.js";
 import { VALID_OUTPUT_FORMATS } from "./headless-types.js";
 import type { ExtensionUIRequest, ProgressContext } from "./headless-ui.js";
 import {
+  extractAssistantPreviewDelta,
   formatHeadlessHeartbeat,
   formatProgress,
   formatPromptTraceLines,
@@ -1440,9 +1441,15 @@ async function runHeadlessOnce(
         }
       }
     }
-    // Non-verbose: accumulate text_delta for truncated one-liner
-    else if (ame?.type === "text_delta") {
-      assistantTextBuffer += String(ame.delta ?? ame.text ?? "");
+    // Non-verbose: accumulate separated thinking/text previews for
+    // truncated one-liners before tool calls and message end.
+    else {
+      const previewDelta = extractAssistantPreviewDelta(ame);
+      if (previewDelta?.kind === "text") {
+        assistantTextBuffer += previewDelta.text;
+      } else if (previewDelta?.kind === "thinking") {
+        thinkingBuffer += previewDelta.text;
+      }
     }
   }
 
@@ -41,7 +41,11 @@ function formatAggregateModelIdentity(modelId) {
   });
 }
 /**
- * Record a unit outcome to the llm_task_outcomes table for Bayesian learning.
+ * Record a unit outcome to the llm_task_outcomes table for Bayesian learning,
+ * and also to model-learner for per-task-type performance tracking.
+ *
+ * Integration point for quick win #2: model-learner activates continuous
+ * per-task-type model performance tracking for adaptive routing decisions.
  */
 async function recordUnitOutcome(unit) {
   const db = getDatabase();
@@ -54,7 +58,7 @@ async function recordUnitOutcome(unit) {
   // drop bare-id entries rather than guessing the provider.
   if (slashIdx === -1) return;
   const provider = modelId.slice(0, slashIdx);
-  recordOutcome(db, {
+  const outcome = {
     modelId,
     provider,
     unitType: unit.type,
@@ -68,7 +72,24 @@ async function recordUnitOutcome(unit) {
       tokens_total: unit.tokens.total,
       cost_usd: unit.cost,
       recorded_at: unit.startedAt,
+    };
+
+    // Record to UOK llm_task_outcomes table
+    recordOutcome(db, outcome);
+
+    // Quick Win #2: Also record to model-learner for per-task-type tracking
+    try {
+      const { ModelLearner } = await import("./model-learner.js");
+      const learner = new ModelLearner(basePath);
+      learner.recordOutcome(unit.type, modelId, {
+        success: true,
+        timeout: false,
+        tokensUsed: unit.tokens.total,
+        costUsd: unit.cost,
       });
+    } catch {
+      /* model-learner integration is optional; never block outcome recording */
+    }
   } catch {
     /* fire-and-forget */
   }
@@ -37,17 +37,12 @@ import {
   getReplanHistory,
   getSlice,
   getSliceTasks,
-  insertMilestone,
-  insertSlice,
-  insertTask,
   isDbAvailable,
-  updateSliceStatus,
-  updateTaskStatus,
   wasDbOpenAttempted,
 } from "./sf-db.js";
 import { isClosedStatus, isDeferredStatus } from "./status-guards.js";
 import { extractVerdict } from "./verdict-parser.js";
-import { logError, logWarning } from "./workflow-logger.js";
+import { logWarning } from "./workflow-logger.js";
 /**
  * A "ghost" milestone directory contains only META.json (and no substantive
  * files like CONTEXT, CONTEXT-DRAFT, ROADMAP, or SUMMARY). These appear when
@@ -219,8 +214,7 @@ export async function getActiveMilestoneId(basePath) {
  * STATE.md is a rendered cache of this output.
  *
  * When DB is available, queries milestone/slice/task tables directly.
- * Falls back to filesystem parsing for unmigrated projects or when DB
- * has zero milestones (e.g. first run before migration).
+ * Falls back to filesystem parsing only when DB is unavailable.
  */
 export async function deriveState(basePath) {
   // Return cached result if within the TTL window for the same basePath
@@ -235,22 +229,6 @@
   let result;
   // Dual-path: try DB-backed derivation first when hierarchy tables are populated
   if (isDbAvailable()) {
-    let dbMilestones = getAllMilestones();
-    // Disk→DB reconciliation when DB is empty but disk has milestones (#2631).
-    // deriveStateFromDb() does its own reconciliation, but deriveState() skips
-    // it entirely when the DB is empty. Sync here so the DB path is used when
-    // disk milestones exist but haven't been migrated yet.
-    if (dbMilestones.length === 0) {
-      const diskIds = findMilestoneIds(basePath);
-      let synced = false;
-      for (const diskId of diskIds) {
-        if (!isGhostMilestone(basePath, diskId)) {
-          insertMilestone({ id: diskId, status: "active" });
-          synced = true;
-        }
-      }
-      if (synced) dbMilestones = getAllMilestones();
-    }
     const stopDbTimer = debugTime("derive-state-db");
     result = await deriveStateFromDb(basePath);
     stopDbTimer({
@@ -300,91 +278,29 @@ function extractContextTitle(content, fallback) {
 const isStatusDone = isClosedStatus;
 /**
  * Derive SF state from the milestones/slices/tasks DB tables.
- * Flag files (PARKED, VALIDATION, CONTINUE, REPLAN, REPLAN-TRIGGER, CONTEXT-DRAFT)
- * are still checked on the filesystem since they aren't in DB tables.
+ * Non-planning control files (PARKED, CONTINUE, REPLAN, REPLAN-TRIGGER,
+ * CONTEXT-DRAFT) are still checked on the filesystem since they are not
+ * hierarchy state.
  * Requirements also stay file-based via parseRequirementCounts().
  *
- * Must produce field-identical SFState to _deriveStateImpl() for the same project.
+ * Must not import rendered roadmap, plan, or summary artifacts into DB-backed
+ * runtime state. Explicit migration/repair flows own any legacy file import.
  */
 function reconcileDiskToDb(basePath) {
-  let allMilestones = getAllMilestones();
-  const dbIdSet = new Set(allMilestones.map((m) => m.id));
   const diskIds = findMilestoneIds(basePath);
-  let synced = false;
-  for (const diskId of diskIds) {
-    if (!dbIdSet.has(diskId) && !isGhostMilestone(basePath, diskId)) {
-      insertMilestone({ id: diskId, status: "active" });
-      synced = true;
-    }
-  }
-  if (synced) allMilestones = getAllMilestones();
-  for (const mid of diskIds) {
-    if (isGhostMilestone(basePath, mid)) continue;
-    const roadmapPath = resolveMilestoneFile(basePath, mid, "ROADMAP");
-    if (!roadmapPath) continue;
-    const dbSlices = getMilestoneSlices(mid);
-    const dbSliceIds = new Set(dbSlices.map((s) => s.id));
-    let roadmapContent;
-    try {
-      roadmapContent = readFileSync(roadmapPath, "utf-8");
-    } catch (err) {
+  if (diskIds.length > 0) {
+    const dbIds = new Set(getAllMilestones().map((m) => m.id));
+    const diskOnlyIds = diskIds.filter(
+      (id) => !dbIds.has(id) && !isGhostMilestone(basePath, id),
+    );
+    if (diskOnlyIds.length > 0) {
       logWarning(
         "state",
-        "reconcileDiskToDb: roadmap read failed, skipping milestone",
-        {
-          mid,
-          error: err.message,
-        },
+        `DB-backed state ignored ${diskOnlyIds.length} disk-only milestone(s): ${diskOnlyIds.join(", ")}`,
       );
-      continue;
-    }
-    const parsed = parseRoadmap(roadmapContent);
-    for (const s of parsed.slices) {
-      if (dbSliceIds.has(s.id)) continue;
-      const summaryPath = resolveSliceFile(basePath, mid, s.id, "SUMMARY");
-      const sliceStatus = s.done || summaryPath ? "complete" : "pending";
-      insertSlice({
-        id: s.id,
-        milestoneId: mid,
-        title: s.title,
-        status: sliceStatus,
-        risk: s.risk,
-        depends: s.depends,
-        demo: s.demo,
-      });
-    }
-    // Reconcile stale *existing* slice rows (#3599): a slice row may exist in
-    // the DB with status "pending" even though disk artifacts (SUMMARY) prove
-    // completion — the same class of desync that task-level reconciliation
-    // (further below) already handles. Without this, the dependency resolver
-    // builds doneSliceIds from stale DB rows and downstream slices stay blocked
-    // forever with "No slice eligible".
-    for (const dbSlice of dbSlices) {
-      if (isStatusDone(dbSlice.status)) continue;
-      const summaryPath = resolveSliceFile(
-        basePath,
-        mid,
-        dbSlice.id,
-        "SUMMARY",
-      );
-      if (summaryPath) {
-        try {
-          updateSliceStatus(mid, dbSlice.id, "complete");
-          logWarning(
-            "reconcile",
-            `slice ${mid}/${dbSlice.id} status reconciled from "${dbSlice.status}" to "complete" (#3599)`,
-            { mid, sid: dbSlice.id },
-          );
-        } catch (e) {
-          logError("reconcile", `failed to update slice ${dbSlice.id}`, {
-            sid: dbSlice.id,
-            error: e.message,
-          });
-        }
-      }
     }
   }
-  }
-  }
-  return allMilestones;
+  return getAllMilestones();
 }
 function buildCompletenessSet(basePath, milestones) {
   const completeMilestoneIds = new Set();
@@ -727,47 +643,14 @@ function resolveSliceDependencies(activeMilestoneSlices) {
   return { activeSlice: null, activeSliceRow: null };
 }
 async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) {
-  let tasks = getSliceTasks(milestoneId, sliceId);
+  const tasks = getSliceTasks(milestoneId, sliceId);
   if (tasks.length === 0 && planFile) {
-    try {
-      const planContent = await loadFile(planFile);
-      if (planContent) {
-        const diskPlan = parsePlan(planContent);
-        if (diskPlan.tasks.length > 0) {
-          for (let i = 0; i < diskPlan.tasks.length; i++) {
-            const t = diskPlan.tasks[i];
-            try {
-              insertTask({
-                id: t.id,
-                sliceId,
-                milestoneId,
-                title: t.title,
-                status: t.done ? "complete" : "pending",
-                sequence: i + 1,
-              });
-            } catch (insertErr) {
-              logWarning(
-                "reconcile",
-                `failed to insert task ${t.id} from plan file: ${insertErr instanceof Error ? insertErr.message : String(insertErr)}`,
-              );
-            }
-          }
-          tasks = getSliceTasks(milestoneId, sliceId);
-          logWarning(
-            "reconcile",
-            `imported ${tasks.length} tasks from plan file for ${milestoneId}/${sliceId} — DB was empty (#3600)`,
-            { mid: milestoneId, sid: sliceId },
-          );
-        }
-      }
-    } catch (err) {
-      logError(
-        "reconcile",
-        `plan-file task import failed for ${milestoneId}/${sliceId}: ${err instanceof Error ? err.message : String(err)}`,
-      );
-    }
-  }
-  let reconciled = false;
+    logWarning(
+      "reconcile",
+      `slice plan file exists for ${milestoneId}/${sliceId}, but DB has no task rows; refusing runtime import`,
+      { mid: milestoneId, sid: sliceId },
+    );
+  }
   for (const t of tasks) {
     if (isStatusDone(t.status)) continue;
     const summaryPath = resolveTaskFile(
@@ -778,40 +661,12 @@ async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) {
       "SUMMARY",
     );
     if (summaryPath && existsSync(summaryPath)) {
-      // Validate that the summary file has actual content (#sf-moobj36o-6rxy6e)
-      const summaryContent = readFileSync(summaryPath, "utf-8");
-      if (!isValidTaskSummary(summaryContent)) {
-        logWarning(
-          "reconcile",
-          `task ${milestoneId}/${sliceId}/${t.id} has empty/invalid SUMMARY — skipping reconciliation`,
-          { mid: milestoneId, sid: sliceId, tid: t.id },
-        );
-        continue;
-      }
-      try {
-        updateTaskStatus(
-          milestoneId,
-          sliceId,
-          t.id,
-          "complete",
-          new Date().toISOString(),
-        );
-        logWarning(
-          "reconcile",
-          `task ${milestoneId}/${sliceId}/${t.id} status reconciled from "${t.status}" to "complete" (#2514)`,
-          { mid: milestoneId, sid: sliceId, tid: t.id },
-        );
-        reconciled = true;
-      } catch (e) {
-        logError("reconcile", `failed to update task ${t.id}`, {
-          tid: t.id,
-          error: e.message,
-        });
-      }
-    }
-  }
-  if (reconciled) {
-    tasks = getSliceTasks(milestoneId, sliceId);
+      logWarning(
+        "reconcile",
+        `task ${milestoneId}/${sliceId}/${t.id} has SUMMARY on disk but DB status is "${t.status}"; refusing runtime status import`,
+        { mid: milestoneId, sid: sliceId, tid: t.id },
+      );
+    }
   }
   return tasks;
 }
@@ -0,0 +1,241 @@
+/**
+ * db-driven-runtime-state.test.mjs — DB authority for runtime state.
+ *
+ * Purpose: prove DB-backed projects choose active slices/tasks from structured
+ * SQLite rows even when generated markdown/JSON projections are stale.
+ */
+import assert from "node:assert/strict";
+import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { afterEach, test } from "vitest";
+import {
+  closeDatabase,
+  getAllMilestones,
+  getSliceTasks,
+  insertMilestone,
+  insertSlice,
+  insertTask,
+  openDatabase,
+} from "../sf-db.js";
+import { deriveState, invalidateStateCache } from "../state.js";
+
+const tmpDirs = [];
+
+afterEach(() => {
+  closeDatabase();
+  invalidateStateCache();
+  while (tmpDirs.length > 0) {
+    const dir = tmpDirs.pop();
+    if (dir) rmSync(dir, { recursive: true, force: true });
+  }
+});
+
+function makeProject() {
+  const dir = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
+  tmpDirs.push(dir);
+  mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S01"), {
+    recursive: true,
+  });
+  mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S02"), {
+    recursive: true,
+  });
+  openDatabase(join(dir, ".sf", "sf.db"));
+  insertMilestone({
+    id: "M777",
+    title: "DB runtime authority",
+    status: "active",
+  });
+  insertSlice({
+    milestoneId: "M777",
+    id: "S02",
+    title: "DB first slice",
+    status: "pending",
+    sequence: 1,
+  });
+  insertSlice({
+    milestoneId: "M777",
+    id: "S01",
+    title: "Stale file first slice",
+    status: "pending",
+    sequence: 2,
+  });
+  insertTask({
+    milestoneId: "M777",
+    sliceId: "S02",
+    id: "T02",
+    title: "DB first task",
+    status: "pending",
+    sequence: 1,
+  });
+  insertTask({
+    milestoneId: "M777",
+    sliceId: "S02",
+    id: "T01",
+    title: "DB second task",
+    status: "pending",
+    sequence: 2,
+  });
+  return dir;
+}
+
+test("deriveState_when_generated_projections_are_stale_uses_db_slice_and_task_sequence", async () => {
+  const project = makeProject();
+  const milestoneDir = join(project, ".sf", "milestones", "M777");
+  writeFileSync(
+    join(milestoneDir, "M777-ROADMAP.md"),
+    [
+      "# M777: stale generated roadmap",
+      "",
+      "## Slice Overview",
+      "| ID | Slice | Risk | Depends | Done | After this |",
+      "|----|-------|------|---------|------|------------|",
+      "| S01 | Stale file first slice | low | - | | stale first |",
+      "| S02 | DB first slice | low | - | | should still be first by DB |",
+      "",
+    ].join("\n"),
+  );
+  writeFileSync(
+    join(milestoneDir, "M777-ROADMAP.json"),
+    JSON.stringify(
+      {
+        origin: "stale-test-projection",
+        slices: [
+          { id: "S01", title: "Stale file first slice", sequence: 1 },
+          { id: "S02", title: "DB first slice", sequence: 2 },
+        ],
+      },
+      null,
+      2,
+    ),
+  );
+  writeFileSync(
+    join(milestoneDir, "slices", "S02", "S02-PLAN.md"),
+    [
+      "# S02: stale generated plan",
+      "",
+      "## Tasks",
+      "",
+      "- [ ] T99: stale task that should not replace DB rows",
+      "",
+    ].join("\n"),
+  );
+
+  const state = await deriveState(project);
+
+  assert.equal(state.phase, "executing");
+  assert.equal(state.activeMilestone?.id, "M777");
+  assert.equal(state.activeSlice?.id, "S02");
+  assert.equal(state.activeTask?.id, "T02");
+});
+
+test("deriveState_when_db_has_no_tasks_refuses_runtime_plan_file_import", async () => {
+  const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
+  tmpDirs.push(project);
+  mkdirSync(join(project, ".sf", "milestones", "M778", "slices", "S01"), {
+    recursive: true,
+  });
+  openDatabase(join(project, ".sf", "sf.db"));
+  insertMilestone({
+    id: "M778",
+    title: "DB planning authority",
+    status: "active",
+  });
+  insertSlice({
+    milestoneId: "M778",
+    id: "S01",
+    title: "Plan exists without DB tasks",
+    status: "pending",
+    sequence: 1,
+  });
+  writeFileSync(
+    join(project, ".sf", "milestones", "M778", "M778-ROADMAP.md"),
+    [
+      "# M778: DB planning authority",
+      "",
+      "## Slice Overview",
+      "| ID | Slice | Risk | Depends | Done | After this |",
+      "|----|-------|------|---------|------|------------|",
+      "| S01 | Plan exists without DB tasks | low | - | | keep planning |",
+      "",
+    ].join("\n"),
+  );
+  writeFileSync(
+    join(project, ".sf", "milestones", "M778", "slices", "S01", "S01-PLAN.md"),
+    [
+      "# S01: stale generated plan",
+      "",
+      "## Tasks",
+      "",
+      "- [ ] T01: stale file task",
+      "",
+    ].join("\n"),
+  );
+
+  const state = await deriveState(project);
+
+  assert.equal(state.phase, "planning");
+  assert.equal(state.activeMilestone?.id, "M778");
+  assert.equal(state.activeSlice?.id, "S01");
+  assert.equal(state.activeTask, null);
+  assert.equal(getSliceTasks("M778", "S01").length, 0);
+  assert.match(state.nextAction, /has a plan file but no tasks/i);
+});
+
+test("deriveState_when_db_has_no_milestones_refuses_runtime_roadmap_import", async () => {
+  const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
+  tmpDirs.push(project);
+  const milestoneDir = join(project, ".sf", "milestones", "M779");
+  mkdirSync(milestoneDir, { recursive: true });
+  openDatabase(join(project, ".sf", "sf.db"));
+  writeFileSync(
+    join(milestoneDir, "M779-ROADMAP.md"),
+    [
+      "# M779: stale disk-only roadmap",
+      "",
+      "## Slice Overview",
+      "| ID | Slice | Risk | Depends | Done | After this |",
+      "|----|-------|------|---------|------|------------|",
+      "| S01 | Should not import | low | - | | no DB mutation |",
+      "",
+    ].join("\n"),
+  );
+
+  const state = await deriveState(project);
+
+  assert.equal(state.phase, "pre-planning");
+  assert.equal(state.activeMilestone, null);
+  assert.deepEqual(getAllMilestones(), []);
+});
+
+test("deriveState_when_task_summary_exists_keeps_db_task_status_authoritative", async () => {
+  const project = makeProject();
+  const taskDir = join(
+    project,
+    ".sf",
+    "milestones",
+    "M777",
+    "slices",
+    "S02",
+    "tasks",
+  );
+  mkdirSync(taskDir, { recursive: true });
+  writeFileSync(
+    join(taskDir, "T02-SUMMARY.md"),
+    [
+      "# T02: stale summary projection",
+      "",
+      "This file is not DB state.",
+      "",
+    ].join("\n"),
+  );
+
+  const state = await deriveState(project);
+  const [firstTask] = getSliceTasks("M777", "S02");
+
+  assert.equal(state.phase, "executing");
+  assert.equal(state.activeSlice?.id, "S02");
+  assert.equal(state.activeTask?.id, "T02");
+  assert.equal(firstTask.id, "T02");
+  assert.equal(firstTask.status, "pending");
+});
@@ -184,13 +184,16 @@ export function parseTriageReport(agentOutput) {
  *    Idempotent: skips rows whose ID already appears in the file.
  * 2. Call markResolved for each resolution in the report.
  *    Idempotent: markResolved itself skips already-resolved entries.
+ * 3. Quick Win #1: Auto-fix high-confidence self-reports where confidence > 0.85.
  *
  * @param basePath - project root (directory containing .sf/)
  * @param report - parsed TriageReport from parseTriageReport()
  */
-export function applyTriageReport(basePath, report) {
+export async function applyTriageReport(basePath, report) {
   let requirementsAdded = 0;
   let entriesResolved = 0;
+  let reportsAutoFixed = 0;

   // ── 1. Write promoted requirements ────────────────────────────────────────
   if (report.promotedRequirements.length > 0) {
     const sfDir = sfRoot(basePath);
@@ -244,6 +247,7 @@ export function applyTriageReport(basePath, report) {
     }
     writeFileSync(reqPath, content, "utf-8");
   }

   // ── 2. Resolve entries ────────────────────────────────────────────────────
   for (const resolution of report.resolutions) {
     const evidenceKind = resolution.evidenceKind;
@@ -258,5 +262,26 @@ export function applyTriageReport(basePath, report) {
     );
     if (mutated) entriesResolved++;
   }
-  return { requirementsAdded, entriesResolved };
+
+  // ── 3. Quick Win #1: Auto-fix high-confidence self-reports ────────────────
+  // Integration point for self-report-fixer: read open reports and auto-apply
+  // fixes where confidence > 0.85.
+  try {
+    const { autoFixHighConfidenceReports } = await import(
+      "./self-report-fixer.js"
+    );
+    const allOpen = [
+      ...readAllSelfFeedback(basePath),
+      ...readUpstreamSelfFeedback(),
+    ].filter((e) => !e.resolvedAt);
+
+    if (allOpen.length > 0) {
+      const result = await autoFixHighConfidenceReports(basePath, allOpen);
+      reportsAutoFixed = result.applied.length;
+    }
+  } catch {
+    /* self-report fixer is optional; never block triage report application */
+  }
+
+  return { requirementsAdded, entriesResolved, reportsAutoFixed };
 }
@@ -2,6 +2,7 @@ import assert from "node:assert/strict";
 import { describe, it } from "vitest";
 import type { ProgressContext } from "../headless-ui.js";
 import {
+  extractAssistantPreviewDelta,
   formatCostLine,
   formatHeadlessHeartbeat,
   formatProgress,
@@ -556,6 +557,31 @@ describe("formatThinkingLine", () => {
   });
 });

+describe("extractAssistantPreviewDelta", () => {
+  it("separates assistant text deltas from thinking deltas", () => {
+    assert.deepEqual(
+      extractAssistantPreviewDelta({
+        type: "text_delta",
+        delta: "I will edit the file.",
+      }),
+      { kind: "text", text: "I will edit the file." },
+    );
+    assert.deepEqual(
+      extractAssistantPreviewDelta({
+        type: "thinking_delta",
+        delta: "Need inspect first.",
+      }),
+      { kind: "thinking", text: "Need inspect first." },
+    );
+  });
+
+  it("ignores non-preview assistant events", () => {
+    assert.equal(extractAssistantPreviewDelta({ type: "text_start" }), null);
+    assert.equal(extractAssistantPreviewDelta({ type: "thinking_end" }), null);
+    assert.equal(extractAssistantPreviewDelta(null), null);
+  });
+});
+
 describe("formatCostLine", () => {
   it("formats cost with token count", () => {
     const result = formatCostLine(0.0523, 4200, 1100);