docs: add Tier 0 (pi-mono ports) and Tier 0.5 (gsd-2 manual ports) — sf does these first

Tier 0 (pi-mono — should land cleanly via cherry-pick, no namespace divergence): 9 items ranked security → bug-fixes → infra → features. Critical: 1. HTML export escape (security) 2. Empty tools array fix (provider compatibility) 3. Anthropic SSE proxy event tolerance 4. Long local-LLM SSE 5min timeout fix Infrastructure: 5. Bedrock inference profile normalization 6. Symlinked packages dedup 7. ctx.ui.setWorkingVisible() extension API Features: 8. Cloudflare Workers AI provider 9. Azure Cognitive Services endpoint Tier 0.5 (gsd-2 — must be MANUALLY ported; cherry-pick fails on namespace): Critical fixes (11): 1-6. bash race, security hardening, web_search injection narrowing, symlinked staging self-heal, KNOWLEDGE budget, mcp-server deadlock 7-10. agent_end transition fixes (4 commits) 11. claude-code-cli Always-Allow persistence Normal-value features (6): 12. /gsd eval-review slim port (prompt + tool + template) 13. Workflow state machine hardening (5 commits as unit) 14. Proactive rate limiting (min_request_interval_ms) 15. Per-call token telemetry (opt-in pi-coding-agent hooks) 16. Worktree TUI commands 17. Doctor check for orphan milestone directories Skipped from each upstream is documented. All in BUILD_PLAN.md so sf can work the list systematically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:04:31 +02:00 · 2026-04-29 14:04:31 +02:00 · a3c487c918
commit a3c487c918
parent bb1c68b7ab
1 changed files with 80 additions and 2 deletions
--- a/BUILD_PLAN.md
+++ b/BUILD_PLAN.md
@ -6,16 +6,94 @@ This document is the answer to: **what should we actually ship for v3?**

 ---

-## Active follow-ups (added during 2026-04-29 session)
+## Tier 0 — Pi-mono ports (sf: do these FIRST)
+
+Pi-mono (`badlogic/pi-mono`) has shipped 4 releases (v0.70.3 → v0.70.6) since our last vendor sync. These should be picked up before other v3 work because:
+
+- They're security/correctness fixes for code we already use.
+- They land cleanly (no namespace divergence — `packages/pi-*` were vendored from pi-mono with same paths and type names).
+- Skipping them means dragging known bugs into v3 work.
+
+Order: **security first → real bugs → infra → features**.
+
+| Order | Pi-mono fix | Why | Reference (pi-mono SHA) |
+|---|---|---|---|
+| 1 | **HTML export: escape image data + session metadata** | Security — crafted session content could inject markup in exported HTML | PRs #3819, #3883 (in v0.70.6) |
+| 2 | **Empty `tools` array fix for providers that reject** | Correctness bug — some providers reject the call | PR #3650 (in v0.70.3) |
+| 3 | **Anthropic SSE: ignore unknown proxy events** | Correctness bug — proxies emit OpenAI-style `done` events that crash our parser | issue #3708 (in v0.70.3) |
+| 4 | **Long local-LLM SSE timeout (5-min undici cutoff)** | Correctness bug — local Ollama / LM Studio sessions over 5 min die with `UND_ERR_BODY_TIMEOUT` | issue #3715 (in v0.70.3) |
+| 5 | **Bedrock inference profile normalization** | Bedrock prompt-caching + adaptive-thinking checks fail on inference profile ARNs | PR #3527 (in v0.70.3) |
+| 6 | **Symlinked packages/resources/skills/sessions dedup** | Selectors and loaders show duplicates when paths are symlinked | PR #3818 (in v0.70.3) |
+| 7 | **`ctx.ui.setWorkingVisible()` extension API** | Lets extensions hide the built-in working-loader row; useful for autopilot UX | issue #3674 (in v0.70.3) |
+| 8 | **Cloudflare Workers AI provider** | New provider option (`CLOUDFLARE_API_KEY`/`CLOUDFLARE_ACCOUNT_ID`) | PR #3851 (in v0.70.6) |
+| 9 | **Azure Cognitive Services endpoint** | Azure OpenAI Responses base URL support | PR #3799 (in v0.70.3) |
+
+**Process for each:** read the pi-mono commit, port the fix to our `packages/pi-*` (cherry-pick should work cleanly here — same namespace as upstream); commit with `port(pi-mono): <description> (refs <pi-mono SHA>)` style.
+
+**Skip from pi-mono** (not applicable to us):
+- `pi update --self`, `pi.dev` update endpoint, Windows self-update — we vendor; no pi-binary auto-update path
+- Bun startup / sandbox `/proc/self/environ` fixes — we run on Node, not Bun
+- Packaged session selector import — our dist layout differs
+
+---
+
+## Tier 0.5 — gsd-2 high-value manual ports (after Tier 0)
+
+`gsd-build/gsd-2` has 4,589 commits we're missing. Cherry-pick **fails** on virtually all of them because of our namespace divergence (`gsd_*` → `sf_*` rename, `extensions/gsd/` → `extensions/sf/` rename, prior pi-mono direct cherry-picks). These have to be **manually ported** — read the commit, write equivalent code against our paths and naming.
+
+Process for each:
+1. Read the commit at `gsd-build/gsd-2` (we have it as `upstream/main`).
+2. Find the equivalent file(s) in our `extensions/sf/` tree.
+3. Apply the fix manually with `gsd_*` → `sf_*` and `.gsd/` → `.sf/` translations.
+4. Commit with `port(gsd-2): <description> (refs <gsd-2 SHA>)` style.
+
+**Critical fixes worth porting** (limit to security + correctness; skip parallel-evolution churn):
+
+| Order | gsd-2 fix | Why | gsd-2 SHA |
+|---|---|---|---|
+| 1 | **`fix(safety): persist bash evidence at tool_call` (close mid-unit re-dispatch race)** | Real race condition; bash tool calls can lose evidence between dispatch and re-dispatch | `da7dd56e7` (PR #5056 → #5058) |
+| 2 | **`fix(security): harden project-controlled surfaces`** | We have a partial cherry-pick at `66ff949c1`; supersede with the full fix | `65ca5aa2e` |
+| 3 | **`fix(search): narrow native web_search injection`** | Only inject web_search context when the provider accepts it | `4370bedf3` |
+| 4 | **`fix(gsd): self-heal symlinked .gsd staging`** | Data-loss prevention — symlinked staging dir was being treated as the wrong scope | `9340f1e9b` (#4423) |
+| 5 | **`fix(knowledge): scope + budget milestone KNOWLEDGE injection`** | Prevents milestone-scope knowledge from blowing the context budget | `58d3d4d6c` (#4721) |
+| 6 | **`fix(mcp-server): prevent defaultExecFn stdout-buffer deadlock`** | Real deadlock — large-output MCP tools could hang the agent | `bb747ec57` |
+| 7 | **`fix(agent-session): guard synthetic agent_end transitions`** | Session-transition race when agent_end was synthesised | `71114fccf` |
+| 8 | **`fix(agent-session): skip idle wait after agent_end`** | Idle wait was burning time on a session that was already ending | `6d7e4ccb5` |
+| 9 | **`Fix agent_end session switch handoff`** | Session handoff during agent_end could drop the next session | `c162c44bf` |
+| 10 | **`Fix session transition during agent_end`** | Companion to the above | `e3bd04551` |
+| 11 | **`fix(claude-code-cli): persist Always Allow for non-Bash tools`** | Always-Allow grants didn't persist for non-Bash tools | `a88baeae9` (PR #5096) |
+
+**Normal-value features worth porting** (not critical, but real):
+
+| Order | gsd-2 feature | Why | Effort | gsd-2 SHA(s) |
+|---|---|---|---|---|
+| 12 | **`/gsd eval-review` (slim, like product-audit)** | New milestone-end evaluation review command + frontmatter schema. We don't have it. Slim port pattern: prompt + tool + workflow template; skip parallel rewrites of dispatch/prompts. | 2 hrs | `979487735` `6971f4333` `a2f8f0e08` `83bcb054c` `a686d22cb` (+11 polish commits) |
+| 13 | **Workflow state machine hardening (5 commits as a unit)** | `harden workflow state transitions`, `persist workflow retry and summary state`, `fail closed on unreadable milestone summaries`, `restore slice dependency fallback`. Reliability of long auto runs. | 2 hrs | `f2377eedd` `b9a1c6743` `153fb328a` `381ccdef5` `371b2eb31` (PR #4758) |
+| 14 | **Proactive rate limiting via `min_request_interval_ms`** | Self-throttle to avoid 429s — model-side rate-limit data is observability-only (per SPEC.md §19.6); this is the per-dispatch knob. | 1 hr | `f980929f1` `73bc4d2f1` (PR #5007) |
+| 15 | **Per-call token telemetry (opt-in)** | pi-coding-agent gains opt-in per-call token telemetry hooks. Useful for cost dashboards. | 0.5 hr | `b4d4725ad` (PR #5023) |
+| 16 | **Worktree TUI commands (`worktree {list,merge,clean,remove}`)** | Adds these to the TUI dispatcher. We may have parts of this; check before porting. | 1 hr | `2361ceeb1` (PR #5055) |
+| 17 | **Doctor check for orphan milestone directories** | Diagnostic — flags `.sf/active/` artifacts whose milestones are gone. Aligns with SPEC.md C-24 startup cleanup. | 0.5 hr | `420354f99` (PR #4998) |
+
+**Skip from gsd-2** (parallel evolution; we have own implementations):
+- `auto-dispatch.ts`, `auto-prompts.ts`, `benchmark-selector.ts` rewrites — we have these and ours are richer (e.g. our benchmark-selector has more eval types).
+- UnitContextManifest / Composer rewrite (~15 commits, PRs #4782 / #4924 / #4925 / #4926) — major architectural refactor that conflicts heavily; revisit during v3 §3 schema reconciliation.
+- xiaomi/minimax/product-audit features — already ported in commits `ae0bbe32f`, `2eebeccb9`, `a8cf2cd94`.
+- All headless UX, prompt edits (DeepWiki/Context7), Serena hints, and global MCP loading — already addressed in our session (commits `c41912ff5`, `dff0df5fd`); we have own equivalents.
+
+**See `UPSTREAM_CHERRY_PICK_CANDIDATES.md`** for the full audit (all 4,589 commits surveyed; this Tier 0.5 list is the 17 worth porting — 11 critical + 6 normal value).
+
+---
+
+## Tier 1+ active follow-ups (after Tier 0 lands)

 These came up during recent ports and refactor passes — tracked here so they don't get lost.

 | Follow-up | Why | Tier | Effort |
 |---|---|---|---|
-| **Search provider registry refactor** | Adding minimax took 9 files because the provider list is duplicated across `provider.ts` (type + VALID_PREFERENCES), `native-search.ts`, `command-search-provider.ts` (CLI), `tool-search.ts` + `tool-llm-context.ts` (two separate execute paths!), `preferences-types.ts`, `preferences-validation.ts`, manifest, docs. A single `SearchProviderRegistry` array would let everything iterate. | 2 | 3-5 days |
 | **Minimax search tests** | Search agent ported the feature but explicitly skipped tests because bunker's tests don't match our preferences/provider export shape. Need: `getMiniMaxSearchApiKey()` priority order, `resolveSearchProvider()` returning "minimax", `/search-provider minimax` CLI behavior, no-key error messages, `executeMiniMaxSearch` request shape. | 1 | 0.5 day |
 | **Product-audit phase machine wire-up** | Slim port (commit `a8cf2cd94`) shipped the prompt + `sf_product_audit` tool + workflow template, but doesn't yet dispatch into PhaseMerge or PhaseComplete. The tool is callable; the phase doesn't auto-fire. | 2 | 0.5 day |
 | **Headless assistant-text preview** | Headless UX commit (`dff0df5fd`) covered notification spam, categorization, and phase/status tag distinction. The fourth bunker improvement — separating `assistantTextBuffer` from `thinkingBuffer` and flushing both as concise previews on tool-execution-start / message-end — was deferred because it's a meatier change in `headless.ts`. | 2 | 0.5 day |
+| **Search provider registry refactor** | Adding minimax took 9 files because the provider list is duplicated across `provider.ts` (type + VALID_PREFERENCES), `native-search.ts`, `command-search-provider.ts` (CLI), `tool-search.ts` + `tool-llm-context.ts` (two separate execute paths!), `preferences-types.ts`, `preferences-validation.ts`, manifest, docs. A single `SearchProviderRegistry` array would let everything iterate. | 2 | 3-5 days |
 | **Pi-mono SDK sync** | We pull from pi-mono directly (separate from gsd-2 sync stance). Periodically check `pi-mono/main` for SDK improvements worth taking. The remote is set up; cadence is not. | 3 | recurring |

 It is opinionated. Each item has a tier and a one-line rationale. Reorder freely.