When message_end does not fire before agent_end (e.g. abort path), the
agent_end case was calling chatContainer.removeChild(streamingComponent),
which silently erased the last assistant message from the TUI chat history.
Fix: follow the message_end finalization pattern — call setShowMetadata(true)
and updateContent() before clearing the reference. Never call removeChild on
a component that was added to the persistent chat history.
Closes #4197
Two bugs were causing version drift across the repo:

1. Root package.json was silently reverted from 2.74.0 → 2.73.1 during
   commit b03c9401c (a CI optimization rebase). Tag v2.74.0 is already
   published on npm, so the next release would have computed 2.73.2,
   which is lower than what's already out, and shipped a broken version.
2. scripts/bump-version.mjs only touches pi-coding-agent + pkg + native
   platform shims. Other workspace packages drift independently:
   - @gsd-build/mcp-server: stuck at 2.52.0 (22 minor versions behind)
   - @gsd-build/rpc-client: stuck at 2.52.0
   - @gsd/pi-ai, pi-tui, pi-agent-core: stuck at 0.57.1
   - @gsd/native, @gsd-build/daemon: stuck at 0.1.0
Changes:
- Bump all non-private workspace packages to 2.74.0 to match the latest
release tag. Update daemon + mcp-server's internal rpc-client dep
from ^2.52.0 → ^2.74.0. Regenerate root lockfile.
- scripts/generate-changelog.mjs: compute newVersion from max(latest
stable tag, package.json) instead of package.json alone. Prevents
version regressions when package.json is accidentally clobbered by
rebases or merges.
- scripts/bump-version.mjs: extend to sync all eight non-private
workspace packages (daemon, mcp-server, native, pi-agent-core, pi-ai,
pi-coding-agent, pi-tui, rpc-client) including their internal deps
on each other. Private packages (studio, web) are left alone.
Studio and web remain on their own versioning (private: true, never
published). The native platform shims under native/npm/* are still
synced via native/scripts/sync-platform-versions.cjs from the root
version as before.
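A minimal sketch of the max(latest stable tag, package.json) rule described for scripts/generate-changelog.mjs. The helper names (`parseStable`, `compareSemver`, `resolveBaseVersion`) are hypothetical and only illustrate the comparison; the real script may parse tags differently.

```typescript
// Hypothetical helpers; only the "take the higher of tag and package.json"
// rule comes from the commit message.
type Semver = [number, number, number];

// Accepts "2.74.0" or "v2.74.0"; returns null for anything that is not a
// plain stable x.y.z version (pre-releases are ignored in this sketch).
function parseStable(version: string): Semver | null {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(version.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Negative if a < b, positive if a > b, zero if equal.
function compareSemver(a: Semver, b: Semver): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Picks the higher of the two versions so a clobbered package.json
// cannot pull the next release below an already published tag.
function resolveBaseVersion(latestStableTag: string, packageJsonVersion: string): string {
  const tag = parseStable(latestStableTag);
  const pkg = parseStable(packageJsonVersion);
  if (tag && pkg) {
    return (compareSemver(tag, pkg) >= 0 ? tag : pkg).join(".");
  }
  const only = tag ?? pkg;
  if (!only) throw new Error("no parseable version");
  return only.join(".");
}
```

With the values from the first bug, `resolveBaseVersion("v2.74.0", "2.73.1")` yields `2.74.0`, so the next bump starts from the published release rather than the reverted package.json.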
Ports the v1 graphify system to v2 as a native TypeScript implementation.
The knowledge graph builds semantic relationships between milestones, slices,
tasks, and knowledge entries — and injects relevant subgraphs automatically
into every agent dispatch prompt.
## Core implementation (packages/mcp-server/src/readers/graph.ts)
- `buildGraph(projectDir)` — walks all .gsd/ artifacts (STATE.md,
milestone PLANs, slice PLANs, KNOWLEDGE.md), extracts nodes and edges
with confidence tiers (EXTRACTED / INFERRED / AMBIGUOUS). Parse errors
skip the node rather than crashing.
- `writeGraph(gsdRoot, graph)` — atomic write via tmp file + rename.
- `writeSnapshot(gsdRoot)` — saves a diff baseline before each rebuild.
- `graphQuery(projectDir, term, budget?)` — BFS subgraph search with
case-insensitive matching on label + description; trims AMBIGUOUS edges
first, then INFERRED, respecting the token budget (default 4 000).
- `graphStatus(projectDir)` — freshness check; stale = older than 24 h.
- `graphDiff(projectDir)` — compares current graph to last snapshot,
returns added / removed / changed counts for nodes and edges.
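The trimming order described for `graphQuery` (AMBIGUOUS first, then INFERRED) can be sketched as below. This is an illustration only: the `SketchEdge` shape, the `costOf` callback, and the choice to never drop EXTRACTED edges are assumptions, not the actual graph.ts code.

```typescript
type Confidence = "EXTRACTED" | "INFERRED" | "AMBIGUOUS";

// Hypothetical edge shape for illustration.
interface SketchEdge {
  from: string;
  to: string;
  confidence: Confidence;
}

// Drops the least trusted edges until the estimated cost fits the budget.
// costOf is an assumed token estimator supplied by the caller.
function trimEdgesToBudget(
  edges: SketchEdge[],
  budget: number,
  costOf: (edge: SketchEdge) => number,
): SketchEdge[] {
  const kept = [...edges];
  const total = () => kept.reduce((sum, edge) => sum + costOf(edge), 0);
  const dropOrder: Confidence[] = ["AMBIGUOUS", "INFERRED"];

  for (const tier of dropOrder) {
    while (total() > budget) {
      // Remove the last edge of the current tier, if any remain.
      const idx = kept.map((edge) => edge.confidence).lastIndexOf(tier);
      if (idx === -1) break;
      kept.splice(idx, 1);
    }
  }
  // Assumption: EXTRACTED edges are kept even if the budget is still exceeded.
  return kept;
}
```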
## MCP tool (packages/mcp-server/src/server.ts)
Registers `gsd_graph` immediately after `gsd_knowledge` with four modes:
build | query | status | diff. All errors returned as isError: true.
## CLI subcommand (src/cli.ts, src/help-text.ts)
`gsd graph build|status|query <term>|diff` — follows the established
`if (cliFlags.messages[0] === '...')` dispatch pattern. Uses
`resolveGsdRoot()` for git-root-aware path resolution (not a naive
`.gsd` append). Help text updated with correct positional argument format.
## Auto-rebuild after slice completion
(src/resources/extensions/gsd/tools/complete-slice.ts)
Fire-and-forget `buildGraph → writeGraph` triggered after every slice
completion. Uses `@gsd-build/mcp-server` package import (not a relative
src path) and `resolveGsdRoot()` for correct path resolution in monorepos.
## Graph-aware dispatch injection
(src/resources/extensions/gsd/graph-context.ts,
src/resources/extensions/gsd/auto-prompts.ts)
`inlineGraphSubgraph(projectDir, term, { budget })` queries the graph and
formats the result as a `### Knowledge Graph Context` markdown block,
consistent with all other inlined context blocks. Adds a stale warning
annotation when the graph is older than 24 h. Returns null (graceful
skip) when graph.json is missing, the query returns zero nodes, or the
import fails — no agent dispatch is ever blocked by graph availability.
Injected into three prompt builders:
- `buildResearchSlicePrompt` — 3 000 token budget
- `buildPlanSlicePrompt` — 3 000 token budget
- `buildExecuteTaskPrompt` — 2 000 token budget
## Tests
- 22 tests for the core graph reader (graph.test.ts)
- 14 tests for the dispatch injection helper (graph-context.test.ts)
- All tests use real on-disk fixtures (no module mocking needed)
- Full suite: 6 318 passed, 0 failed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- DynamicBorder: verify lastExternalRender tracking suppresses redundant
renders during streaming, and standalone renders fire when idle
- TUI clearOnShrink: verify debounce flag lifecycle — deferred shrink
preserves maxLinesRendered, flag resets when content grows back
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rebuildChatFromMessages() called populatePinnedFromMessages() which
re-populated the pinned zone with text already present in the chat
history, causing visible duplication during session state changes.
Additionally, the spinner interval at 80ms generated ~12.5 renders/s
for a purely cosmetic animation, and clearOnShrink triggered
unnecessary full redraws during pinned-zone transitions.
- Remove populatePinnedFromMessages() from rebuildChatFromMessages()
and add pinnedMessageContainer.clear() instead — the streaming
lifecycle in chat-controller manages pinned content during active work
- Reduce spinner interval 80ms→200ms with render-batching that skips
redundant renders when streaming already triggers requestRender()
- Debounce clearOnShrink: defer full redraw by one render tick so
pinned-clear→new-streaming transitions avoid a wasted full redraw
- Increase notification widget safety-net timer 5s→30s since the
store subscription already handles push-based updates
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The two new sub-turn shrink regression tests created a pinned
DynamicBorder (via message_update with pinnable text + tool) but never
emitted message_end, so the spinner's setInterval kept the test process
alive until CI timed out after 15 minutes. Append a message_end to
each test so the module-level pinnedBorder is torn down.
Commit c8c416802 (#4144) introduced module-level renderedSegments state
to track interleaved text/tool components per assistant turn, but never
reset it when an adapter shrinks streamingMessage.content[] back to 0/1
at a provider sub-turn boundary within one assistant lifecycle (the
claude-code adapter does this). Consequence chain: the segment walker
finds the stale text-run entry at startIndex=0, calls updateContent on
it with the new (shrunk) message, and the in-place edit destroys the
prior sub-turn's visible text. New tool blocks at contentIndex=1 then
collide with stale registrations, causing visual ordering corruption.
hasToolsInTurn stays sticky-true and lastPinnedText never clears, so
the pinned "Working - Latest Output" mirror freezes on the pre-shrink
snapshot.
Track lastContentLength explicitly. On shrink, clear renderedSegments,
reset lastPinnedText, and reset lastProcessedContentIndex so the
walker treats the new sub-turn as fresh segments that append after
prior sub-turn children. Prior history stays rendered as frozen
components; pendingTools and the spinner border are untouched.
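The shrink handling above can be sketched roughly as follows. The `TurnState` shape and the reset value of `-1` for `lastProcessedContentIndex` are assumptions for illustration; the field names mirror the ones mentioned in this message.

```typescript
// Hypothetical per-turn state; the real module-level state holds components,
// not strings.
interface TurnState {
  lastContentLength: number;
  renderedSegments: string[];
  lastPinnedText: string | null;
  lastProcessedContentIndex: number;
}

// Returns true when content[] shrank, meaning a new provider sub-turn began.
// On shrink, segment tracking is reset so the walker appends fresh segments
// instead of editing components from the prior sub-turn in place.
function onContentUpdate(state: TurnState, contentLength: number): boolean {
  const shrank = contentLength < state.lastContentLength;
  if (shrank) {
    state.renderedSegments = [];
    state.lastPinnedText = null;
    state.lastProcessedContentIndex = -1; // assumed "nothing processed" marker
  }
  state.lastContentLength = contentLength;
  return shrank;
}
```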
Adds two regression tests in chat-controller-ordering.test.ts: one
verifies prior sub-turn components are not overwritten and new tools
append in content[] order after a shrink, the other verifies the
pinned markdown updates from the first sub-turn's text to the second
sub-turn's text across a shrink boundary.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
InteractiveMode.renderWidgets() called Container.clear() on the
widgetContainerAbove/Below render mounts, which disposed every mounted
extension widget and then re-added the now-dead components. In AUTO mode
updateProgressWidget re-registers gsd-progress on every unit dispatch,
so gsd-notifications and gsd-health had their refresh timers and store
subscriptions killed after the first dispatch. Renders kept returning
the widgets' frozen cachedLines, so they looked alive but never updated
(`/gsd notifications clear` appeared to do nothing, and the belowEditor
last-commit went stale while the top-of-screen dashboard stayed correct).
Split detach from dispose: add Container.detachChildren() and use it from
the two widget-mount call sites. clear() still disposes for every other
caller (chat, editor, status, pinned-message containers). The
extensionWidgets* maps remain the single owner of widget disposal via
removeExisting() and clearExtensionWidgets().
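A small sketch of the detach/dispose split, assuming a simplified container. `SketchContainer` and `SketchDisposable` are hypothetical stand-ins for the pi-tui types; only the contract (clear() disposes, detachChildren() does not) is taken from the description above.

```typescript
interface SketchDisposable {
  dispose(): void;
}

class SketchContainer {
  private children: SketchDisposable[] = [];

  addChild(child: SketchDisposable): void {
    this.children.push(child);
  }

  // Owning containers: remove and dispose every child.
  clear(): void {
    for (const child of this.children) child.dispose();
    this.children = [];
  }

  // Render mounts: drop the references but leave the children alive,
  // because some other owner is responsible for disposing them.
  detachChildren(): void {
    this.children = [];
  }

  get size(): number {
    return this.children.length;
  }
}
```

A render mount can then call `detachChildren()` before re-adding the same widget instances, and their timers and subscriptions keep running.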
While in AUTO, gsd-progress duplicates gsd-health on last commit,
cost/budget, and the health signal. Make gsd-progress the single source of
truth: hide gsd-health from auto-start and re-register it from every
exit point in auto.ts (lock-lost stop, cleanupAfterLoopExit !paused
guard, stopAuto, pauseAuto). gsd-notifications stays visible — it is
independent state and, with the detach fix, its subscription + 5s
refresh actually work again.
Tests: Container.detachChildren()/clear() contract guards added to
packages/pi-tui/src/__tests__/tui.test.ts. health-widget,
notification-{store,widget,overlay}, notifications-handler, notifications,
and auto-paused-ui-cleanup suites all pass.
Previously the chat-controller created one AssistantMessageComponent per
assistant message and removed/re-appended it to the chat container's tail
on every tool block, forcing all narration to render after every tool
execution regardless of stream order. Users had to scroll up to read text that was
written before each tool call.
Replace the reorder hack with a stream-order segment walker that walks
content[] left-to-right, collapses contiguous text/thinking blocks into
text-run segments, emits one segment per tool block, and append-only adds
new segments to chatContainer. AssistantMessageComponent gains a
ContentRange API so a single message can spawn multiple text-run
components, plus a separate showMetadata flag so timestamp/error footers
render only on the trailing segment without duplicating earlier text.
Adds a regression test that streams [text, tool, text, tool, text] and
asserts both interleaved order and per-segment rendered text content.
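The segment walk described above can be sketched as a pure function. The `StreamBlock` and `Segment` shapes are hypothetical; the real walker also creates and appends components, which is omitted here.

```typescript
// Hypothetical block and segment shapes for illustration.
type StreamBlock = { type: "text" | "thinking" | "toolCall" };

type Segment =
  | { kind: "text-run"; startIndex: number; endIndex: number }
  | { kind: "tool"; contentIndex: number };

// Walks content[] left to right: contiguous text/thinking blocks collapse
// into one text-run segment, and each tool block becomes its own segment.
function walkSegments(content: StreamBlock[]): Segment[] {
  const segments: Segment[] = [];
  let runStart = -1; // -1 means no text run is currently open

  for (let i = 0; i < content.length; i++) {
    const block = content[i];
    if (!block) continue;
    if (block.type === "toolCall") {
      if (runStart !== -1) {
        segments.push({ kind: "text-run", startIndex: runStart, endIndex: i - 1 });
        runStart = -1;
      }
      segments.push({ kind: "tool", contentIndex: i });
    } else if (runStart === -1) {
      runStart = i;
    }
  }
  if (runStart !== -1) {
    segments.push({ kind: "text-run", startIndex: runStart, endIndex: content.length - 1 });
  }
  return segments;
}
```

Because the output follows content[] order, appending segments as they appear keeps narration and tool output interleaved the way they were streamed.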
Closes #4144
Restore the isProviderRequestReady() guard lost during the main merge.
Tests in model-resolver.test.ts and model-resolver-initial-model-auth.test.ts
require findInitialModel() to skip an unauth'd saved default and fall
through to the first available model.
Remove hard-coded Anthropic/Claude defaults and silent provider swaps so
the app honors whatever model/provider the user has configured.
- src/cli.ts: drop the anthropic->claude-code auto-migration blocks that
were rewriting the user's saved defaultProvider on every startup.
- packages/pi-coding-agent/src/core/model-resolver.ts: delete the
defaultModelPerProvider table, drop the "recommended variant" swap
that silently upgraded e.g. claude-opus-4-6 to -extended, and replace
the provider-iteration first-available fallback with provider-sticky
(user's saved provider first, then first registry entry).
- src/startup-model-validation.ts: replace the openai/anthropic-first
fallback chain with Pi-default -> same-provider -> first-available.
- src/help-text.ts: use a generic provider/model-id example for --model
instead of claude-opus-4-6.
- src/tests/startup-model-validation.test.ts: update the fallback test
to assert provider stickiness rather than a specific Claude model id.
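A rough sketch of the provider-sticky fallback (saved provider first, then first registry entry). `ModelEntry`, `pickFallbackModel`, and the `isAvailable` predicate are hypothetical names for illustration and do not reflect the actual model-resolver.ts signatures.

```typescript
// Hypothetical registry entry shape.
interface ModelEntry {
  provider: string;
  id: string;
}

// Prefers an available model from the user's saved provider; otherwise
// falls back to the first available registry entry. No provider is
// hard-coded as a preferred default.
function pickFallbackModel(
  registry: ModelEntry[],
  savedProvider: string | undefined,
  isAvailable: (model: ModelEntry) => boolean,
): ModelEntry | undefined {
  const available = registry.filter(isAvailable);
  const sticky = available.find((model) => model.provider === savedProvider);
  return sticky ?? available[0];
}
```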
https://claude.ai/code/session_01CvuUuzuVjRcQN25263nG6V
Extract the post-tool text-block selection logic into a small pure
helper (`findLatestPinnableText`) so the regression scenario can be
covered without standing up the full interactive controller harness.
The new test pins the bug from #4120: when content blocks are
`[text1, tool1, text2_streaming]`, the helper must return `text1`
(not `text2`), because `text2` is still streaming live into the chat
container and mirroring it would render the same tokens twice.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pinned `Working · Latest Output` border above the editor mirrors
the assistant's latest text block while tools run, so prose stays
visible after a tool's output scrolls it off-screen. The mirror walked
content blocks from the end and picked the last text block — but when
the assistant streams a *new* text block after a tool call (sequence
`[text1, tool1, text2_streaming]`), it picked `text2`, which was also
being streamed live into the chat container. Result: identical tokens
rendered in two places at once.
Restrict the search to text blocks whose index is strictly less than
the index of the most recent tool call. Text after the last tool call
stays in the chat container only; earlier prose (e.g. `text1`) remains
mirrored the entire time the new text streams, so context isn't lost
and the loading-animation handoff is undisturbed.
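The restricted search can be sketched as below. The `PinBlock` shape and the function name are illustrative, and returning `null` when no tool call exists is an assumption; the actual `findLatestPinnableText` helper may differ.

```typescript
// Hypothetical content block shape for illustration.
type PinBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string };

// Returns the latest text block that appears strictly before the most
// recent tool call. Text after the last tool call is still streaming into
// the chat container, so mirroring it would show the same tokens twice.
function findPinnableTextSketch(blocks: PinBlock[]): string | null {
  let lastToolIndex = -1;
  for (let i = 0; i < blocks.length; i++) {
    const block = blocks[i];
    if (block && block.type === "toolCall") lastToolIndex = i;
  }
  for (let i = lastToolIndex - 1; i >= 0; i--) {
    const block = blocks[i];
    if (block && block.type === "text") return block.text;
  }
  return null; // assumption: nothing is pinned before any tool call exists
}
```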
Fixes #4120
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Custom OpenAI-compatible providers running on localhost (e.g. a local proxy)
with an explicit apiKey in models.json received 'local-no-key-needed' during
compaction instead of their configured key, causing 401 errors.
The localhost shortcut in AuthStorage.getApiKey() was unconditional. Normal
dispatch calls getApiKeyForProvider() which skips the baseUrl check entirely,
so the fallback resolver was reached and the real key was used. Compaction
calls getApiKey(model) which passes baseUrl, hitting the shortcut first.
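One plausible shape for making the localhost shortcut conditional, shown as a sketch: an explicitly configured key wins, and the placeholder is used only when no key is configured. The function names and the host-matching regex are assumptions; only the `'local-no-key-needed'` placeholder comes from the description above.

```typescript
const LOCAL_PLACEHOLDER_KEY = "local-no-key-needed";

// Assumed definition of "localhost" for this sketch.
function isLocalBaseUrl(baseUrl: string | undefined): boolean {
  if (!baseUrl) return false;
  return /^https?:\/\/(localhost|127\.0\.0\.1|\[::1\])(:\d+)?(\/|$)/i.test(baseUrl);
}

// An explicit key always takes priority; the placeholder only applies to
// local endpoints that have no configured key.
function resolveApiKeySketch(
  configuredKey: string | undefined,
  baseUrl: string | undefined,
): string | undefined {
  if (configuredKey) return configuredKey;
  return isLocalBaseUrl(baseUrl) ? LOCAL_PLACEHOLDER_KEY : undefined;
}
```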
Closes #4106
* chore(pi-ai): regenerate model registry from upstream APIs
Regenerated models.generated.ts by running generate-models.ts against
live provider APIs. Last generated: 2026-04-09.
+48 models added, 19 removed across all providers.
Notable additions: z-ai/glm-5.1 via OpenRouter (closes #4069,
supersedes custom entry in #4055), zai-org/GLM-5.1, z-ai/glm-5v-turbo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(pi-ai): add structural and regression tests for models.generated.ts
- Regression #3582: pins qwen/qwen3.6-plus in openrouter
- Regression #4069: pins z-ai/glm-5.1 in openrouter
- Structural invariants across all 23 providers / all models
- Registry shape: exact provider list, model count lower bound
- Removed models guard: decommissioned models must stay absent
- Spot-checks for notable models added in this regeneration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(pi-ai): add Alibaba DashScope as standalone provider
Adds `alibaba-dashscope` for users with a regular DashScope API key,
separate from the existing `alibaba-coding-plan` free-tier provider.
- types.ts: register `alibaba-dashscope` as KnownProvider
- env-api-keys.ts: map to DASHSCOPE_API_KEY
- models.custom.ts: add qwen3-max, qwen3.5-plus, qwen3.5-flash,
qwen3-coder-plus with international endpoint and real pricing
- model-resolver.ts: default model qwen3.5-plus
- key-manager.ts: add alibaba-coding-plan and alibaba-dashscope
to PROVIDER_REGISTRY so /gsd keys add works for both
Co-Authored-By: Claude Code <noreply@anthropic.com>
* feat(pi-ai): add qwen3.6-plus to alibaba-dashscope provider
qwen3.6-plus is available on DashScope international endpoint.
Pricing: $0.5/M input, $3/M output (base tier, 0-256K tokens).
Supports thinking mode (reasoning: true).
Source: https://www.alibabacloud.com/help/en/model-studio/model-pricing
Co-Authored-By: Claude Code <noreply@anthropic.com>
* test(pi-ai): add tests for alibaba-dashscope provider and key-manager regression
- packages/pi-ai/src/models.test.ts: add describe block covering all 5
alibaba-dashscope models (presence, base URL, API, provider field,
context window, paid pricing, per-model reasoning/cost assertions,
independence from alibaba-coding-plan, failure path for unknown model)
- src/resources/extensions/gsd/tests/key-manager.test.ts: add regression
tests for #3891 — alibaba-coding-plan was missing from PROVIDER_REGISTRY,
causing /gsd keys add alibaba-coding-plan to fail silently; also covers
alibaba-dashscope registration, env var separation, and getAllKeyStatuses
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Code <noreply@anthropic.com>
Extension-based providers like pi-claude-cli register their models
during extension loading, but registrations were queued and not flushed
until after model resolution ran. This caused findInitialModel() and
the startup model validation to see extension models as nonexistent,
permanently overwriting the user's saved model selection on every launch.
- Flush pendingProviderRegistrations in createAgentSession() before
findInitialModel() so extension models are visible in the registry
- Move model validation to after createAgentSession() in both print
and interactive code paths
- Load extensions before --list-models so extension models appear
Strip the `mcp__<server>__` prefix from tool_use blocks emitted by the
Claude Agent SDK so registered GSD extension renderers (gsd_plan_milestone,
gsd_task_complete, etc.) match instead of falling through to the generic
JSON-dump fallback. The original server name is preserved on the toolCall
block under `mcpServer` for downstream rendering.
Tighten the generic ToolExecutionComponent fallback for any remaining
prefixed names (third-party MCP servers): show a muted `server·tool`
title, render primitive args as compact `key=value` pairs, and truncate
output to 10 lines when collapsed.
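The prefix stripping can be sketched as a small pure function. `splitMcpToolName` is a hypothetical name, the server name `gsd` in the usage note is illustrative, and treating the first `__` after the prefix as the end of the server name is an assumption.

```typescript
const MCP_PREFIX = "mcp__";

// Splits "mcp__<server>__<tool>" into the bare tool name and the server.
// Names without a well-formed prefix are returned unchanged.
function splitMcpToolName(name: string): { tool: string; mcpServer?: string } {
  if (!name.startsWith(MCP_PREFIX)) return { tool: name };

  // Assumption: the server name ends at the first "__" after the prefix.
  const sep = name.indexOf("__", MCP_PREFIX.length);
  const serverIsEmpty = sep <= MCP_PREFIX.length;
  const toolIsEmpty = sep + 2 >= name.length;
  if (sep === -1 || serverIsEmpty || toolIsEmpty) return { tool: name };

  return {
    tool: name.slice(sep + 2),
    mcpServer: name.slice(MCP_PREFIX.length, sep),
  };
}
```

For example, `splitMcpToolName("mcp__gsd__gsd_plan_milestone")` returns the tool `gsd_plan_milestone` with `mcpServer` set to `gsd`, so a renderer registered under the bare name can match.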
Every workflow turn that needed a quality gate either let it drop
silently or bulk-stamped it at closeout. Q8 was the worst case: seeded
as scope:"slice" by plan-slice, treated as a blocker for the
evaluating-gates phase by state.ts, then filtered out of the
gate-evaluate prompt via `if (!meta) continue;` and never closed by
complete-slice — a guaranteed auto-loop stall once slice gates were
enabled.
Introduce gate-registry.ts as the single source of truth for which
turn owns which gate (Q3/Q4 → gate-evaluate, Q5/Q6/Q7 → execute-task,
Q8 → complete-slice, MV01–MV04 → validate-milestone). Every layer of
the prompt system now consults it:
- state.ts derives pending counts by owner turn, not scope, so Q8
never stalls evaluating-gates again.
- auto-prompts.ts builders call assertGateCoverage() and render a
"Gates to Close" block from the registry instead of a hand-rolled
GATE_QUESTIONS table.
- complete-slice and complete-task handlers saveGateResult for every
gate they own, mapping gate id → params field so empty sections
become `omitted` and populated sections become `pass`.
- milestone-validation-gates sources its MV id list from the registry.
- prompt-validation.ts adds validateSliceSummaryOutput /
validateTaskSummaryOutput / validateMilestoneValidationOutput
schema checks.
- gsd_save_gate_result accepts MV01–MV04 (via the registry keys) in
the MCP server and bootstrap tool registration.
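A minimal sketch of an owner-turn registry using the gate-to-turn mapping listed above. The `GATE_OWNERS` constant and `pendingForTurn` helper are hypothetical; gate-registry.ts may expose a different shape.

```typescript
type OwnerTurn =
  | "gate-evaluate"
  | "execute-task"
  | "complete-slice"
  | "validate-milestone";

// Mapping taken from the description above; the constant name is hypothetical.
const GATE_OWNERS: Record<string, OwnerTurn> = {
  Q3: "gate-evaluate",
  Q4: "gate-evaluate",
  Q5: "execute-task",
  Q6: "execute-task",
  Q7: "execute-task",
  Q8: "complete-slice",
  MV01: "validate-milestone",
  MV02: "validate-milestone",
  MV03: "validate-milestone",
  MV04: "validate-milestone",
};

// Filters pending gates by the turn that owns them rather than by scope,
// so a slice-scoped gate such as Q8 does not block gate-evaluate.
function pendingForTurn(pendingGateIds: string[], turn: OwnerTurn): string[] {
  return pendingGateIds.filter((id) => GATE_OWNERS[id] === turn);
}
```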
Tests: new gate-registry + prompt-system-gate-coverage +
complete-slice-gate-closure suites, plus a Q8 regression case in
gate-dispatch.test.ts. 161 related tests pass end-to-end.
https://claude.ai/code/session_019PT3EmrkMxr4TsgGGLSYK3