- Change phase:"blocked" from stopAuto to pauseAuto — sessions are now
resumable instead of requiring manual /gsd auto restart
- Default reassess_after_slice to true — reassessment fires after every
slice completion unless explicitly disabled (was opt-in, causing missed
reassessments in multi-slice milestones)
- Change dispatch no-match fallthrough from level:"info" (hard stop) to
level:"warning" (pause) — unhandled phases are now recoverable
- Add dependency-resolution fallback in resolveSliceDependencies — when
no slice has ALL deps satisfied, picks the one with the most deps met
instead of immediately returning blocked (both DB and file-based paths)
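A minimal sketch of the "most deps met" fallback, assuming a simplified slice shape. The `Slice` interface and the name `pickNextSlice` are hypothetical and are not the real GSD types or function.

```typescript
// Hypothetical slice shape: an id plus the ids of slices it depends on.
interface Slice {
  id: string;
  dependsOn: string[];
}

/**
 * Returns the first slice whose dependencies are all complete. If none
 * qualifies, falls back to the slice with the most satisfied dependencies
 * instead of reporting "blocked". Returns undefined only for an empty list.
 */
function pickNextSlice(pending: Slice[], completed: Set<string>): Slice | undefined {
  let best: Slice | undefined;
  let bestMet = -1;
  for (const slice of pending) {
    const met = slice.dependsOn.filter((dep) => completed.has(dep)).length;
    if (met === slice.dependsOn.length) return slice; // fully unblocked wins immediately
    if (met > bestMet) {
      // strict ">" keeps the earliest slice when two are tied
      best = slice;
      bestMet = met;
    }
  }
  return best;
}
```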
Tracked output from 2022 (commit d93956ba4) that's missing the modern
GSD_HOME env support and webPreferencesPath export present in the .ts
source. No runtime path consumes it, but the test compile script's
copyAssets step overlays src/* onto esbuild output in dist-test, so the
stale .js was shadowing the compiled app-paths and breaking any unit
test transitively importing webPreferencesPath.
Cover the canonical parseCliArgs export in cli-web-branch.ts including
the new mcp mode, worktree flag (boolean and named forms), and existing
short flags, web mode flags, list flags, and positional message handling.
Also remove src/app-paths.js — a stale tracked output (last touched in
2022, missing GSD_HOME and webPreferencesPath exports). The test compile
script copies all of src/ over esbuild's output, so this stale .js was
shadowing the compiled app-paths in dist-test and breaking any test that
transitively imported it. No runtime path uses it (production loads from
dist/app-paths.js; jiti/tsx prefer the .ts source).
Satisfies require-tests.sh on PR #4162.
Pure deletion/deduplication pass on top-level src/*.ts. External behavior
unchanged; all targeted unit tests still pass.
cli.ts (−170 net lines)
- Adopt canonical validateConfiguredModel from startup-model-validation.ts;
delete the drifted local copy with hardcoded model fallbacks.
- Import CliFlags + parseCliArgs from cli-web-branch.ts instead of keeping
a second, 90%-identical parser; pass cliFlags directly into
runWebCliBranch instead of re-parsing process.argv.
- Extract 3 helpers for verbatim duplicates:
* printNonTtyErrorAndExit (TTY gate, 2 call sites)
* printExtensionErrors (extension load errors, 2 call sites)
* reapplyValidatedModelOnFallback (post-createAgentSession fix, 2 sites)
- Factor runHeadlessFromAuto helper shared by the `gsd auto` shorthand
and the auto-piped-stdout redirect.
- Collapse ensureRtkBootstrap from hand-rolled _done flag to a
promise-memoized doRtkBootstrap.
- Drop redundant validateConfiguredModel pre-createAgentSession calls
(the post-createAgentSession call is the correct one per #2626).
- Delete dead --version/-v and --help/-h fast paths (loader.ts already
handles these before cli.ts is imported).
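The promise-memoized bootstrap mentioned above can be sketched as follows. The body of `doRtkBootstrap` is not described in the commit, so the call counter here is a stand-in that only makes the memoization observable.

```typescript
// Stand-in counter: the real doRtkBootstrap does setup work not shown here.
let bootstrapCalls = 0;

async function doRtkBootstrap(): Promise<void> {
  bootstrapCalls++; // runs synchronously when the async function is invoked
}

let rtkBootstrap: Promise<void> | undefined;

/** Every caller awaits the same in-flight (or already settled) promise. */
function ensureRtkBootstrap(): Promise<void> {
  rtkBootstrap ??= doRtkBootstrap();
  return rtkBootstrap;
}
```

Compared with a `_done` flag that is only set once the work finishes, caching the promise means concurrent callers during the first run share one bootstrap instead of starting a second. One consequence worth noting: a rejected promise also stays cached unless it is explicitly cleared.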
cli-web-branch.ts
- Unify CliFlags with worktree, 'mcp' mode, and _selectedSessionPath.
- Drop unused help?/version? flags (loader.ts intercepts them).
onboarding.ts
- Add runStep<T>() helper with shared cancel/warn handling; collapse 4
near-identical try/catch blocks around runLlmStep, runWebSearchStep,
runRemoteQuestionsStep, runToolKeysStep.
- Delete trivial isCancelError helper (inlined as p.isCancel).
- Rewrite loadPico() adapter to build PicoModule from chalk so we can
drop the redundant picocolors dependency.
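A sketch of a shared `runStep<T>()` wrapper, assuming each onboarding step resolves a value, resolves a cancel sentinel, or throws. `isCancel` and `warn` are injected so the sketch runs without @clack/prompts; the `CANCEL` symbol and the `StepResult` shape are hypothetical.

```typescript
// Hypothetical stand-in for the prompt library's cancel symbol.
const CANCEL = Symbol("cancel");

interface StepDeps {
  isCancel: (value: unknown) => boolean;
  warn: (message: string) => void;
}

type StepResult<T> = { ok: true; value: T } | { ok: false; cancelled: boolean };

async function runStep<T>(
  label: string,
  step: () => Promise<T | symbol>,
  deps: StepDeps,
): Promise<StepResult<T>> {
  try {
    const value = await step();
    if (deps.isCancel(value)) return { ok: false, cancelled: true };
    return { ok: true, value: value as T };
  } catch (err) {
    // A failing step is reported but does not abort the whole onboarding flow.
    const message = err instanceof Error ? err.message : String(err);
    deps.warn(`${label} failed: ${message}`);
    return { ok: false, cancelled: false };
  }
}
```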
package.json / package-lock.json
- Remove picocolors direct dep (chalk remains the single color library).
The two new sub-turn shrink regression tests created a pinned
DynamicBorder (via message_update with pinnable text + tool) but never
emitted message_end, so the spinner's setInterval kept the test process
alive until CI timed out after 15 minutes. Append a message_end to
each test so the module-level pinnedBorder is torn down.
Commit c8c416802 (#4144) introduced module-level renderedSegments state
to track interleaved text/tool components per assistant turn, but never
reset it when an adapter shrinks streamingMessage.content[] back to 0/1
at a provider sub-turn boundary within one assistant lifecycle (the
claude-code adapter does this). Consequence chain: the segment walker
finds the stale text-run entry at startIndex=0, calls updateContent on
it with the new (shrunk) message, and the in-place edit destroys the
prior sub-turn's visible text. New tool blocks at contentIndex=1 then
collide with stale registrations, causing visual ordering corruption.
hasToolsInTurn stays sticky-true and lastPinnedText never clears, so
the pinned "Working - Latest Output" mirror freezes on the pre-shrink
snapshot.
Track lastContentLength explicitly. On shrink, clear renderedSegments,
reset lastPinnedText, and reset lastProcessedContentIndex so the
walker treats the new sub-turn as fresh segments that append after
prior sub-turn children. Prior history stays rendered as frozen
components; pendingTools and the spinner border are untouched.
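A minimal sketch of the shrink detection, assuming the controller sees the current `content[]` length on every update. The field names mirror the commit text. The class is hypothetical, and resetting `lastProcessedContentIndex` to -1 is an assumed sentinel value.

```typescript
class SubTurnTracker {
  private lastContentLength = 0;
  lastProcessedContentIndex = -1; // assumed "nothing processed yet" sentinel
  lastPinnedText: string | undefined;
  // startIndex in content[] -> id of the component rendering that segment
  readonly renderedSegments = new Map<number, string>();

  /** Call on every message_update; returns true when a shrink was handled. */
  onUpdate(contentLength: number): boolean {
    const shrunk = contentLength < this.lastContentLength;
    if (shrunk) {
      // Forget index -> component mappings so the walker creates new
      // components instead of calling updateContent on prior sub-turn ones.
      // The components stay mounted, so earlier text remains visible.
      this.renderedSegments.clear();
      this.lastPinnedText = undefined;
      this.lastProcessedContentIndex = -1;
    }
    this.lastContentLength = contentLength;
    return shrunk;
  }
}
```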
Adds two regression tests in chat-controller-ordering.test.ts: one
verifies prior sub-turn components are not overwritten and new tools
append in content[] order after a shrink, the other verifies the
pinned markdown updates from the first sub-turn's text to the second
sub-turn's text across a shrink boundary.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reconcileSliceTasks called updateTaskStatus without a completedAt
timestamp, leaving tasks.completed_at NULL for all tasks completed
via the file-existence reconcile path.
Closes #4129
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
InteractiveMode.renderWidgets() called Container.clear() on the
widgetContainerAbove/Below render mounts, which disposed every mounted
extension widget and then re-added the now-dead components. In AUTO mode
updateProgressWidget re-registers gsd-progress on every unit dispatch,
so gsd-notifications and gsd-health had their refresh timers and store
subscriptions killed after the first dispatch. Renders kept returning
the widgets' frozen cachedLines, making them look alive but never update
(/gsd notifications clear appeared to do nothing, belowEditor last-commit
went stale while the top-of-screen dashboard stayed correct).
Split detach from dispose: add Container.detachChildren() and use it from
the two widget-mount call sites. clear() still disposes for every other
caller (chat, editor, status, pinned-message containers). The
extensionWidgets* maps remain the single owner of widget disposal via
removeExisting() and clearExtensionWidgets().
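The detach/dispose split can be illustrated with a hypothetical `Component` interface. This is a sketch of the contract the commit describes, not the pi-tui implementation.

```typescript
interface Component {
  dispose?(): void;
}

class Container {
  children: Component[] = [];

  addChild(child: Component): void {
    this.children.push(child);
  }

  /** Unmount and dispose: the children are finished and must release resources. */
  clear(): void {
    for (const child of this.children) child.dispose?.();
    this.children = [];
  }

  /** Unmount only: an external owner (e.g. a widget registry) still handles disposal. */
  detachChildren(): void {
    this.children = [];
  }
}
```

A widget mount that is rebuilt on every render would call `detachChildren()` and then re-add the still-live widgets, so their timers and subscriptions survive.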
While in AUTO, gsd-progress duplicates gsd-health on last commit, cost/
budget, and the health signal. Make gsd-progress the single source of
truth: hide gsd-health from auto-start and re-register it from every
exit point in auto.ts (lock-lost stop, cleanupAfterLoopExit !paused
guard, stopAuto, pauseAuto). gsd-notifications stays visible — it is
independent state and, with the detach fix, its subscription + 5s
refresh actually work again.
Tests: Container.detachChildren()/clear() contract guards added to
packages/pi-tui/src/__tests__/tui.test.ts. health-widget,
notification-{store,widget,overlay}, notifications-handler, notifications,
and auto-paused-ui-cleanup suites all pass.
When GSD is installed with `bun add -g`, running `gsd update` or
`/gsd update` previously shelled out to `npm install -g`, which fails
with EACCES on systems where npm has no write access to the global
node_modules directory.
Adds `resolveInstallCommand(pkg)` to `update-check.ts` that returns
`bun add -g <pkg>` when `process.versions.bun` is defined (i.e. the
current runtime is Bun), and `npm install -g <pkg>` otherwise. All
three update paths — `update-cmd.ts`, `commands-handlers.ts`, and the
interactive startup prompt in `update-check.ts` — now use this helper,
including the fallback error message shown to the user.
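A sketch of the runtime check, under one assumption: the commit says the helper reads `process.versions.bun` directly, whereas here the versions object is a parameter so the logic can be exercised under either runtime.

```typescript
interface RuntimeVersions {
  bun?: string;
}

// Bun defines process.versions.bun; Node leaves it undefined.
function resolveInstallCommand(pkg: string, versions: RuntimeVersions): string {
  return versions.bun !== undefined ? `bun add -g ${pkg}` : `npm install -g ${pkg}`;
}
```

At a real call site this would be invoked as `resolveInstallCommand(pkgName, process.versions)`.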
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the chat-controller created one AssistantMessageComponent per
assistant message and removed/re-appended it to the chat container's tail
on every tool block, which pushed all narration below the latest tool output
regardless of stream order. Users had to scroll up to read text that was
written before each tool call.
Replace the reorder hack with a stream-order segment walker that walks
content[] left-to-right, collapses contiguous text/thinking blocks into
text-run segments, emits one segment per tool block, and append-only adds
new segments to chatContainer. AssistantMessageComponent gains a
ContentRange API so a single message can spawn multiple text-run
components, plus a separate showMetadata flag so timestamp/error footers
render only on the trailing segment without duplicating earlier text.
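A sketch of the stream-order walker, assuming simplified block and segment shapes. The real controller maps each segment onto a TUI component, which is omitted here.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; id: string };

type Segment =
  | { kind: "text-run"; start: number; end: number } // [start, end) into content[]
  | { kind: "tool"; index: number; toolId: string };

function walkSegments(content: Block[]): Segment[] {
  const segments: Segment[] = [];
  let runStart = -1; // start of the current text/thinking run, or -1 if none
  content.forEach((block, i) => {
    if (block.type === "toolCall") {
      if (runStart !== -1) {
        segments.push({ kind: "text-run", start: runStart, end: i });
        runStart = -1;
      }
      segments.push({ kind: "tool", index: i, toolId: block.id });
    } else if (runStart === -1) {
      runStart = i; // contiguous text and thinking blocks collapse into one run
    }
  });
  if (runStart !== -1) segments.push({ kind: "text-run", start: runStart, end: content.length });
  return segments;
}
```

Because segments come out in `content[]` order, an append-only renderer only needs to create components for segments past the ones it has already mounted, and update the trailing text-run in place as it grows.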
Adds a regression test that streams [text, tool, text, tool, text] and
asserts both interleaved order and per-segment rendered text content.
Closes #4144
CI on #4141 failed because threading an explicit flatRateCtx parameter
through resolvePreferredModelConfig broke two contracts the test suite
locks in:
1. interactive-routing-bypass (#3962) asserts that
resolvePreferredModelConfig is invoked with exactly three positional
arguments and that its `if (!isAutoMode) return undefined` guard
lives within the first 600 chars of the function body. The new
flatRateCtx param + JSDoc pushed the guard past that window and
lengthened the call site.
2. silent-catch-diagnostics (#3348) requires migrated files to route
through workflow-logger instead of leaving empty catch blocks. The
new buildFlatRateContext() swallowed registry lookup errors with a
comment-only catch.
Fix both without regressing flat-rate detection:
- Hang the flat-rate context off autoModeStartModel itself via an
optional `flatRateCtx` field. selectAndApplyModel now enriches
autoModeStartModel up front (preserving the variable name) and
resolvePreferredModelConfig reads autoModeStartModel.flatRateCtx —
signature shrinks back to three params, call site returns to the
3-arg form the test anchors on.
- Replace the empty catch in buildFlatRateContext() with a
logWarning("dispatch", ...) that surfaces the lookup failure while
still falling through with authMode undefined, matching the
fail-closed policy everywhere else in the file.
The 3-entry hard-coded FLAT_RATE_PROVIDERS set in auto-model-selection.ts
treated only github-copilot/copilot/claude-code as flat-rate, so dynamic
routing would happily downgrade units on user-registered subscription
proxies and any externalCli CLI wrapper — quality loss with no cost
benefit for users whose provider charges a flat rate per request.
Make isFlatRateProvider extensible by composing three signals:
1. Built-in list (unchanged, wins first for regression safety).
2. externalCli auto-detection via ctx.modelRegistry.getProviderAuthMode()
— any CLI wrapper around the user's subscription is inherently
flat-rate.
3. User-declared `flat_rate_providers` preference for private
subscription-backed proxies, enterprise-gated deployments, and custom
CLI wrappers the built-in list doesn't know about.
Add a buildFlatRateContext() helper so every call site constructs the
context the same way and degrades gracefully when ctx/prefs/registry are
unavailable (never breaks flat-rate detection).
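A sketch of composing the three signals. The `FlatRateContext` shape, the literal `"externalCli"` auth-mode string, and the parameters of `buildFlatRateContext` are assumptions modeled on the commit text, not the real signatures.

```typescript
const FLAT_RATE_PROVIDERS = new Set(["github-copilot", "copilot", "claude-code"]);

interface FlatRateContext {
  authMode?: string; // e.g. from modelRegistry.getProviderAuthMode(provider)
  userFlatRate?: string[]; // from the flat_rate_providers preference
}

function isFlatRateProvider(provider: string, ctx: FlatRateContext = {}): boolean {
  const id = provider.toLowerCase();
  // 1. Built-in list is checked first so existing behavior never regresses.
  if (FLAT_RATE_PROVIDERS.has(id)) return true;
  // 2. An externalCli provider wraps the user's own subscription.
  if (ctx.authMode === "externalCli") return true;
  // 3. User-declared providers, matched case-insensitively.
  return (ctx.userFlatRate ?? []).some((p) => p.toLowerCase() === id);
}

/** Never throws: a failing registry lookup just leaves authMode undefined. */
function buildFlatRateContext(
  provider: string,
  getAuthMode?: (provider: string) => string | undefined,
  prefs?: { flat_rate_providers?: string[] },
): FlatRateContext {
  let authMode: string | undefined;
  try {
    authMode = getAuthMode?.(provider);
  } catch {
    authMode = undefined; // the real code surfaces this via logWarning("dispatch", ...)
  }
  return { authMode, userFlatRate: prefs?.flat_rate_providers };
}
```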
Thread the context through:
- resolvePreferredModelConfig (routing synthesis guard)
- selectAndApplyModel primary-model and fallback provider checks
- auto-start.ts dynamic-routing banner so the startup message matches
dispatch-time reality
Preferences:
- Add `flat_rate_providers?: string[]` to GSDPreferences and
KNOWN_PREFERENCE_KEYS in preferences-types.ts.
- Add a string-array validator in preferences-validation.ts that trims
whitespace and drops empty entries.
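A sketch of a string-array validator with the described trimming and empty-entry behavior. The `{ value } | { error }` return shape and the error messages are assumptions.

```typescript
type Validated = { value: string[] } | { error: string };

function validateStringArray(key: string, raw: unknown): Validated {
  if (!Array.isArray(raw)) return { error: `${key} must be an array of strings` };
  const out: string[] = [];
  for (const item of raw) {
    if (typeof item !== "string") return { error: `${key} must contain only strings` };
    const trimmed = item.trim();
    if (trimmed !== "") out.push(trimmed); // drop entries that are empty after trimming
  }
  return { value: out };
}
```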
Tests:
- Extend flat-rate-routing-guard.test.ts with 13 new cases covering
externalCli auto-detection, userFlatRate preference matching
(case-insensitive), combined signals, buildFlatRateContext() behavior
(including registry-lookup-throws and non-canonical auth-mode
responses), plus regression cases for the built-in list.
- Add 5 validator cases in preferences.test.ts for the new
flat_rate_providers field (string-array accepted, whitespace trimmed,
non-array rejected, non-string elements rejected, known-key warning
check).
When a user picks a custom-provider model via /gsd model (Ollama, vLLM,
LM Studio, OpenAI-compatible proxies — anything defined in
~/.gsd/agent/models.json) and then runs /gsd auto, the bootstrap silently
swaps it out for whichever model PREFERENCES.md happens to list. That
model is invariably a built-in provider (claude-code, anthropic) the user
isn't logged into, so auto-mode immediately fails with
"Not logged in · Please run /login", pauses, and resets the session to
claude-code/claude-sonnet-4-6.
Root cause: #3517 made resolveDefaultSessionModel() (PREFERENCES.md) take
priority over ctx.model (settings.json) in auto-start.ts. That fix was
correct for the scenario where settings.json had a stale built-in default
but PREFERENCES.md was freshly configured, but it has no awareness of
custom providers — PREFERENCES.md cannot reference them, so honoring it
when the session provider is custom always discards the user's explicit
choice.
Add isCustomProvider() to preferences-models.ts which checks whether a
provider is declared in ~/.gsd/agent/models.json (with ~/.pi/agent
fallback). Read the file directly with JSON.parse to avoid pulling in
the model-registry at this call site, and treat any read or parse error
as not-custom so a malformed models.json never breaks bootstrap.
In bootstrapAutoSession(), when the session provider is custom, use
ctx.model directly. Otherwise fall through to the existing #3517
behavior (preferredModel ?? ctx.model).
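A sketch of the custom-provider check and the bootstrap branch, under stated assumptions: models.json is taken to have a top-level `providers` object keyed by provider name, the `~/.pi/agent` file is consulted only when the `~/.gsd/agent` file cannot be read, and file reading is injected so the logic runs without touching a real home directory. `pickStartModel` is a hypothetical name for the branch in `bootstrapAutoSession()`.

```typescript
type ReadFile = (path: string) => string; // expected to throw when the file is missing

function isCustomProvider(provider: string, home: string, readFile: ReadFile): boolean {
  const candidates = [`${home}/.gsd/agent/models.json`, `${home}/.pi/agent/models.json`];
  for (const path of candidates) {
    let raw: string | undefined;
    try {
      raw = readFile(path);
    } catch {
      raw = undefined;
    }
    if (raw === undefined) continue; // missing or unreadable: try the fallback location
    try {
      const parsed = JSON.parse(raw) as { providers?: Record<string, unknown> } | null;
      return Object.prototype.hasOwnProperty.call(parsed?.providers ?? {}, provider);
    } catch {
      return false; // a malformed models.json must never break bootstrap
    }
  }
  return false;
}

interface ModelRef {
  provider: string;
  id: string;
}

/** Custom session model wins; otherwise keep the #3517 order. */
function pickStartModel(
  ctxModel: ModelRef | undefined,
  preferredModel: ModelRef | undefined,
  isCustom: (provider: string) => boolean,
): ModelRef | undefined {
  if (ctxModel && isCustom(ctxModel.provider)) return ctxModel;
  return preferredModel ?? ctxModel;
}
```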
Tests:
- New behavioral regression in model-isolation.test.ts that mirrors
the auto-start.ts logic and verifies the four interesting cases:
custom session beats PREFERENCES.md, built-in session still defers
to PREFERENCES.md (#3517 preserved), custom session with no
PREFERENCES.md uses ctx.model, and null ctx.model falls through.
- New string-grep guard in auto-start-model-capture.test.ts that the
isCustomProvider() call is wired into the snapshot path.
- Updated #3517 grep to allow the new branching shape while still
asserting preferredModel remains a snapshot source for built-ins.
https://claude.ai/code/session_01QLYCeiXWjSFPEXFxjkSLni
* fix(ci): address 5 pipeline integrity issues from release audit
- version-stamp.mjs: regenerate package-lock.json after dev version stamp
(mirrors the same fix applied to bump-version.mjs in #4116)
- bump-version.mjs: regenerate root and web/package-lock.json after version
bump so both lockfiles are always in sync at release time
- pipeline.yml: add post-bump validation step that verifies all package.json
files parse as valid JSON before the release commit is made
- pipeline.yml: split "Commit, tag, and push" — commit+tag+rebase happen
before build, but git push is deferred until after build and npm publish
both succeed, preventing a broken tag from landing on main
- pipeline.yml: emit a ::warning:: annotation when live LLM tests fail so
failures are visible in the Actions UI instead of silently swallowed
Closes #4118
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gsd): address 3 silent-crash secondary issues from #3348 post-#3696
Three gaps that remained after the double-fault fix in #3696:
1. unhandledRejection not wired — installEpipeGuard only registered
uncaughtException; promise rejections that escaped without a catch
were not handled by the GSD error path. Added _gsdRejectionGuard
alongside _gsdEpipeGuard.
2. Non-fatal overcorrection — the #3696 fix replaced re-throwing with
log-and-continue, leaving the process running in an indeterminate
state after any non-EPIPE/non-ENOENT exception. Replaced with
writeCrashLog + process.exit(1). writeCrashLog is extracted into
bootstrap/crash-log.ts (zero deps) so tests can import it without
pulling in the full extension graph.
3. unit-end not emitted after crash-with-side-effects — hameltomor
observed that complete-milestone M001 wrote SUMMARY.md and updated
the DB but never emitted unit-end (#3348 comment-4237533440). Added
emitCrashRecoveredUnitEnd() in crash-recovery.ts: on the next
auto-mode startup, if a stale lock references a unit whose
unit-start has no matching unit-end in the journal, a synthetic
unit-end with status "crash-recovered" is emitted before the lock
is cleared. This closes the causal chain for downstream tooling
and forensics without requiring changes to the lock file schema.
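A sketch of the synthetic unit-end logic, assuming journal entries carry a type, a unit id, and an optional status. Only the `"crash-recovered"` status comes from the commit; the entry shape and the function name are hypothetical.

```typescript
interface JournalEntry {
  type: "unit-start" | "unit-end";
  unitId: string;
  status?: string;
}

/**
 * Given the unit referenced by a stale lock, returns the synthetic unit-end
 * to append, or undefined when the journal already closed that unit.
 */
function crashRecoveredUnitEnd(journal: JournalEntry[], lockedUnitId: string): JournalEntry | undefined {
  let open = false;
  for (const entry of journal) {
    if (entry.unitId !== lockedUnitId) continue;
    open = entry.type === "unit-start"; // a start opens the unit, a matching end closes it
  }
  return open ? { type: "unit-end", unitId: lockedUnitId, status: "crash-recovered" } : undefined;
}
```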
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Restore the isProviderRequestReady() guard lost during the main merge.
Tests in model-resolver.test.ts and model-resolver-initial-model-auth.test.ts
require findInitialModel() to skip an unauthenticated saved default and fall
through to the first available model.
Remove hard-coded Anthropic/Claude defaults and silent provider swaps so
the app honors whatever model/provider the user has configured.
- src/cli.ts: drop the anthropic->claude-code auto-migration blocks that
were rewriting the user's saved defaultProvider on every startup.
- packages/pi-coding-agent/src/core/model-resolver.ts: delete the
defaultModelPerProvider table, drop the "recommended variant" swap
that silently upgraded e.g. claude-opus-4-6 to -extended, and replace
the provider-iteration first-available fallback with provider-sticky
(user's saved provider first, then first registry entry).
- src/startup-model-validation.ts: replace the openai/anthropic-first
fallback chain with Pi-default -> same-provider -> first-available.
- src/help-text.ts: use a generic provider/model-id example for --model
instead of claude-opus-4-6.
- src/tests/startup-model-validation.test.ts: update the fallback test
to assert provider stickiness rather than a specific Claude model id.
https://claude.ai/code/session_01CvuUuzuVjRcQN25263nG6V
Extract the post-tool text-block selection logic into a small pure
helper (`findLatestPinnableText`) so the regression scenario can be
covered without standing up the full interactive controller harness.
The new test pins the bug from #4120: when content blocks are
`[text1, tool1, text2_streaming]`, the helper must return `text1`
(not `text2`), because `text2` is still streaming live into the chat
container and mirroring it would render the same tokens twice.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pinned `Working · Latest Output` border above the editor mirrors
the assistant's latest text block while tools run, so prose stays
visible after a tool's output scrolls it off-screen. The mirror walked
content blocks from the end and picked the last text block — but when
the assistant streams a *new* text block after a tool call (sequence
`[text1, tool1, text2_streaming]`), it picked `text2`, which was also
being streamed live into the chat container. Result: identical tokens
rendered in two places at once.
Restrict the search to text blocks whose index is strictly less than
the index of the most recent tool call. Text after the last tool call
stays in the chat container only; earlier prose (e.g. `text1`) remains
mirrored the entire time the new text streams, so context isn't lost
and the loading-animation handoff is undisturbed.
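A sketch of the pinning rule, assuming a simplified block shape. Only the helper name `findLatestPinnableText` comes from the commit. Returning `undefined` when there is no tool call is an assumption, consistent with the mirror only being shown while tools run.

```typescript
type PinBlock = { type: "text"; text: string } | { type: "toolCall" };

function findLatestPinnableText(content: PinBlock[]): string | undefined {
  // Locate the most recent tool call.
  let lastTool = -1;
  for (let i = content.length - 1; i >= 0; i--) {
    if (content[i].type === "toolCall") {
      lastTool = i;
      break;
    }
  }
  // Only text strictly before that tool is eligible; text after it is still
  // streaming into the chat container and must not be rendered twice.
  for (let i = lastTool - 1; i >= 0; i--) {
    const block = content[i];
    if (block.type === "text") return block.text;
  }
  return undefined;
}
```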
Fixes #4120
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reactive_execution.subagent_model was validated and stored but never
passed to the prompt builders that generate subagent dispatch instructions.
The executing agent therefore autonomously chose its default model instead
of the configured preference.
- buildReactiveExecutePrompt: add subagentModel? param, inject into
instruction string; auto-dispatch passes reactiveConfig.subagent_model
with fallback to resolveModelWithFallbacksForUnit("subagent")
- buildParallelResearchSlicesPrompt: same pattern, resolves from
models.subagent preference
- buildGateEvaluatePrompt: same pattern
- system-context: inject configured subagent model into system prompt
so the executing agent always knows which model to use for subagents
Closes#4078
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- version-stamp.mjs: regenerate package-lock.json after dev version stamp
(mirrors the same fix applied to bump-version.mjs in #4116)
- bump-version.mjs: regenerate root and web/package-lock.json after version
bump so both lockfiles are always in sync at release time
- pipeline.yml: add post-bump validation step that verifies all package.json
files parse as valid JSON before the release commit is made
- pipeline.yml: split "Commit, tag, and push" — commit+tag+rebase happen
before build, but git push is deferred until after build and npm publish
both succeed, preventing a broken tag from landing on main
- pipeline.yml: emit a ::warning:: annotation when live LLM tests fail so
failures are visible in the Actions UI instead of silently swallowed
Closes #4118
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
bump-version.mjs was updating package.json and sub-packages but never
regenerating package-lock.json, causing the lockfile to drift behind
by one version on every release.
Adds `npm install --package-lock-only` as the final step so the lockfile
is always in sync with the version being committed. Also regenerates the
current lockfile to fix the existing 2.58.0 → 2.64.0 drift.
Closes#4115
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Custom OpenAI-compatible providers running on localhost (e.g. a local proxy)
with an explicit apiKey in models.json received 'local-no-key-needed' during
compaction instead of their configured key, causing 401 errors.
The localhost shortcut in AuthStorage.getApiKey() was unconditional. Normal
dispatch calls getApiKeyForProvider() which skips the baseUrl check entirely,
so the fallback resolver was reached and the real key was used. Compaction
calls getApiKey(model) which passes baseUrl, hitting the shortcut first.
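A hypothetical sketch of making the localhost shortcut conditional, so that a key configured for the provider wins and the placeholder is used only when no key exists. The function names, the injected key lookup, and this exact resolution order are assumptions, not the real `AuthStorage` code.

```typescript
const LOCAL_PLACEHOLDER = "local-no-key-needed";

function isLocalhost(baseUrl: string): boolean {
  try {
    const host = new URL(baseUrl).hostname;
    return host === "localhost" || host === "127.0.0.1" || host === "[::1]";
  } catch {
    return false; // an unparsable baseUrl is never treated as local
  }
}

function getApiKey(
  model: { provider: string; baseUrl?: string },
  configuredKey: (provider: string) => string | undefined,
): string | undefined {
  const key = configuredKey(model.provider);
  if (key) return key; // an explicit apiKey from models.json always wins
  if (model.baseUrl && isLocalhost(model.baseUrl)) return LOCAL_PLACEHOLDER;
  return undefined;
}
```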
Closes#4106
Phase 0 of #3631 — remove dead code before screaming architecture reorg.
- auto-observability.ts (72 LOC): zero imports anywhere in codebase
- rtk-status.ts (53 LOC): zero imports anywhere in codebase
- file-watcher.ts (100 LOC): zero imports anywhere in codebase
- file-watcher.test.ts: test for dead file-watcher.ts
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gsd): reconcile stale slice rows and rebuild STATE.md before DB close
Two coupled defects caused auto-mode split-brain where dispatch falsely
reported "No slice eligible" while STATE.md showed executable work:
1. deriveStateFromDb() reconciled missing slice rows but not stale
existing ones. A slice with status "pending" in the DB but a SUMMARY
file on disk was never repaired, permanently blocking downstream
slices. Added slice-level stale reconciliation matching the existing
task-level pattern.
2. stopAuto() closed the DB before rebuilding STATE.md, forcing
deriveState() into filesystem fallback mode. Moved rebuildState()
before closeDatabase() so stop-time STATE.md uses the same
authoritative DB backend as dispatch.
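A sketch of slice-level stale-row reconciliation. The row shape and the `"complete"` status string are assumptions. The rule itself (the DB says pending but a SUMMARY file exists on disk) is the one described in item 1.

```typescript
interface SliceRow {
  id: string;
  status: string; // e.g. "pending" | "complete" (assumed values)
}

/** Repairs rows in place and returns the ids that were repaired. */
function reconcileStaleSlices(rows: SliceRow[], hasSummaryOnDisk: (sliceId: string) => boolean): string[] {
  const repaired: string[] = [];
  for (const row of rows) {
    if (row.status === "pending" && hasSummaryOnDisk(row.id)) {
      row.status = "complete"; // the real code would persist this to the DB
      repaired.push(row.id);
    }
  }
  return repaired;
}
```

Item 2 is an ordering constraint rather than new logic: any state rebuild that should see these repaired rows has to run before the database handle is closed.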
Fixes#3599
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add regression test for stale slice row reconciliation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gsd): block direct writes to gsd.db via hooks to prevent corruption
When the gsd_complete_task tool was unavailable, agents fell back to
shell-based sqlite3/sql.js writes to .gsd/gsd.db, corrupting the WAL-backed
database.
Extend write-intercept to block:
- File writes to gsd.db, gsd.db-wal, gsd.db-shm
- Bash commands using sqlite3/sql.js/better-sqlite3 targeting gsd.db
- Shell redirects/cp/mv targeting gsd.db
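A sketch of the three guards. The regular expressions are illustrative assumptions, not the patterns the real write-intercept uses. In this version a redirect or cp/mv only counts when gsd.db is the destination, so copying the database out is still allowed.

```typescript
const DB_NAME = String.raw`gsd\.db(?:-wal|-shm)?`;

// File writes: the final path segment is gsd.db, gsd.db-wal or gsd.db-shm.
const DB_PATH = new RegExp(String.raw`(?:^|[\\/])${DB_NAME}$`);
// A SQLite client invoked with the database later in the command.
const SQLITE_CLIENT = new RegExp(String.raw`\b(?:sqlite3|sql\.js|better-sqlite3)\b.*${DB_NAME}`);
// A > or >> redirect whose target is the database.
const REDIRECT_TO_DB = new RegExp(String.raw`>>?\s*\S*${DB_NAME}\b`);
// cp/mv whose final argument (before end or ; & |) is the database.
const COPY_OR_MOVE_TO_DB = new RegExp(String.raw`\b(?:cp|mv)\b.*\s\S*${DB_NAME}\s*(?:$|[;&|])`);

function isBlockedFileWrite(path: string): boolean {
  return DB_PATH.test(path);
}

function isBlockedBashCommand(command: string): boolean {
  return SQLITE_CLIENT.test(command) || REDIRECT_TO_DB.test(command) || COPY_OR_MOVE_TO_DB.test(command);
}
```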
Closes#3625
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add regression test for blocking direct gsd.db writes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1 of #3631 — eliminate circular imports before screaming arch reorg.
Cycle 1 (auto.ts ↔ auto-direct-dispatch.ts):
Remove redundant re-export of dispatchDirectPhase from auto.ts.
No consumer imported it through auto.ts.
Cycle 2 (context-injector.ts ↔ custom-workflow-engine.ts):
Extract readFrozenDefinition to new definition-io.ts.
context-injector now imports from definition-io directly.
Cycle 3 (preferences.ts ↔ preferences-skills.ts):
Move formatSkillRef to preferences-types.ts (pure fn, depends only on
SkillResolution which is already there).
Move resolveSkillDiscoveryMode + resolveSkillStalenessDays into
preferences.ts (trivial wrappers over loadEffectiveGSDPreferences).
Tests: new definition-io.test.ts (3 tests), preferences-formatting.test.ts
(6 tests covering all formatSkillRef branches).
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The first pass at #4099 only pre-authorized `mcp__<server>__*` tools, but
in `acceptEdits` mode the SDK still gates Read, Write, Glob/Grep, and
basic shell inspection commands like `ls`. GSD subagents need the full
workflow toolset and were still hitting "This command requires approval"
prompts on every tool call.
Two changes:
1. `resolveClaudePermissionMode` now returns `bypassPermissions` for all
GSD subagent runs (auto + interactive), dropping the `acceptEdits`
branch and the `isAutoActive` dynamic import. The host Claude Code
session's permission model is the user-visible gate; the inner SDK
process re-prompting on every tool was approval fatigue with no net
safety benefit. `GSD_CLAUDE_CODE_PERMISSION_MODE` env override stays
so security-conscious users can opt back into a stricter mode.
2. Expanded the pre-authorized `allowedTools` list to include Read,
Write, Edit, Glob, Grep, `Bash(ls:*)`, and `Bash(pwd)` alongside the
MCP server globs. Acts as a belt-and-suspenders safety net for users
who set the env override to `acceptEdits`.
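A sketch of the permission-mode resolution and the allowedTools list. The four mode names are assumed to be the Claude Agent SDK's permission modes. Reading the override from a plain env object instead of `process.env`, falling back to bypass on an unrecognized override, and the helper name `buildAllowedTools` are all assumptions for the sake of a runnable snippet.

```typescript
type PermissionMode = "default" | "acceptEdits" | "bypassPermissions" | "plan";
const MODES: readonly PermissionMode[] = ["default", "acceptEdits", "bypassPermissions", "plan"];

function resolveClaudePermissionMode(env: Record<string, string | undefined>): PermissionMode {
  const override = env.GSD_CLAUDE_CODE_PERMISSION_MODE;
  // A valid override lets security-conscious users opt back into a stricter mode.
  if (override && (MODES as readonly string[]).includes(override)) return override as PermissionMode;
  return "bypassPermissions";
}

function buildAllowedTools(mcpServers: string[]): string[] {
  return [
    ...mcpServers.map((server) => `mcp__${server}__*`),
    // Safety net that only matters when the override selects acceptEdits.
    "Read", "Write", "Edit", "Glob", "Grep", "Bash(ls:*)", "Bash(pwd)",
  ];
}
```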
Tradeoff documented inline: bypass means a prompt-injection payload read
from an untrusted file could trigger tool calls without a second gate.
Accepted because the workflow is explicit user intent and the
alternative is continuous approval fatigue that blocks real work.
Tests updated for the new allowedTools shape; permission-mode tests
already accepted bypass as the default.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>