All major LLM provider SDKs were loaded eagerly at startup, penalizing
users regardless of which provider they actually use. This change defers
SDK loading until first API call for:
- @anthropic-ai/sdk (anthropic.ts)
- openai (openai-responses.ts, openai-completions.ts, azure-openai-responses.ts)
- @google/genai (google-vertex.ts)
The Bedrock provider already used this pattern. Now all 5 remaining
providers use the same async lazy-loader pattern:
- Static import changed to `import type` (erased at compile time)
- Module-level `let _SdkClass` cache variable
- `async function getSdkClass()` loader with singleton caching
- `createClient()` made async, uses `await getSdkClass()`
- Call sites updated with `await createClient()`
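In TypeScript, the combined pattern looks roughly like this (a sketch with illustrative names; each provider substitutes its own SDK class and a real `(await import("…")).default` inside the loader):

```typescript
// Sketch of the lazy-loader pattern; names are illustrative. A real provider
// keeps `import type Anthropic from "@anthropic-ai/sdk"` for typing only and
// performs the dynamic import inside getSdkClass().
type SdkClient = { apiKey: string };
type SdkCtor = new (opts: { apiKey: string }) => SdkClient;

let _SdkClass: SdkCtor | undefined; // module-level singleton cache
let loadCount = 0;                  // stands in for the cost of the dynamic import

async function getSdkClass(): Promise<SdkCtor> {
  if (!_SdkClass) {
    loadCount++; // the dynamic import happens here, once, on first API call
    _SdkClass = class {
      apiKey: string;
      constructor(opts: { apiKey: string }) {
        this.apiKey = opts.apiKey;
      }
    };
  }
  return _SdkClass;
}

async function createClient(apiKey: string): Promise<SdkClient> {
  const SdkClass = await getSdkClass(); // createClient becomes async
  return new SdkClass({ apiKey });
}
```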
For google-vertex.ts, ThinkingLevel enum usage replaced with equivalent
string literals to eliminate the runtime import entirely.
All packages build cleanly. The startup improvement is proportional to
how many provider SDKs are installed — on typical installs this eliminates
eager loading of 30-40MB of SDK code.
gpt-5.x models (via Copilot/OpenAI/Azure) don't support 'minimal' as a
reasoning effort level — they only accept 'none', 'low', 'medium',
'high', and 'xhigh'. Setting /thinking minimal with gpt-5.4 causes a
400 error.
The openai-codex-responses provider already had this clamping, but the
openai-responses and azure-openai-responses providers passed the value
through unclamped.
Add clampReasoningForModel() to both providers that maps 'minimal' to
'low' for gpt-5.x models, matching the existing behavior in
openai-codex-responses.
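A minimal sketch of that clamp (the effort union and the gpt-5 prefix test are assumptions about the surrounding types, not the exact implementation):

```typescript
// Sketch of clampReasoningForModel; the model-matching rule is an assumption.
type ReasoningEffort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";

function clampReasoningForModel(
  modelId: string,
  effort: ReasoningEffort,
): ReasoningEffort {
  // gpt-5.x models reject 'minimal', so map it to the nearest supported level.
  if (effort === "minimal" && /^gpt-5/.test(modelId)) {
    return "low";
  }
  return effort;
}
```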
Fixes the bug portion of #688
- Add @smithy/node-http-handler to pi-ai
- Add @types/proper-lockfile, @types/hosted-git-info, @types/sql.js to pi-coding-agent
- These were causing typecheck:extensions to fail due to missing type declarations
- Restore exhaustive never check in mapStopReason (throw on unhandled FinishReason)
- Add 12 unit tests for sanitizeSchemaForGoogle covering patternProperties removal,
const→enum conversion at various depths, arrays, deeply nested objects, pass-through
- Simplify redundant recursion branches into single typeof object catch-all
- Fix misleading comment ("only in anyOf/oneOf") — conversion happens everywhere
- Drop unnecessary (p: Part) annotation; TypeScript infers it from @google/genai types
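The restored exhaustive check follows the standard never-narrowing idiom; a sketch with illustrative FinishReason variants (the real enum comes from @google/genai):

```typescript
// Sketch of the exhaustive-never pattern; these FinishReason variants are
// illustrative stand-ins for the @google/genai enum.
type FinishReason = "STOP" | "MAX_TOKENS" | "TOOL_CALLS";

function mapStopReason(reason: FinishReason): string {
  switch (reason) {
    case "STOP": return "stop";
    case "MAX_TOKENS": return "length";
    case "TOOL_CALLS": return "toolUse";
    default: {
      // If a new FinishReason is added upstream, this assignment stops
      // compiling, and unhandled values throw instead of passing silently.
      const exhaustive: never = reason;
      throw new Error(`Unhandled FinishReason: ${exhaustive}`);
    }
  }
}
```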
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add startup performance analysis and optimization plan
Profiled GSD CLI startup, finding 2.2s for --version and ~3.8s for
interactive mode. Identified 5 root causes with measured timings and
created a phased optimization plan targeting <0.2s for --version
and ~0.8s for interactive startup.
* perf: speed up GSD startup with lazy loading and fast paths
- Fast-path --version/-v and --help/-h in loader.ts before importing
any heavy dependencies (2.2s → 0.15s, 14x faster)
- Lazy-load undici (~200ms) only when HTTP_PROXY env vars are set
- Skip initResources cpSync when managed-resources.json version
matches current GSD version (~128ms saved per launch)
- Lazy-load Mistral SDK (~369ms) on first API call instead of startup
- Lazy-load Google GenAI SDK (~186ms) on first API call instead of
startup
- Parallelize extension loading with Promise.all() instead of
sequential for-loop
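The --version/--help fast path amounts to checking argv before any heavy import; a sketch (VERSION is inlined here for illustration; the real loader would read it from package.json):

```typescript
// Sketch of the loader fast path: handle --version/--help before any heavy
// import so trivial invocations skip SDK and TUI loading entirely.
const VERSION = "0.0.0"; // illustrative; really read from package.json

function fastPath(argv: string[]): string | undefined {
  if (argv.includes("--version") || argv.includes("-v")) return VERSION;
  if (argv.includes("--help") || argv.includes("-h")) return "usage: gsd [options]";
  return undefined; // fall through to the full (heavy) startup path
}
```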
---------
Co-authored-by: TÂCHES <afromanguy@me.com>
Add Ollama Cloud (ollama.com) as a built-in provider with both model
hosting and web search/fetch capabilities.
Model provider:
- 13 curated models via OpenAI-compatible API (Llama 3.1, Qwen 3,
DeepSeek R1, Gemma 3, Mistral, Phi-4, GPT-OSS)
- Auth via OLLAMA_API_KEY environment variable
- Registered in onboarding, env hydration, and model resolver
Web tool provider:
- Search via POST ollama.com/api/web_search
- Page fetch via POST ollama.com/api/web_fetch (fallback after Jina)
- Added as third search provider option alongside Tavily and Brave
- /search-provider command updated with ollama option
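The search call can be sketched like this; the request body and response shape are assumptions inferred from the endpoint named above, not verified API documentation:

```typescript
// Sketch of the web search call; bearer auth via OLLAMA_API_KEY and a
// { query } JSON body are assumptions about the endpoint's contract.
async function ollamaWebSearch(query: string, apiKey: string): Promise<unknown> {
  const res = await fetch("https://ollama.com/api/web_search", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`web_search failed: ${res.status}`);
  return res.json();
}
```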
Closes #430
Adds a "Custom (OpenAI-compatible)" provider option to the API key
flow in the onboarding wizard. When selected, prompts for base URL,
API key, and model ID, then writes the config to models.json.
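The exact schema of models.json is not shown in the change; a hypothetical entry (all field names here are assumptions) might look like:

```json
{
  "custom-provider": {
    "baseUrl": "https://llm.example.com/v1",
    "apiKey": "sk-...",
    "model": "my-model-id"
  }
}
```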
* fix(auto): prevent hang when dispatch chain breaks after slice tasks complete (#381)
After the last task in a slice completes, dispatchNextUnit() can throw
(e.g. template mismatch, branch error, or any unprotected operation).
The error propagates to the pi event emitter which silently swallows
async rejections, leaving auto-mode active but permanently stalled —
no dispatch, no stop, no recovery.
Three defensive layers added:
1. Try-catch around dispatchNextUnit in handleAgentEnd — catches errors,
shows them to the user, and schedules a retry via the gap watchdog.
2. Dispatch gap watchdog (30s timer) — fires when auto-mode is active
but no unit was dispatched after a unit completion. Re-derives state
and retries. If retry fails, stops auto-mode with diagnostics.
3. Error boundary in the agent_end event handler — last-resort catch
that pauses auto-mode if handleAgentEnd itself throws.
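Layers 1 and 2 can be sketched as follows (dispatchNextUnit and the 30s gap come from the change; the wiring around them is illustrative):

```typescript
// Sketch of defensive layers 1 and 2; function names come from the change,
// the surrounding wiring is illustrative.
const DISPATCH_GAP_MS = 30_000;
let gapTimer: ReturnType<typeof setTimeout> | undefined;

async function handleAgentEnd(
  dispatchNextUnit: () => Promise<void>,
  showError: (e: unknown) => void,
) {
  try {
    await dispatchNextUnit();           // layer 1: never let this throw upward
    if (gapTimer) clearTimeout(gapTimer);
  } catch (err) {
    showError(err);                     // surface instead of silently swallowing
    // layer 2: the gap watchdog retries after a delay instead of stalling forever
    gapTimer = setTimeout(() => {
      dispatchNextUnit().catch(showError);
    }, DISPATCH_GAP_MS);
  }
}
```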
Closes #381
* fix: improve Cloud Code Assist 404 error with actionable model guidance (#368)
When a model like gemini-2.0-flash isn't available via Cloud Code Assist,
the 404 error now names the model and suggests using the google provider
with GOOGLE_API_KEY or switching to a supported model.
Model variants like `claude-opus-4-6[1m]` use bracket suffixes to
differentiate context window configurations internally, but the
Anthropic API only accepts base model IDs (e.g. `claude-opus-4-6`).
Sending the full variant ID via OAuth (Claude Max/Pro) causes a 404:
{"type":"not_found_error","message":"model: claude-opus-4-6[1m]"}
Strip any `[...]` suffix from model.id for OAuth requests only.
API key auth is left unchanged since the behavior there is unverified.
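The strip itself can be a one-line regex (sketch):

```typescript
// Sketch: strip a trailing [...] variant suffix to recover the base model ID.
function stripVariantSuffix(modelId: string): string {
  return modelId.replace(/\[[^\]]*\]$/, "");
}
```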
Anthropic's 429 responses include retry-after and x-ratelimit-reset-*
headers that tell us exactly when to retry. Previously we ignored these
and used exponential backoff (2s, 4s, 8s), which is both wrong and
misleading in the UI countdown.
- Add retryAfterMs to AssistantMessage as the structured carrier
- Extract retry-after / x-ratelimit-reset-requests / x-ratelimit-reset-tokens
from Anthropic SDK APIError.headers in the provider catch block
- Session uses retryAfterMs when present (capped by maxDelayMs=60s),
falls back to exponential backoff for errors with no timing hint
The UI countdown now shows the actual Anthropic reset time. No UI changes needed.
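The extraction can be sketched like this; treating retry-after as whole seconds and the x-ratelimit-reset-* values as RFC 3339 timestamps are assumptions about the header formats:

```typescript
// Sketch of header extraction; retry-after is assumed to be in seconds, and
// the x-ratelimit-reset-* values are assumed to be RFC 3339 timestamps.
function extractRetryAfterMs(headers: Record<string, string>): number | undefined {
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined) {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return seconds * 1000;
  }
  for (const name of ["x-ratelimit-reset-requests", "x-ratelimit-reset-tokens"]) {
    const value = headers[name];
    if (value) {
      const ms = Date.parse(value) - Date.now();
      if (Number.isFinite(ms) && ms > 0) return ms;
    }
  }
  return undefined; // no timing hint: caller falls back to exponential backoff
}
```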
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add the 1M context variant of Claude Opus 4.6 to the model registry
and fix model resolver to try exact match before glob detection, so
model IDs containing bracket characters (like [1m]) are not
misinterpreted as glob patterns.
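The exact-before-glob order can be sketched like this (the glob matcher is a minimal '*'-only stand-in, not the real one):

```typescript
// Sketch: try an exact lookup before glob detection, so bracket characters
// in IDs like "claude-opus-4-6[1m]" are never treated as glob syntax.
function resolveModel(id: string, registry: Set<string>): string | undefined {
  if (registry.has(id)) return id;            // exact match wins
  if (/[*?[\]]/.test(id)) {
    // only now fall back to glob matching
    for (const known of registry) {
      if (globMatch(id, known)) return known;
    }
  }
  return undefined;
}

// Minimal '*'-only glob matcher, purely for illustration.
function globMatch(pattern: string, value: string): boolean {
  const re = new RegExp(
    "^" +
      pattern
        .split("*")
        .map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*") +
      "$",
  );
  return re.test(value);
}
```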
* feat: add native Rust streaming JSON parser for LLM tool call argument parsing
Replaces the JS partial-json library with a Rust implementation exposed via napi-rs.
The parser handles incomplete JSON from streaming deltas by closing unclosed
strings, objects, and arrays, removing trailing commas, and completing
truncated literals.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: handle truncated numbers and remove dead partial-json dependency
Adds truncated number recovery (e.g. `{"key": 12`, `{"key": 3.`, `{"key": 1e`)
to the Rust streaming JSON parser, and removes the now-unused `partial-json`
npm dependency from pi-ai.
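To illustrate the recovery rules (the real implementation is Rust behind napi-rs; this TypeScript is only a behavioral sketch covering the number cases above plus brace closing):

```typescript
// Behavioral sketch of truncated-number recovery; the real parser is Rust.
function completeTruncatedNumber(json: string): string {
  let fixed = json;
  // "3." or "1e" / "1e+" end mid-number: pad with a zero to make it valid.
  if (/[\d](\.|[eE][+-]?)$/.test(fixed)) fixed += "0";
  // A bare trailing "-" is not a number yet; make it one.
  if (/-$/.test(fixed)) fixed += "0";
  // Close any unclosed objects so the examples below parse.
  const opens = (fixed.match(/{/g) ?? []).length;
  const closes = (fixed.match(/}/g) ?? []).length;
  fixed += "}".repeat(Math.max(0, opens - closes));
  return fixed;
}
```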
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Vendor all 4 Pi packages (tui, ai, agent-core, coding-agent) from
pi-mono v0.57.1 as @gsd/* workspace packages under packages/. This
replaces the compiled npm dependency (@mariozechner/pi-coding-agent)
and patch-package workflow, giving direct source access for
modifications.
- Copy Pi source from pi-mono v0.57.1 into packages/
- Create workspace package.json + tsconfig.json for each package
- Rename ~240 imports from @mariozechner/pi-* to @gsd/pi-*
- Apply existing patches as source edits (setModel persist, VT input)
- Remove @mariozechner/pi-coding-agent dep and patch-package
- Update build pipeline to build packages in dependency order
- Add pi-upstream git remote for future selective syncing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>