Commit graph

21 commits

Jeremy McSpadden
6d77724378 perf: lazy-load LLM provider SDKs to reduce startup time
All major LLM provider SDKs were loaded eagerly at startup, penalizing
users regardless of which provider they actually use. This change defers
SDK loading until first API call for:

- @anthropic-ai/sdk (anthropic.ts)
- openai (openai-responses.ts, openai-completions.ts, azure-openai-responses.ts)
- @google/genai (google-vertex.ts)

The Bedrock provider already used this pattern. Now all 5 remaining
providers use the same async lazy-loader pattern (see the sketch after this list):
- Static import changed to `import type` (erased at compile time)
- Module-level `let _SdkClass` cache variable
- `async function getSdkClass()` loader with singleton caching
- `createClient()` made async, uses `await getSdkClass()`
- Call sites updated with `await createClient()`
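
A minimal sketch of that pattern, using @anthropic-ai/sdk as the example (the cache and loader names here are illustrative; the actual identifiers in anthropic.ts may differ):

```ts
// Type-only import: erased at compile time, so the SDK is not loaded at startup.
import type Anthropic from "@anthropic-ai/sdk";

// Module-level singleton cache for the dynamically imported class.
let _Anthropic: typeof Anthropic | undefined;

// Load the SDK on first use and cache the constructor.
async function getAnthropic(): Promise<typeof Anthropic> {
  if (!_Anthropic) {
    _Anthropic = (await import("@anthropic-ai/sdk")).default;
  }
  return _Anthropic;
}

// createClient becomes async; call sites switch to `await createClient(...)`.
async function createClient(apiKey: string): Promise<Anthropic> {
  const AnthropicSdk = await getAnthropic();
  return new AnthropicSdk({ apiKey });
}
```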

For google-vertex.ts, ThinkingLevel enum usage replaced with equivalent
string literals to eliminate the runtime import entirely.

All packages build cleanly. The startup improvement is proportional to
how many provider SDKs are installed; on typical installs this change
eliminates eager loading of 30-40MB of SDK code.
2026-03-16 18:33:24 -05:00
TÂCHES
e4d47de1f6 Merge pull request #690 from trek-e/fix/688-thinking-minimal-gpt5
fix: clamp 'minimal' thinking level to 'low' for gpt-5.x models (#688)
2026-03-16 14:17:51 -06:00
Tom Boucher
1a499aecb2 fix: clamp 'minimal' thinking level to 'low' for gpt-5.x models (#688)
gpt-5.x models (via Copilot/OpenAI/Azure) don't support 'minimal' as a
reasoning effort level — they only accept 'none', 'low', 'medium',
'high', and 'xhigh'. Setting /thinking minimal with gpt-5.4 causes a
400 error.

The openai-codex-responses provider already had this clamping, but the
openai-responses and azure-openai-responses providers passed the value
through unclamped.

Add clampReasoningForModel() to both providers; it maps 'minimal' to
'low' for gpt-5.x models, matching the existing behavior in
openai-codex-responses.
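
A minimal sketch of the clamp (the function name is from the commit; the gpt-5 prefix check is an assumed heuristic):

```ts
type ReasoningEffort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";

// gpt-5.x rejects 'minimal', so map it to the nearest accepted level.
// The startsWith check is an assumption; the real predicate may differ.
function clampReasoningForModel(effort: ReasoningEffort, modelId: string): ReasoningEffort {
  return effort === "minimal" && modelId.startsWith("gpt-5") ? "low" : effort;
}
```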

Fixes the bug portion of #688
2026-03-16 16:02:54 -04:00
Jamie McGregor Nelson
d4cf95f204 fix: type errors in claude-import.ts and marketplace-discovery.ts 2026-03-16 14:46:31 -04:00
Jamie McGregor Nelson
526fa7439d fix: add missing type declarations for typecheck
- Add @smithy/node-http-handler to pi-ai
- Add @types/proper-lockfile, @types/hosted-git-info, @types/sql.js to pi-coding-agent
- These were causing typecheck:extensions to fail due to missing type declarations
2026-03-16 12:29:45 -04:00
Andriyansyah Nurrachman
132ae92944 feat: update ollama cloud provider models (#578) 2026-03-15 22:22:29 -06:00
Mannan Kant
96f5b58bd3 fix(pi-ai): address review comments on #504 — exhaustive switch, tests, cleanup (#587)
- Restore exhaustive never check in mapStopReason (throw on unhandled FinishReason; sketched after this list)
- Add 12 unit tests for sanitizeSchemaForGoogle covering patternProperties removal,
  const→enum conversion at various depths, arrays, deeply nested objects, pass-through
- Simplify redundant recursion branches into single typeof object catch-all
- Fix misleading comment ("only in anyOf/oneOf") — conversion happens everywhere
- Drop unnecessary (p: Part) annotation; TypeScript infers it from @google/genai types
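
The never-based exhaustiveness check is a standard TypeScript pattern; a self-contained sketch with stand-in types (the real FinishReason comes from @google/genai, and pi-ai's stop-reason type differs):

```ts
// Stand-ins for illustration; not the actual pi-ai or @google/genai types.
type FinishReason = "STOP" | "MAX_TOKENS" | "SAFETY";
type StopReason = "stop" | "length" | "error";

function mapStopReason(reason: FinishReason): StopReason {
  switch (reason) {
    case "STOP":
      return "stop";
    case "MAX_TOKENS":
      return "length";
    case "SAFETY":
      return "error";
    default: {
      // If FinishReason gains a member not handled above, this assignment
      // stops compiling; the throw guards the runtime path.
      const unhandled: never = reason;
      throw new Error(`Unhandled finish reason: ${JSON.stringify(unhandled)}`);
    }
  }
}
```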

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 22:21:20 -06:00
Mannan Kant
611cd0f508 copilot fix for https://github.com/gsd-build/gsd-2/issues/496 (#504) 2026-03-15 18:19:41 -06:00
Flux Labs
e6d55f8aaf Perf/gsd startup speed (#497)
* docs: add startup performance analysis and optimization plan

Profiled GSD CLI startup, finding 2.2s for --version and ~3.8s for
interactive mode. Identified 5 root causes with measured timings and
created a phased optimization plan targeting <0.2s for --version
and ~0.8s for interactive startup.

* perf: speed up GSD startup with lazy loading and fast paths

- Fast-path --version/-v and --help/-h in loader.ts before importing
  any heavy dependencies (2.2s → 0.15s, 14x faster; sketched below)
- Lazy-load undici (~200ms) only when HTTP_PROXY env vars are set
- Skip initResources cpSync when managed-resources.json version
  matches current GSD version (~128ms saved per launch)
- Lazy-load Mistral SDK (~369ms) on first API call instead of startup
- Lazy-load Google GenAI SDK (~186ms) on first API call instead of
  startup
- Parallelize extension loading with Promise.all() instead of
  sequential for-loop
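
A sketch of the fast-path idea (the file layout and ./main.js entry point are assumptions; only the flag checks and the deferred import come from the commit):

```ts
// loader.ts: handle trivial flags before importing anything heavy.
import { readFileSync } from "node:fs"; // cheap Node built-in

const args = process.argv.slice(2);

if (args.includes("--version") || args.includes("-v")) {
  // Read package.json directly instead of booting the CLI framework.
  const pkg = JSON.parse(readFileSync(new URL("../package.json", import.meta.url), "utf8"));
  console.log(pkg.version);
  process.exit(0);
}

if (args.includes("--help") || args.includes("-h")) {
  console.log("Usage: gsd [options]"); // abbreviated static help text
  process.exit(0);
}

// Only now pull in the full dependency graph.
await import("./main.js");
```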

---------

Co-authored-by: TÂCHES <afromanguy@me.com>
2026-03-15 13:33:43 -06:00
Flux Labs
ecf8125e39 feat: add Ollama Cloud as model and web tool provider (#430) (#434)
Add Ollama Cloud (ollama.com) as a built-in provider with both model
hosting and web search/fetch capabilities.

Model provider:
- 13 curated models via OpenAI-compatible API (Llama 3.1, Qwen 3,
  DeepSeek R1, Gemma 3, Mistral, Phi-4, GPT-OSS)
- Auth via OLLAMA_API_KEY environment variable
- Registered in onboarding, env hydration, and model resolver

Web tool provider (a request sketch follows the list):
- Search via POST ollama.com/api/web_search
- Page fetch via POST ollama.com/api/web_fetch (fallback after Jina)
- Added as third search provider option alongside Tavily and Brave
- /search-provider command updated with ollama option
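
A minimal sketch of the search call (the endpoint and OLLAMA_API_KEY are from the commit; the request and response shapes are assumptions):

```ts
// Hypothetical request body; the actual ollama.com contract may differ.
async function ollamaWebSearch(query: string): Promise<unknown> {
  const res = await fetch("https://ollama.com/api/web_search", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OLLAMA_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`ollama web_search failed: ${res.status}`);
  return res.json();
}
```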

Closes #430
2026-03-14 21:03:31 -06:00
Flux Labs
595c778250 feat: add custom OpenAI-compatible endpoint option to onboarding wizard (#335)
Adds a "Custom (OpenAI-compatible)" provider option to the API key
flow in the onboarding wizard. When selected, prompts for base URL,
API key, and model ID, then writes the config to models.json.
2026-03-14 15:07:47 -05:00
TÂCHES
4dcbff0c06 fix: increase timeout for z.ai provider to handle slow API spikes (#379) (#396)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 13:34:47 -06:00
Flux Labs
1f0c57aadf fix: improve Cloud Code Assist 404 error with actionable model guidance (#384)
* fix(auto): prevent hang when dispatch chain breaks after slice tasks complete (#381)

After the last task in a slice completes, dispatchNextUnit() can throw
(e.g. template mismatch, branch error, or any unprotected operation).
The error propagates to the pi event emitter which silently swallows
async rejections, leaving auto-mode active but permanently stalled —
no dispatch, no stop, no recovery.

Three defensive layers added (the watchdog is sketched after the list):

1. Try-catch around dispatchNextUnit in handleAgentEnd — catches errors,
   shows them to the user, and schedules a retry via the gap watchdog.

2. Dispatch gap watchdog (30s timer) — fires when auto-mode is active
   but no unit was dispatched after a unit completion. Re-derives state
   and retries. If retry fails, stops auto-mode with diagnostics.

3. Error boundary in the agent_end event handler — last-resort catch
   that pauses auto-mode if handleAgentEnd itself throws.
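
A sketch of layers 1 and 2 (dispatchNextUnit and handleAgentEnd are named in the commit; the helpers and wiring here are assumptions):

```ts
declare function dispatchNextUnit(): Promise<void>; // real implementation in auto-mode
declare function showErrorToUser(err: unknown): void; // hypothetical UI helper
declare function stopAutoMode(err: unknown): void; // hypothetical: stop with diagnostics

const DISPATCH_GAP_MS = 30_000;
let gapWatchdog: ReturnType<typeof setTimeout> | undefined;

async function handleAgentEnd(): Promise<void> {
  armGapWatchdog(); // layer 2: fires if no unit is dispatched in time
  try {
    await dispatchNextUnit(); // layer 1: errors no longer vanish in the emitter
    disarmGapWatchdog();
  } catch (err) {
    showErrorToUser(err); // leave the watchdog armed so the dispatch is retried
  }
}

function armGapWatchdog(): void {
  disarmGapWatchdog();
  gapWatchdog = setTimeout(async () => {
    try {
      await dispatchNextUnit(); // re-derive state and retry
    } catch (err) {
      stopAutoMode(err); // retry failed: stop auto-mode with diagnostics
    }
  }, DISPATCH_GAP_MS);
}

function disarmGapWatchdog(): void {
  if (gapWatchdog !== undefined) clearTimeout(gapWatchdog);
  gapWatchdog = undefined;
}
```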

Closes #381

* fix: improve Cloud Code Assist 404 error with actionable model guidance (#368)

When a model like gemini-2.0-flash isn't available via Cloud Code Assist,
the 404 error now names the model and suggests using the google provider
with GOOGLE_API_KEY or switching to a supported model.
2026-03-14 12:44:51 -06:00
Copilot
a6c5e4aca7 fix: add undici as root dependency to resolve startup crash (#372) 2026-03-14 10:45:52 -06:00
Kassie Povinelli
c3ceb077d9 feat: add alibaba-coding-plan provider support (#295) 2026-03-14 09:09:54 -06:00
vp275
03c48efbad fix: strip variant suffix from model ID for OAuth Anthropic API calls
Model variants like `claude-opus-4-6[1m]` use bracket suffixes to
differentiate context window configurations internally, but the
Anthropic API only accepts base model IDs (e.g. `claude-opus-4-6`).

Sending the full variant ID via OAuth (Claude Max/Pro) causes a 404:
  {"type":"not_found_error","message":"model: claude-opus-4-6[1m]"}

Strip any `[...]` suffix from model.id for OAuth requests only.
API key auth is left unchanged since the behavior there is unverified.
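
The strip itself is a small regex; a sketch (the OAuth-only gating is from the commit, the helper name is illustrative):

```ts
// "claude-opus-4-6[1m]" -> "claude-opus-4-6"; IDs without a suffix pass through.
function stripVariantSuffix(modelId: string): string {
  return modelId.replace(/\[[^\]]*\]$/, "");
}

// Applied on the OAuth path only, per the commit:
//   const apiModelId = isOAuth ? stripVariantSuffix(model.id) : model.id;
```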
2026-03-14 16:36:45 +05:30
copilot-swe-agent[bot]
bbfbb66ed2 Remove deprecated dead code from the OAuth module
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
2026-03-14 05:11:55 +00:00
Lex Christopherson
ca8697ae26 feat: use server-requested retry delay for Anthropic rate limits
Anthropic's 429 responses include retry-after and x-ratelimit-reset-*
headers that tell us exactly when to retry. Previously we ignored these
and used exponential backoff (2s, 4s, 8s), which is both wrong and
misleading in the UI countdown.

- Add retryAfterMs to AssistantMessage as the structured carrier
- Extract retry-after / x-ratelimit-reset-requests / x-ratelimit-reset-tokens
  from Anthropic SDK APIError.headers in the provider catch block
- Session uses retryAfterMs when present (capped by maxDelayMs=60s),
  falls back to exponential backoff for errors with no timing hint

The UI countdown now shows the actual Anthropic reset time. No UI changes needed.
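
A sketch of the header extraction (the header names and 60s cap are from the commit; parsing details, e.g. retry-after in seconds and the reset headers as timestamps, are assumptions):

```ts
// Returns a server-requested delay in ms, or undefined so the caller can
// fall back to exponential backoff.
function extractRetryAfterMs(headers: Headers): number | undefined {
  const retryAfter = headers.get("retry-after");
  if (retryAfter !== null) {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return seconds * 1000;
  }
  const reset =
    headers.get("x-ratelimit-reset-requests") ??
    headers.get("x-ratelimit-reset-tokens");
  if (reset !== null) {
    const ms = Date.parse(reset) - Date.now();
    if (ms > 0) return ms;
  }
  return undefined;
}

// In the session's retry loop (maxDelayMs = 60_000 per the commit):
//   const delay = msg.retryAfterMs !== undefined
//     ? Math.min(msg.retryAfterMs, maxDelayMs)
//     : exponentialBackoff(attempt); // hypothetical fallback helper
```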

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 16:51:17 -06:00
Juan Francisco Lebrero
5ff362ed0e feat: add claude-opus-4-6[1m] model with 1M context window (#288)
Add the 1M context variant of Claude Opus 4.6 to the model registry
and fix model resolver to try exact match before glob detection, so
model IDs containing bracket characters (like [1m]) are not
misinterpreted as glob patterns.
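
A sketch of the exact-before-glob ordering (minimatch stands in for whatever matcher the resolver actually uses; ModelInfo is a stand-in type):

```ts
import { minimatch } from "minimatch";

interface ModelInfo { contextWindow: number } // illustrative stand-in

// Exact lookup first, so "[1m]" is treated literally; glob matching is
// only attempted when no exact key exists.
function resolveModel(id: string, registry: Map<string, ModelInfo>): ModelInfo | undefined {
  const exact = registry.get(id);
  if (exact) return exact;
  for (const [pattern, info] of registry) {
    if (minimatch(id, pattern)) return info;
  }
  return undefined;
}
```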
2026-03-13 16:25:45 -06:00
TÂCHES
8b9cfae9e9 feat: native Rust streaming JSON parser (#266)
* feat: add native Rust streaming JSON parser for LLM tool call argument parsing

Replaces the JS partial-json library with a Rust implementation exposed via napi-rs.
The parser handles incomplete JSON from streaming deltas by closing unclosed strings,
objects, arrays, removing trailing commas, and completing truncated literals.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: handle truncated numbers and remove dead partial-json dependency

Adds truncated number recovery (e.g. `{"key": 12`, `{"key": 3.`, `{"key": 1e`)
to the Rust streaming JSON parser, and removes the now-unused `partial-json`
npm dependency from pi-ai.
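
The real parser is Rust exposed through napi-rs; a simplified TypeScript sketch of the completion idea (it tracks open delimiters and string state, then appends the closers; truncated literals and numbers, which the commits call out, are not handled here):

```ts
function completePartialJson(partial: string): string {
  const stack: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of partial) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  let out = partial;
  if (inString) out += '"'; // close an unclosed string
  out = out.replace(/,\s*$/, ""); // drop a trailing comma
  while (stack.length > 0) out += stack.pop(); // close containers inside-out
  return out;
}

// completePartialJson('{"name": "par')    -> '{"name": "par"}'
// completePartialJson('{"items": [1, 2,') -> '{"items": [1, 2]}'
```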

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 16:21:58 -06:00
Lex Christopherson
c80d640d35 feat: vendor Pi source into workspace monorepo
Vendor all 4 Pi packages (tui, ai, agent-core, coding-agent) from
pi-mono v0.57.1 as @gsd/* workspace packages under packages/. This
replaces the compiled npm dependency (@mariozechner/pi-coding-agent)
and patch-package workflow, giving direct source access for
modifications.

- Copy Pi source from pi-mono v0.57.1 into packages/
- Create workspace package.json + tsconfig.json for each package
- Rename ~240 imports from @mariozechner/pi-* to @gsd/pi-*
- Apply existing patches as source edits (setModel persist, VT input)
- Remove @mariozechner/pi-coding-agent dep and patch-package
- Update build pipeline to build packages in dependency order
- Add pi-upstream git remote for future selective syncing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:55:17 -06:00