diff --git a/README.md b/README.md
index 33e29d038..4dced8410 100644
--- a/README.md
+++ b/README.md
@@ -53,6 +53,7 @@ Full documentation is available in the [`docs/`](./docs/) directory:
 - **[Getting Started](./docs/getting-started.md)** — install, first run, basic usage
 - **[Auto Mode](./docs/auto-mode.md)** — autonomous execution deep-dive
 - **[Configuration](./docs/configuration.md)** — all preferences, models, git, and hooks
+- **[Custom Models](./docs/custom-models.md)** — add custom providers (Ollama, vLLM, LM Studio, proxies)
 - **[Token Optimization](./docs/token-optimization.md)** — profiles, context compression, complexity routing
 - **[Cost Management](./docs/cost-management.md)** — budgets, tracking, projections
 - **[Git Strategy](./docs/git-strategy.md)** — worktree isolation, branching, merge behavior
diff --git a/docs/README.md b/docs/README.md
index 080a5eaf7..290201e79 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -11,6 +11,7 @@ Welcome to the GSD documentation. This covers everything from getting started to
 | [Commands Reference](./commands.md) | All commands, keyboard shortcuts, and CLI flags |
 | [Remote Questions](./remote-questions.md) | Discord and Slack integration for headless auto-mode |
 | [Configuration](./configuration.md) | Preferences, model selection, git settings, and token profiles |
+| [Custom Models](./custom-models.md) | Add custom providers (Ollama, vLLM, LM Studio, proxies) via models.json |
 | [Token Optimization](./token-optimization.md) | Token profiles, context compression, complexity routing, and adaptive learning (v2.17) |
 | [Dynamic Model Routing](./dynamic-model-routing.md) | Complexity-based model selection, cost tables, escalation, and budget pressure (v2.19) |
 | [Captures & Triage](./captures-triage.md) | Fire-and-forget thought capture during auto-mode with automated triage (v2.19) |
diff --git a/docs/configuration.md b/docs/configuration.md
index d5c9a3a7a..d8e1111e6 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -187,13 +187,35 @@ models:
 ### Custom Model Definitions (`models.json`)
 
-Define custom models in `~/.gsd/agent/models.json`. This lets you add models not included in the default registry — useful for self-hosted endpoints, fine-tuned models, or new releases.
+Define custom models and providers in `~/.gsd/agent/models.json`. This lets you add models not included in the default registry — useful for self-hosted endpoints (Ollama, vLLM, LM Studio), fine-tuned models, proxies, or new provider releases.
 
 GSD resolves models.json with fallback logic:
 
 1. `~/.gsd/agent/models.json` — primary (GSD)
 2. `~/.pi/agent/models.json` — fallback (Pi)
 3. If neither exists, creates `~/.gsd/agent/models.json`
 
+**Quick example for local models (Ollama):**
+
+```json
+{
+  "providers": {
+    "ollama": {
+      "baseUrl": "http://localhost:11434/v1",
+      "api": "openai-completions",
+      "apiKey": "ollama",
+      "models": [
+        { "id": "llama3.1:8b" },
+        { "id": "qwen2.5-coder:7b" }
+      ]
+    }
+  }
+}
+```
+
+The file reloads each time you open `/model` — no restart needed.
+
+For full documentation including provider configuration, model overrides, OpenAI compatibility settings, and advanced examples, see the [Custom Models Guide](./custom-models.md).
+
 **With fallbacks:**
 
 ```yaml
diff --git a/docs/custom-models.md b/docs/custom-models.md
new file mode 100644
index 000000000..943d213bf
--- /dev/null
+++ b/docs/custom-models.md
@@ -0,0 +1,335 @@
+# Custom Models
+
+Add custom providers and models (Ollama, vLLM, LM Studio, proxies) via `~/.gsd/agent/models.json`.
+
+## Table of Contents
+
+- [Minimal Example](#minimal-example)
+- [Full Example](#full-example)
+- [Supported APIs](#supported-apis)
+- [Provider Configuration](#provider-configuration)
+- [Model Configuration](#model-configuration)
+- [Overriding Built-in Providers](#overriding-built-in-providers)
+- [Per-model Overrides](#per-model-overrides)
+- [OpenAI Compatibility](#openai-compatibility)
+
+## Minimal Example
+
+For local models (Ollama, LM Studio, vLLM), only `id` is required per model:
+
+```json
+{
+  "providers": {
+    "ollama": {
+      "baseUrl": "http://localhost:11434/v1",
+      "api": "openai-completions",
+      "apiKey": "ollama",
+      "models": [
+        { "id": "llama3.1:8b" },
+        { "id": "qwen2.5-coder:7b" }
+      ]
+    }
+  }
+}
+```
+
+The `apiKey` is required but Ollama ignores it, so any value works.
+
+Some OpenAI-compatible servers do not understand the `developer` role used for reasoning-capable models. For those providers, set `compat.supportsDeveloperRole` to `false` so GSD sends the system prompt as a `system` message instead. If the server also does not support `reasoning_effort`, set `compat.supportsReasoningEffort` to `false` too.
+
+You can set `compat` at the provider level to apply to all models, or at the model level to override a specific model. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.
+
+```json
+{
+  "providers": {
+    "ollama": {
+      "baseUrl": "http://localhost:11434/v1",
+      "api": "openai-completions",
+      "apiKey": "ollama",
+      "compat": {
+        "supportsDeveloperRole": false,
+        "supportsReasoningEffort": false
+      },
+      "models": [
+        {
+          "id": "gpt-oss:20b",
+          "reasoning": true
+        }
+      ]
+    }
+  }
+}
+```
+
+## Full Example
+
+Override defaults when you need specific values:
+
+```json
+{
+  "providers": {
+    "ollama": {
+      "baseUrl": "http://localhost:11434/v1",
+      "api": "openai-completions",
+      "apiKey": "ollama",
+      "models": [
+        {
+          "id": "llama3.1:8b",
+          "name": "Llama 3.1 8B (Local)",
+          "reasoning": false,
+          "input": ["text"],
+          "contextWindow": 128000,
+          "maxTokens": 32000,
+          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
+        }
+      ]
+    }
+  }
+}
+```
+
+The file reloads each time you open `/model`. You can edit it mid-session; no restart is needed.
+
+## Supported APIs
+
+| API | Description |
+|-----|-------------|
+| `openai-completions` | OpenAI Chat Completions (most compatible) |
+| `openai-responses` | OpenAI Responses API |
+| `anthropic-messages` | Anthropic Messages API |
+| `google-generative-ai` | Google Generative AI |
+
+Set `api` at the provider level (the default for all models) or at the model level (an override per model).
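+For instance, a provider that defaults to `openai-completions` can route a single model through a different API. This is a hypothetical sketch; the provider name, endpoint, and model IDs are illustrative:
+
+```json
+{
+  "providers": {
+    "mixed-proxy": {
+      "baseUrl": "https://proxy.example.com/v1",
+      "apiKey": "PROXY_API_KEY",
+      "api": "openai-completions",
+      "models": [
+        { "id": "some-openai-compatible-model" },
+        { "id": "some-anthropic-model", "api": "anthropic-messages" }
+      ]
+    }
+  }
+}
+```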
+
+## Provider Configuration
+
+| Field | Description |
+|-------|-------------|
+| `baseUrl` | API endpoint URL |
+| `api` | API type (see above) |
+| `apiKey` | API key (see value resolution below) |
+| `headers` | Custom headers (see value resolution below) |
+| `authHeader` | Set `true` to add an `Authorization: Bearer <apiKey>` header automatically |
+| `models` | Array of model configurations |
+| `modelOverrides` | Per-model overrides for built-in models on this provider |
+
+### Value Resolution
+
+The `apiKey` and `headers` fields support three formats:
+
+- **Shell command:** `"!command"` executes the command and uses its stdout
+  ```json
+  "apiKey": "!security find-generic-password -ws 'anthropic'"
+  "apiKey": "!op read 'op://vault/item/credential'"
+  ```
+- **Environment variable:** Uses the value of the named variable
+  ```json
+  "apiKey": "MY_API_KEY"
+  ```
+- **Literal value:** Used directly
+  ```json
+  "apiKey": "sk-..."
+  ```
+
+### Custom Headers
+
+```json
+{
+  "providers": {
+    "custom-proxy": {
+      "baseUrl": "https://proxy.example.com/v1",
+      "apiKey": "MY_API_KEY",
+      "api": "anthropic-messages",
+      "headers": {
+        "x-portkey-api-key": "PORTKEY_API_KEY",
+        "x-secret": "!op read 'op://vault/item/secret'"
+      },
+      "models": [...]
+    }
+  }
+}
+```
+
+## Model Configuration
+
+| Field | Required | Default | Description |
+|-------|----------|---------|-------------|
+| `id` | Yes | — | Model identifier (passed to the API) |
+| `name` | No | `id` | Human-readable model label. Used for matching (`--model` patterns) and shown in model details/status text. |
+| `api` | No | provider's `api` | Override the provider's API for this model |
+| `reasoning` | No | `false` | Supports extended thinking |
+| `input` | No | `["text"]` | Input types: `["text"]` or `["text", "image"]` |
+| `contextWindow` | No | `128000` | Context window size in tokens |
+| `maxTokens` | No | `16384` | Maximum output tokens |
+| `cost` | No | all zeros | `{"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0}` (per million tokens) |
+| `compat` | No | provider `compat` | OpenAI compatibility overrides. Merged with provider-level `compat` when both are set. |
+
+Current behavior:
+
+- `/model` and `--list-models` list entries by model `id`.
+- The configured `name` is used for model matching and detail/status text.
+
+## Overriding Built-in Providers
+
+Route a built-in provider through a proxy without redefining its models:
+
+```json
+{
+  "providers": {
+    "anthropic": {
+      "baseUrl": "https://my-proxy.example.com/v1"
+    }
+  }
+}
+```
+
+All built-in Anthropic models remain available. Existing OAuth or API key auth continues to work.
+
+To merge custom models into a built-in provider, include the `models` array:
+
+```json
+{
+  "providers": {
+    "anthropic": {
+      "baseUrl": "https://my-proxy.example.com/v1",
+      "apiKey": "ANTHROPIC_API_KEY",
+      "api": "anthropic-messages",
+      "models": [...]
+    }
+  }
+}
+```
+
+Merge semantics:
+
+- Built-in models are kept.
+- Custom models are upserted by `id` within the provider.
+- If a custom model `id` matches a built-in model `id`, the custom model replaces that built-in model.
+- If a custom model `id` is new, it is added alongside built-in models.
+
+## Per-model Overrides
+
+Use `modelOverrides` to customize specific built-in models without replacing the provider's full model list.
+
+```json
+{
+  "providers": {
+    "openrouter": {
+      "modelOverrides": {
+        "anthropic/claude-sonnet-4": {
+          "name": "Claude Sonnet 4 (Bedrock Route)",
+          "compat": {
+            "openRouterRouting": {
+              "only": ["amazon-bedrock"]
+            }
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+`modelOverrides` supports these fields per model: `name`, `reasoning`, `input`, `cost` (partial), `contextWindow`, `maxTokens`, `headers`, `compat`.
+
+Behavior notes:
+
+- `modelOverrides` are applied to built-in provider models.
+- Unknown model IDs are ignored.
+- You can combine provider-level `baseUrl`/`headers` with `modelOverrides`.
+- If `models` is also defined for a provider, custom models are merged after built-in overrides. A custom model with the same `id` replaces the overridden built-in model entry.
+
+## OpenAI Compatibility
+
+For providers with partial OpenAI compatibility, use the `compat` field.
+
+- Provider-level `compat` applies defaults to all models under that provider.
+- Model-level `compat` overrides provider-level values for that model.
+
+```json
+{
+  "providers": {
+    "local-llm": {
+      "baseUrl": "http://localhost:8080/v1",
+      "api": "openai-completions",
+      "compat": {
+        "supportsUsageInStreaming": false,
+        "maxTokensField": "max_tokens"
+      },
+      "models": [...]
+    }
+  }
+}
+```
+
+| Field | Description |
+|-------|-------------|
+| `supportsStore` | Provider supports the `store` field |
+| `supportsDeveloperRole` | Use the `developer` role instead of `system` |
+| `supportsReasoningEffort` | Support for the `reasoning_effort` parameter |
+| `reasoningEffortMap` | Map GSD thinking levels to provider-specific `reasoning_effort` values |
+| `supportsUsageInStreaming` | Supports `stream_options: { include_usage: true }` (default: `true`) |
+| `maxTokensField` | Use `max_completion_tokens` or `max_tokens` |
+| `requiresToolResultName` | Include `name` on tool result messages |
+| `requiresAssistantAfterToolResult` | Insert an assistant message before a user message after tool results |
+| `requiresThinkingAsText` | Convert thinking blocks to plain text |
+| `thinkingFormat` | Use `reasoning_effort`, `zai`, `qwen`, or `qwen-chat-template` thinking parameters |
+| `supportsStrictMode` | Include the `strict` field in tool definitions |
+| `openRouterRouting` | OpenRouter routing config passed to OpenRouter for model/provider selection |
+| `vercelGatewayRouting` | Vercel AI Gateway routing config for provider selection (`only`, `order`) |
+
+`qwen` uses top-level `enable_thinking`. Use `qwen-chat-template` for local Qwen-compatible servers that require `chat_template_kwargs.enable_thinking`.
+
+Example:
+
+```json
+{
+  "providers": {
+    "openrouter": {
+      "baseUrl": "https://openrouter.ai/api/v1",
+      "apiKey": "OPENROUTER_API_KEY",
+      "api": "openai-completions",
+      "models": [
+        {
+          "id": "openrouter/anthropic/claude-3.5-sonnet",
+          "name": "OpenRouter Claude 3.5 Sonnet",
+          "compat": {
+            "openRouterRouting": {
+              "order": ["anthropic"],
+              "fallbacks": ["openai"]
+            }
+          }
+        }
+      ]
+    }
+  }
+}
+```
+
+Vercel AI Gateway example:
+
+```json
+{
+  "providers": {
+    "vercel-ai-gateway": {
+      "baseUrl": "https://ai-gateway.vercel.sh/v1",
+      "apiKey": "AI_GATEWAY_API_KEY",
+      "api": "openai-completions",
+      "models": [
+        {
+          "id": "moonshotai/kimi-k2.5",
+          "name": "Kimi K2.5 (Fireworks via Vercel)",
+          "reasoning": true,
+          "input": ["text", "image"],
+          "cost": { "input": 0.6, "output": 3, "cacheRead": 0, "cacheWrite": 0 },
+          "contextWindow": 262144,
+          "maxTokens": 262144,
+          "compat": {
+            "vercelGatewayRouting": {
+              "only": ["fireworks", "novita"],
+              "order": ["fireworks", "novita"]
+            }
+          }
+        }
+      ]
+    }
+  }
+}
+```