
Custom Models

Define custom models and providers in ~/.sf/agent/models.json. This lets you add models not in the default registry — self-hosted endpoints, fine-tuned models, proxies, or new provider releases.

File Location

SF looks for models.json at:

  1. ~/.sf/agent/models.json (primary)
  2. ~/.pi/agent/models.json (fallback)

The file reloads each time you open /model — no restart needed.

Basic Structure

{
  "providers": {
    "my-provider": {
      "baseUrl": "https://my-endpoint.example.com/v1",
      "apiKey": "MY_PROVIDER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "model-id-here",
          "name": "Friendly Model Name",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 16384,
          "cost": { "input": 0.15, "output": 0.60, "cacheRead": 0.015, "cacheWrite": 0.19 }
        }
      ]
    }
  }
}

API Key Resolution

The apiKey field can be:

  • An environment variable name: "OPENROUTER_API_KEY" — SF resolves it automatically
  • A literal value: "sk-abc123..." — used directly
  • A dummy value: "not-needed" — for local servers that don't require auth
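
For example, a self-hosted server that skips authentication entirely can use a dummy key. The endpoint, model id, and limits below are illustrative placeholders, not real defaults:

```json
{
  "providers": {
    "local-llama": {
      "baseUrl": "http://localhost:8080/v1",
      "apiKey": "not-needed",
      "api": "openai-completions",
      "models": [
        {
          "id": "llama-3.1-8b-instruct",
          "name": "Llama 3.1 8B (local)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 8192
        }
      ]
    }
  }
}
```

Since the key is never sent anywhere meaningful, any non-empty string works here.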

Compatibility Flags

Local and non-standard servers often need compatibility adjustments:

{
  "compat": {
    "supportsDeveloperRole": false,
    "supportsReasoningEffort": false,
    "supportsUsageInStreaming": false,
    "thinkingFormat": "qwen"
  }
}
| Flag | Default | Purpose |
| --- | --- | --- |
| supportsDeveloperRole | true | Set false if the server doesn't support the developer message role |
| supportsReasoningEffort | true | Set false if the server doesn't support reasoning effort parameters |
| supportsUsageInStreaming | true | Set false if streaming responses don't include token usage |
| thinkingFormat | (unset) | Set "qwen" for Qwen thinking mode, "qwen-chat-template" for the chat template variant |
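
Putting the flags in context, a provider entry for a local server that lacks these features might look like the sketch below. The provider name and endpoint are placeholders, and the compat block is shown at the provider level as an assumption; the Model Overrides section shows it applied per-model as well:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://127.0.0.1:8080/v1",
      "apiKey": "not-needed",
      "api": "openai-completions",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsUsageInStreaming": false
      },
      "models": [...]
    }
  }
}
```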

Custom Headers

For proxies that need extra headers:

{
  "providers": {
    "litellm-proxy": {
      "baseUrl": "https://litellm.example.com/v1",
      "apiKey": "MY_API_KEY",
      "api": "openai-completions",
      "headers": {
        "x-custom-header": "value"
      },
      "models": [...]
    }
  }
}

Model Overrides

Override specific model settings without redefining the entire model:

{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "compat": {
            "openRouterRouting": {
              "only": ["amazon-bedrock"]
            }
          }
        }
      }
    }
  }
}
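
Since overrides merge onto the existing model definition, other model fields can presumably be adjusted the same way. A hedged sketch that caps output tokens and repricing for one registry model, assuming field placement mirrors the model definition shown earlier:

```json
{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "maxTokens": 8192,
          "cost": { "input": 3.00, "output": 15.00 }
        }
      }
    }
  }
}
```

The cost figures here are illustrative, not actual pricing.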

Cost Tracking

For accurate cost tracking with custom models, add the cost field (per million tokens):

"cost": {
  "input": 0.15,
  "output": 0.60,
  "cacheRead": 0.015,
  "cacheWrite": 0.19
}

Without this field, cost displays as $0.00 — the default for custom models with no pricing defined.

Community Extensions

For providers not built into SF, community extensions add full provider support:

| Extension | Provider | Install |
| --- | --- | --- |
| pi-dashscope | Alibaba DashScope (Qwen3, GLM-5, etc.) | sf install npm:pi-dashscope |