
Provider Setup Guide

Step-by-step setup instructions for every LLM provider SF supports. If you ran the onboarding wizard (sf config) and picked a provider, you may already be configured — check with /model inside a session.

Quick Reference

| Provider | Auth Method | Env Variable | Config File |
|----------|-------------|--------------|-------------|
| Anthropic | API key | ANTHROPIC_API_KEY | |
| OpenAI | API key | OPENAI_API_KEY | |
| Google Gemini | API key | GEMINI_API_KEY | |
| OpenRouter | API key | OPENROUTER_API_KEY | Optional models.json |
| Groq | API key | GROQ_API_KEY | |
| xAI | API key | XAI_API_KEY | |
| Mistral | API key | MISTRAL_API_KEY | |
| GitHub Copilot | OAuth | GH_TOKEN | |
| Amazon Bedrock | IAM credentials | AWS_PROFILE or AWS_ACCESS_KEY_ID | |
| Vertex AI | ADC | GOOGLE_APPLICATION_CREDENTIALS | |
| Azure OpenAI | API key | AZURE_OPENAI_API_KEY | |
| Ollama | None (local) | | models.json required |
| LM Studio | None (local) | | models.json required |
| vLLM / SGLang | None (local) | | models.json required |

Built-in Providers

Built-in providers have models pre-registered in SF. You only need to supply credentials.

Anthropic (Claude)

Recommended. Anthropic models have the deepest integration: built-in web search, extended thinking, and prompt caching.

Option A — API key (recommended):

export ANTHROPIC_API_KEY="sk-ant-..."

Or run sf config and paste your key when prompted.

Get a key: console.anthropic.com/settings/keys
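
To confirm the key works before starting a session, you can query Anthropic's model-list endpoint directly (a quick check, independent of SF):

curl -s https://api.anthropic.com/v1/models \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"

A JSON list of models means the key is valid.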

Option B — Claude Code CLI:

If you have a Claude Pro or Max subscription, you can authenticate through Anthropic's official Claude Code CLI. Install it, sign in with claude, then SF will detect and route through it automatically:

# Install Claude Code CLI (see https://docs.anthropic.com/en/docs/claude-code)
claude
# Sign in when prompted, then start SF
sf

SF detects your local Claude Code installation and uses it as the authenticated Anthropic surface. This is the TOS-compliant path for subscription users — SF never handles your subscription credentials directly.

Note: SF does not support browser-based OAuth sign-in for Anthropic. Use an API key or the Claude Code CLI instead.

Option C — Use your Claude Pro/Max plan with SF inside Claude Code:

If you already have a Claude Pro or Max subscription and want to use SF's planning, execution, and milestone orchestration directly from Claude Code — without switching to a separate terminal — you can connect SF as an MCP server. This gives Claude Code access to SF's full workflow toolset via the Model Context Protocol, so you get SF's structured project management powered by your existing Claude plan.

Automatic setup (recommended):

When SF detects a Claude Code model during startup, it automatically writes a .mcp.json file in your project root with the SF workflow MCP server configured. No manual steps needed — just start SF once with Claude Code as the provider and the config is created for you.

You can also trigger this manually from inside an SF session:

/sf mcp init

This writes (or updates) the sf-workflow entry in your project's .mcp.json. Claude Code discovers this file automatically on its next session start.

Manual setup:

If you prefer to configure it yourself, add SF to your project's .mcp.json:

{
  "mcpServers": {
    "sf": {
      "command": "npx",
      "args": ["sf-mcp-server"],
      "env": {
        "SF_CLI_PATH": "/path/to/sf"
      }
    }
  }
}

Or if sf-mcp-server is installed globally:

{
  "mcpServers": {
    "sf": {
      "command": "sf-mcp-server"
    }
  }
}

You can also add this to ~/.claude/settings.json under mcpServers to make SF available across all projects.
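For reference, a minimal global entry might look like the following (a sketch, assuming ~/.claude/settings.json accepts the same mcpServers shape as a project .mcp.json):

{
  "mcpServers": {
    "sf": {
      "command": "sf-mcp-server"
    }
  }
}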

What's exposed:

The MCP server provides SF's full workflow tool surface — milestone planning, task completion, slice management, roadmap reassessment, journal queries, and more. Session management tools (gsd_execute, gsd_status, gsd_result, gsd_cancel) let Claude Code start and monitor SF auto-mode sessions. See Commands → MCP Server Mode for the full tool list.

Verify the connection:

From inside an SF session, check that the MCP server is reachable:

/sf mcp status

OpenAI

export OPENAI_API_KEY="sk-..."

Or run sf config and choose "Paste an API key", then "OpenAI".

Get a key: platform.openai.com/api-keys
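
As a quick sanity check (independent of SF), list the models your key can access:

curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"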

Google Gemini

export GEMINI_API_KEY="..."

Get a key: aistudio.google.com/app/apikey
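
To verify the key outside of SF, Google's Generative Language API exposes a model-list endpoint that accepts the key as a query parameter:

curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY"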

OpenRouter

OpenRouter aggregates 200+ models from multiple providers behind a single API key.

Step 1 — Get your API key:

Go to openrouter.ai/keys and create a key.

Step 2 — Set the key:

export OPENROUTER_API_KEY="sk-or-..."

Or run sf config, choose "Paste an API key", then "OpenRouter".
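
Optional: confirm the key is live before switching models by listing OpenRouter's catalog directly:

curl -s https://openrouter.ai/api/v1/models \
  -H "Authorization: Bearer $OPENROUTER_API_KEY"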

Step 3 — Switch to an OpenRouter model:

Inside an SF session, type /model and select an OpenRouter model. Models are prefixed with openrouter/ (e.g., openrouter/anthropic/claude-sonnet-4).

Optional — Add custom OpenRouter models via models.json:

If you want models not in the built-in list, add them to ~/.sf/agent/models.json:

{
  "providers": {
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "OPENROUTER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "meta-llama/llama-3.3-70b",
          "name": "Llama 3.3 70B (OpenRouter)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 32768,
          "cost": { "input": 0.3, "output": 0.3, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}

Note: the apiKey field here is the name of the environment variable, not the literal key. SF resolves it automatically. You can also use a literal value or a shell command (see Value Resolution).

Optional — Route through specific providers:

Use modelOverrides to control which upstream provider OpenRouter uses:

{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "compat": {
            "openRouterRouting": {
              "only": ["amazon-bedrock"]
            }
          }
        }
      }
    }
  }
}

Groq

export GROQ_API_KEY="gsk_..."

Get a key: console.groq.com/keys

xAI (Grok)

export XAI_API_KEY="xai-..."

Get a key: console.x.ai

Mistral

export MISTRAL_API_KEY="..."

Get a key: console.mistral.ai/api-keys

GitHub Copilot

Uses OAuth — sign in through the browser:

sf config
# Choose "Sign in with your browser" → "GitHub Copilot"

Requires an active GitHub Copilot subscription.

Amazon Bedrock

Bedrock uses AWS IAM credentials, not API keys. Any of these work:

# Option 1: Named profile
export AWS_PROFILE="my-profile"

# Option 2: IAM keys
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Option 3: Bedrock API key (bearer token)
export AWS_BEARER_TOKEN_BEDROCK="..."

ECS task roles and IRSA (Kubernetes) are also detected automatically.
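
Whichever option you use, you can confirm the credentials resolve with the AWS CLI before launching SF:

# Prints your account and role ARN if credentials are valid
aws sts get-caller-identity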

Anthropic on Vertex AI

Uses Google Cloud Application Default Credentials:

gcloud auth application-default login
export ANTHROPIC_VERTEX_PROJECT_ID="my-project-id"

Or set GOOGLE_CLOUD_PROJECT and ensure ADC credentials exist at ~/.config/gcloud/application_default_credentials.json.
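
To confirm ADC is set up, ask gcloud to mint a token from the stored credentials:

gcloud auth application-default print-access-token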

Azure OpenAI

export AZURE_OPENAI_API_KEY="..."

Local Providers

Local providers run on your machine. They require a models.json configuration file because SF needs to know the endpoint URL and which models are available.

Config file location: ~/.sf/agent/models.json

The file reloads each time you open /model — no restart needed.

Ollama

Step 1 — Install and start Ollama:

# macOS
brew install ollama
ollama serve

# Or download from https://ollama.com

Step 2 — Pull a model:

ollama pull llama3.1:8b
ollama pull qwen2.5-coder:7b
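
Verify the pulls completed before wiring up SF:

# Lists locally available models and their sizes
ollama list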

Step 3 — Create ~/.sf/agent/models.json:

{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}

The apiKey is required by the config schema but Ollama ignores it — any value works.

Step 4 — Select the model:

Inside SF, type /model and pick your Ollama model.

Ollama tips:

  • Ollama does not support the developer role or reasoning_effort — always set compat.supportsDeveloperRole: false and compat.supportsReasoningEffort: false.
  • If you get empty responses, check that ollama serve is running and the model is pulled.
  • Context window and max tokens default to 128K / 16K if not specified. Override these if your model has different limits.

LM Studio

Step 1 — Install LM Studio:

Download from lmstudio.ai.

Step 2 — Start the local server:

In LM Studio, go to the "Local Server" tab, load a model, and click "Start Server". The default port is 1234.

Step 3 — Create ~/.sf/agent/models.json:

{
  "providers": {
    "lm-studio": {
      "baseUrl": "http://localhost:1234/v1",
      "api": "openai-completions",
      "apiKey": "lm-studio",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "your-model-name",
          "name": "My Local Model",
          "contextWindow": 32768,
          "maxTokens": 4096
        }
      ]
    }
  }
}

Replace your-model-name with the model identifier shown in LM Studio's server tab.

LM Studio tips:

  • The model ID in models.json must match what LM Studio reports in its server API. Check the server tab for the exact string, or query the endpoint directly (see the check below).
  • LM Studio defaults to port 1234. If you changed it, update baseUrl accordingly.
  • Increase contextWindow and maxTokens if your model supports larger contexts.
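
To see the exact model IDs the server exposes, query its OpenAI-compatible model list (assuming the default port):

curl -s http://localhost:1234/v1/models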

vLLM

{
  "providers": {
    "vllm": {
      "baseUrl": "http://localhost:8000/v1",
      "api": "openai-completions",
      "apiKey": "vllm",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false,
        "supportsUsageInStreaming": false
      },
      "models": [
        {
          "id": "meta-llama/Llama-3.1-8B-Instruct",
          "contextWindow": 128000,
          "maxTokens": 16384
        }
      ]
    }
  }
}

The model id must match the model name you passed to vllm serve.
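
For reference, a launch command matching the config above might look like this (a sketch, assuming a recent vLLM release with the vllm serve entrypoint):

# Model name becomes the id SF must use; port matches baseUrl above
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000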

SGLang

{
  "providers": {
    "sglang": {
      "baseUrl": "http://localhost:30000/v1",
      "api": "openai-completions",
      "apiKey": "sglang",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "meta-llama/Llama-3.1-8B-Instruct"
        }
      ]
    }
  }
}
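
As with vLLM, the id must match the model the server was launched with. A typical launch (a sketch, assuming a standard SGLang install) looks like:

python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000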

Custom OpenAI-Compatible Endpoints

Any server that implements the OpenAI Chat Completions API can work with SF. This covers proxies (LiteLLM, Portkey, Helicone), self-hosted inference, and new providers.

Quickest path — use the onboarding wizard:

sf config
# Choose "Paste an API key" → "Custom (OpenAI-compatible)"
# Enter: base URL, API key, model ID

This writes ~/.sf/agent/models.json for you automatically.

Manual setup:

{
  "providers": {
    "my-provider": {
      "baseUrl": "https://my-endpoint.example.com/v1",
      "apiKey": "MY_PROVIDER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "model-id-here",
          "name": "Friendly Model Name",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 16384,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}
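
Before pointing SF at the endpoint, it can help to confirm it actually speaks the Chat Completions API. A generic smoke test, assuming your key is exported as MY_PROVIDER_API_KEY and using the placeholder names above:

curl -s https://my-endpoint.example.com/v1/chat/completions \
  -H "Authorization: Bearer $MY_PROVIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "model-id-here", "messages": [{"role": "user", "content": "ping"}]}'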

Adding custom headers (for proxies):

{
  "providers": {
    "litellm-proxy": {
      "baseUrl": "https://litellm.example.com/v1",
      "apiKey": "MY_API_KEY",
      "api": "openai-completions",
      "headers": {
        "x-custom-header": "value"
      },
      "models": [...]
    }
  }
}

Qwen models with thinking mode:

For Qwen-compatible servers, use thinkingFormat to enable thinking mode:

{
  "compat": {
    "thinkingFormat": "qwen",
    "supportsDeveloperRole": false
  }
}

Use "qwen-chat-template" instead if the server requires chat_template_kwargs.enable_thinking.

For the full reference on compat fields, modelOverrides, value resolution, and advanced configuration, see Custom Models.


Common Pitfalls

"Authentication failed" with a valid key

Cause: The key is set in your shell but not visible to SF.

Fix: Make sure the environment variable is exported in the same terminal where you run sf. Or use sf config to save the key to ~/.sf/agent/auth.json so it persists across sessions.
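
A quick way to check visibility, using Anthropic's variable as the example:

# Prints the key if it is exported in this shell; prints nothing if not
printenv ANTHROPIC_API_KEY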

OpenRouter models not appearing in /model

Cause: No OPENROUTER_API_KEY set, so SF hides OpenRouter models.

Fix: Set the key and restart SF:

export OPENROUTER_API_KEY="sk-or-..."
sf

Ollama returns empty responses

Cause: Ollama server isn't running, or the model isn't pulled.

Fix:

# Verify the server is running
curl http://localhost:11434/v1/models

# Pull the model if missing
ollama pull llama3.1:8b

LM Studio model ID mismatch

Cause: The id in models.json doesn't match what LM Studio exposes via its API.

Fix: Check the LM Studio server tab for the exact model identifier. It often includes the filename or quantization level (e.g., lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF).

developer role error with local models

Cause: Most local inference servers don't support the OpenAI developer message role.

Fix: Add compat.supportsDeveloperRole: false to the provider config. This makes SF send system messages instead:

{
  "compat": {
    "supportsDeveloperRole": false,
    "supportsReasoningEffort": false
  }
}

stream_options error with local models

Cause: Some servers don't support stream_options: { include_usage: true }.

Fix: Add compat.supportsUsageInStreaming: false:

{
  "compat": {
    "supportsUsageInStreaming": false
  }
}

"apiKey is required" validation error

Cause: models.json schema requires apiKey when models are defined.

Fix: For local servers that don't need auth, set a dummy value:

"apiKey": "not-needed"

Cost shows $0.00 for custom models

Expected behavior. SF defaults cost to zero for custom models. Override with the cost field if you want accurate cost tracking:

"cost": { "input": 0.15, "output": 0.60, "cacheRead": 0.015, "cacheWrite": 0.19 }

Values are per million tokens.
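
For example, with input priced at 0.15, a 200,000-token prompt costs 200,000 ÷ 1,000,000 × $0.15 = $0.03.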


Verifying Your Setup

After configuring a provider:

  1. Launch SF:

    sf
    
  2. Check available models:

    /model
    

    Your provider's models should appear in the list.

  3. Switch to the model: Select it from the /model picker.

  4. Send a test message: Type anything to confirm the model responds.

If the model doesn't appear, check:

  • The environment variable is set in the current shell
  • models.json is valid JSON (use cat ~/.sf/agent/models.json | python3 -m json.tool)
  • The server is running (for local providers)

For additional help, see Troubleshooting or run /sf doctor inside a session.