# Provider Configuration

Phantom routes every LLM query (the main agent and every evolution judge) through the Claude Agent SDK as a subprocess. By setting environment variables that the bundled `cli.js` already honors, you can point that subprocess at any Anthropic Messages API-compatible endpoint without changing a line of code.

The `provider:` block in `phantom.yaml` is a small configuration surface that translates into those environment variables for you.

## Supported Providers

| Type | Base URL | API Key Env | Notes |
|------|----------|-------------|-------|
| `anthropic` (default) | `https://api.anthropic.com` | `ANTHROPIC_API_KEY` | Claude Opus, Sonnet, Haiku |
| `zai` | `https://api.z.ai/api/anthropic` | `ZAI_API_KEY` | GLM-5.1 and GLM-4.5-Air, roughly 15x cheaper than Opus |
| `openrouter` | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` | 100+ models through a single key |
| `vllm` | `http://localhost:8000` | none | Self-hosted OpenAI-compatible inference |
| `ollama` | `http://localhost:11434` | none | Local GGUF models, zero API cost |
| `litellm` | `http://localhost:4000` | `LITELLM_KEY` | Local proxy bridging OpenAI, Gemini, and others |
| `custom` | (you set it) | (you set it) | Any Anthropic Messages API-compatible endpoint |

## Quick Reference

### Anthropic (default)

No configuration needed. Existing deployments continue to work unchanged.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
# No provider block = defaults to anthropic
```

```bash
# .env
ANTHROPIC_API_KEY=sk-ant-...
```

### Z.AI / GLM-5.1

Z.AI provides an Anthropic Messages API-compatible endpoint at `https://api.z.ai/api/anthropic`. Phantom ships with a `zai` preset that points there automatically. Get a key at [docs.z.ai](https://docs.z.ai/guides/llm/glm-5).

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: zai
  api_key_env: ZAI_API_KEY
  model_mappings:
    opus: glm-5.1
    sonnet: glm-5.1
    haiku: glm-4.5-air
```

```bash
# .env
ZAI_API_KEY=<your-zai-key>
```

Both the main agent and every evolution judge route through Z.AI. The `claude-sonnet-4-6` model name is translated to `glm-5.1` on the wire by the `model_mappings` block.
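
In `.env` terms, this preset amounts to roughly the following for the SDK subprocess. This is a sketch based on the variable names described under "How It Works" (`ANTHROPIC_BASE_URL`, the `ANTHROPIC_DEFAULT_*_MODEL` aliases, and the beta kill switch); the exact map emitted by `buildProviderEnv()` may contain additional keys.

```bash
# Sketch of the subprocess environment the zai preset above should resolve to
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
ANTHROPIC_DEFAULT_OPUS_MODEL=glm-5.1
ANTHROPIC_DEFAULT_SONNET_MODEL=glm-5.1
ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.5-air
# disable_betas defaults to true for non-anthropic presets
CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
```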

### Ollama (local, free)

Run any GGUF model on your own GPU. No API key needed.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: ollama
  model_mappings:
    opus: qwen3-coder:32b
    sonnet: qwen3-coder:32b
    haiku: qwen3-coder:14b
```

Ollama must be running at `http://localhost:11434` (the preset default). The model must support function calling to work with Phantom's agent loop.

### vLLM (self-hosted)

For organizations running their own inference clusters.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: vllm
  base_url: http://your-vllm-server:8000
  model_mappings:
    sonnet: your-model-name
  timeout_ms: 300000  # local models can be slow on first call
```

Start vLLM with a `--tool-call-parser` that matches your model; without it, tool use will not work.
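
As an illustration only (the model name is a placeholder and `hermes` is just an example parser; pick the parser vLLM documents for your model family), a launch line might look like:

```bash
# Hypothetical launch; --enable-auto-tool-choice and --tool-call-parser
# are vLLM serve flags that enable OpenAI-style tool calling
vllm serve your-model-name \
  --port 8000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```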

### OpenRouter

Access 100+ models through a single OpenRouter key.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: openrouter
  api_key_env: OPENROUTER_API_KEY
  model_mappings:
    sonnet: anthropic/claude-sonnet-4.5
```

### LiteLLM (proxy)

Run a local LiteLLM proxy to bridge OpenAI, Gemini, and other formats.

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: litellm
  api_key_env: LITELLM_KEY
  # base_url defaults to http://localhost:4000
```
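
Phantom only talks to the proxy; mapping the Claude alias to an upstream model happens in LiteLLM's own config file. A minimal sketch, assuming LiteLLM's standard `model_list` syntax and a hypothetical Gemini upstream:

```yaml
# litellm_config.yaml (illustrative, not part of phantom.yaml)
model_list:
  - model_name: claude-sonnet-4-6      # name Phantom sends on the wire
    litellm_params:
      model: gemini/gemini-2.5-pro     # upstream model LiteLLM actually calls
      api_key: os.environ/GEMINI_API_KEY
```

Run the proxy (for example `litellm --config litellm_config.yaml --port 4000`) before starting Phantom.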

### Custom endpoint

For any Anthropic Messages API-compatible proxy (LM Studio, custom internal gateways, etc.).

```yaml
# phantom.yaml
model: claude-sonnet-4-6
provider:
  type: custom
  base_url: https://your-proxy.internal/anthropic
  api_key_env: YOUR_CUSTOM_KEY_ENV
```

## Configuration Fields

| Field | Type | Default | Purpose |
|-------|------|---------|---------|
| `type` | enum | `anthropic` | One of the supported provider types |
| `base_url` | URL | preset default | Override the endpoint URL |
| `api_key_env` | string | preset default | Name of the env var holding the credential |
| `model_mappings.opus` | string | none | Concrete model ID for the opus tier |
| `model_mappings.sonnet` | string | none | Concrete model ID for the sonnet tier |
| `model_mappings.haiku` | string | none | Concrete model ID for the haiku tier |
| `disable_betas` | boolean | preset default | Sets `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1`. Defaults to true for every non-anthropic preset. |
| `timeout_ms` | number | none | Sets `API_TIMEOUT_MS` for slow local inference |
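
Putting the fields together, a block exercising every option at once (all values illustrative, not a recommended setup):

```yaml
# phantom.yaml — every provider field in one block
provider:
  type: custom
  base_url: https://your-proxy.internal/anthropic
  api_key_env: YOUR_CUSTOM_KEY_ENV
  model_mappings:
    opus: your-large-model
    sonnet: your-mid-model
    haiku: your-small-model
  disable_betas: true
  timeout_ms: 300000
```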

## Environment Variable Overrides

For operators who prefer env variables over YAML edits:

| Variable | Effect |
|----------|--------|
| `PHANTOM_PROVIDER_TYPE` | Override `provider.type` (validated against the supported values) |
| `PHANTOM_PROVIDER_BASE_URL` | Override `provider.base_url` (validated as a URL) |
| `PHANTOM_MODEL` | Override `config.model` |

These are applied on top of the YAML-loaded config during startup.
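
For example, flipping a deployment to the Ollama preset from the environment alone (values taken from the presets table above):

```bash
# .env — overrides applied at startup, phantom.yaml untouched
PHANTOM_PROVIDER_TYPE=ollama
PHANTOM_PROVIDER_BASE_URL=http://localhost:11434
PHANTOM_MODEL=claude-sonnet-4-6
```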

## How It Works

The Claude Agent SDK runs as a subprocess. The SDK's bundled `cli.js` reads `ANTHROPIC_BASE_URL` and the `ANTHROPIC_DEFAULT_*_MODEL` aliases at call time. When `ANTHROPIC_BASE_URL` points at a non-Anthropic host, all Messages API requests go there instead.

The `provider:` block is translated into those environment variables by `buildProviderEnv()` in [`src/config/providers.ts`](../src/config/providers.ts). The resulting map is merged into both the main agent query and the evolution judge query, so changing providers flips both tiers in lockstep.

## Why keep a Claude model name in `model:`?

The bundled `cli.js` has hardcoded model-name arrays for capability detection (thinking tokens, effort levels, compaction, etc.). Passing a literal `glm-5.1` as the model can break those checks. The recommended pattern is:

1. Set `model: claude-sonnet-4-6` (or Opus) in `phantom.yaml` so `cli.js` treats the call as a known Claude model
2. Set `model_mappings.sonnet: glm-5.1` in the provider block so the wire call goes to GLM-5.1

This is the same pattern Z.AI's own documentation recommends.

## Troubleshooting

**Phantom responds but the logs show Claude-shaped costs.**
The bundled `cli.js` calculates `total_cost_usd` from its local Claude pricing table based on the model-name string. Cost reporting is not provider-aware, so the logged cost will look like Claude pricing even when the request went to Z.AI or another provider. The actual charge on your provider's bill will differ.

**Auto-mode judges fall back to heuristic mode.**
`resolveJudgeMode` in auto mode enables LLM judges when any of these are true: (a) a non-anthropic provider is configured, (b) `provider.base_url` is set, (c) `ANTHROPIC_API_KEY` is present, or (d) `~/.claude/.credentials.json` exists. If none hold, judges run in heuristic mode. Set `judges.enabled: always` in `config/evolution.yaml` to force LLM judges on.

**A third-party proxy rejects a beta header.**
Every non-anthropic preset already defaults to `disable_betas: true`, which sets `CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1`. If you still see beta-header errors, check that nothing in your config sets `disable_betas: false`, and set `disable_betas: true` explicitly on the provider block.

**Tool calls fail with small local models.**
Phantom's tool system assumes strong function-calling capability. Models like Qwen3-Coder and GLM-5.1 handle it well; smaller models often fail on complex multi-step tool chains. Test with a strong model first, then drop down.

**The subprocess fails with a missing-credential error.**
Phantom does not validate credentials at load time; the subprocess only sees the provider env vars when a query runs. If `api_key_env` names a variable that is not set in the process environment, the subprocess will fail at call time with the provider's own error message.