provider-openai Responses API breaks compatibility with local OpenAI-compatible servers (MLX, vLLM, etc.) #246

@Joi

Problem

provider-openai has migrated entirely to the OpenAI Responses API (/v1/responses). This breaks compatibility with all local servers that implement only the OpenAI Chat Completions API (/v1/chat/completions), including:

  • mlx_lm.server (MLX framework)
  • vLLM
  • llama.cpp server
  • LM Studio
  • LocalAI
  • Ollama's OpenAI compatibility mode

These servers are commonly used as base_url overrides in provider-openai to run local models.

Reproduction

# settings.yaml - local MLX provider
- config:
    api_key: local
    base_url: http://localhost:8080/v1
    default_model: shieldstackllc/Step-3.5-Flash-REAP-128B-A11B-mlx-mixed-4-6
    priority: 4
  instance_id: local
  module: provider-openai

When the routing matrix selects this provider, the session crashes with:

[PROVIDER] OpenAI API error: ReadError: (no message)
Error: Execution failed: LLMError: ReadError: (no message)

The ReadError occurs because provider-openai calls self.client.responses.stream() (line ~952) or self.client.responses.create() (line ~965). Both hit /v1/responses, which returns 404 on the local server.

A direct curl to /v1/chat/completions on the same server succeeds:

curl http://localhost:8080/v1/chat/completions \
  -d '{"model":"...","messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
# Returns 200 with valid response

Root Cause

In provider-openai/__init__.py, around lines 947-965:

if self.use_streaming:
    async with self.client.responses.stream(**params) as stream:  # /v1/responses
        response = await stream.get_final_response()
else:
    return await self.client.responses.create(**params)  # also /v1/responses

Both streaming and non-streaming paths use the Responses API. There is no Chat Completions fallback.

Proposed Fix

Add a config option such as use_responses_api: false (or auto-detect a non-default base_url) that makes the provider fall back to self.client.chat.completions.create() when targeting local or OpenAI-compatible servers.

This would restore the local model use case that previously worked when provider-openai used Chat Completions.
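
A minimal sketch of what that fallback could look like, not the provider's actual code. The helper names (should_use_responses_api, responses_to_chat_params, create_completion) are hypothetical, and the parameter translation covers only the simple text case: a string or role/content list in input, optional instructions, and max_output_tokens. Tool calls, streaming, and other Responses-specific fields are left out.

```python
from urllib.parse import urlparse

# Assumption: a missing base_url or the official host means "real OpenAI".
DEFAULT_OPENAI_HOST = "api.openai.com"


def should_use_responses_api(config: dict) -> bool:
    """An explicit use_responses_api setting wins; otherwise guess from base_url."""
    explicit = config.get("use_responses_api")
    if explicit is not None:
        return bool(explicit)
    base_url = config.get("base_url")
    if not base_url:
        return True
    return urlparse(base_url).hostname == DEFAULT_OPENAI_HOST


def responses_to_chat_params(params: dict) -> dict:
    """Translate simple Responses-style params into Chat Completions params."""
    # Pass through the few fields that share a name in both APIs.
    chat = {k: v for k, v in params.items() if k in ("model", "temperature", "top_p")}
    messages = []
    if params.get("instructions"):
        messages.append({"role": "system", "content": params["instructions"]})
    user_input = params.get("input", [])
    if isinstance(user_input, str):
        messages.append({"role": "user", "content": user_input})
    else:
        # Assumes plain {"role": ..., "content": ...} items only.
        messages.extend(user_input)
    chat["messages"] = messages
    if "max_output_tokens" in params:
        chat["max_tokens"] = params["max_output_tokens"]
    return chat


async def create_completion(client, config: dict, params: dict):
    """Non-streaming dispatch: Responses API for OpenAI, Chat Completions otherwise."""
    if should_use_responses_api(config):
        return await client.responses.create(**params)
    return await client.chat.completions.create(**responses_to_chat_params(params))
```

With the settings.yaml entry from the reproduction (base_url: http://localhost:8080/v1 and no explicit flag), should_use_responses_api returns False, so the request would go to /v1/chat/completions. The two APIs also return differently shaped objects, so a real fix would need to normalize the response as well.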

Secondary issue: CLI -p flag doesn't support instance_id

Related: amplifier run -p local fails because the CLI resolves -p against the module field (finding provider-openai), not instance_id. When two providers share the same module (cloud OpenAI + local MLX), there's no way to target the second instance from the CLI.
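
One possible resolution order, sketched under the assumption that each provider entry is a dict with the instance_id and module keys shown in settings.yaml. The function name resolve_provider and the "cloud" instance_id below are hypothetical; this is not the CLI's real lookup code.

```python
def resolve_provider(name: str, providers: list[dict]) -> dict:
    """Match -p against instance_id first, then fall back to module."""
    for provider in providers:
        if provider.get("instance_id") == name:
            return provider
    by_module = [p for p in providers if p.get("module") == name]
    if len(by_module) == 1:
        return by_module[0]
    if by_module:
        # Several instances share this module: refuse to guess.
        ids = ", ".join(str(p.get("instance_id")) for p in by_module)
        raise ValueError(f"'{name}' is ambiguous; pick an instance_id: {ids}")
    raise KeyError(f"no provider matches '{name}'")


providers = [
    {"instance_id": "cloud", "module": "provider-openai"},  # hypothetical entry
    {"instance_id": "local", "module": "provider-openai"},
]
selected = resolve_provider("local", providers)
```

Matching instance_id first keeps existing module-based invocations working when only one instance of a module is configured, and turns the two-instance case into an explicit error instead of silently picking the first match.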

Impact

Any setup that points provider-openai at a local MLX/vLLM/llama.cpp model via base_url is broken. The routing matrix's local fallback candidates never work, which gives a false sense of offline resilience.
