provider-openai Responses API breaks compatibility with local OpenAI-compatible servers (MLX, vLLM, etc.) #246

## Problem

`provider-openai` has migrated entirely to the OpenAI Responses API (`/v1/responses`). This breaks compatibility with local servers that implement only the OpenAI Chat Completions API (`/v1/chat/completions`). Depending on the server version, that includes:
- **mlx_lm.server** (MLX framework)
- **vLLM**
- **llama.cpp server**
- **LM Studio**
- **LocalAI**
- **Ollama's OpenAI compatibility mode**

These servers are commonly used as `base_url` overrides in `provider-openai` to run local models.

## Reproduction

```yaml
# settings.yaml - local MLX provider
- config:
    api_key: local
    base_url: http://localhost:8080/v1
    default_model: shieldstackllc/Step-3.5-Flash-REAP-128B-A11B-mlx-mixed-4-6
    priority: 4
  instance_id: local
  module: provider-openai
```

When the routing matrix selects this provider, the session crashes with:
```
[PROVIDER] OpenAI API error: ReadError: (no message)
Error: Execution failed: LLMError: ReadError: (no message)
```

The `ReadError` occurs because `provider-openai` calls `self.client.responses.stream()` (line ~952) or `self.client.responses.create()` (line ~965). Both hit `/v1/responses`, which returns 404 on the local server.

Direct curl to `/v1/chat/completions` on the same server works perfectly:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"...","messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
# Returns 200 with valid response
```
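To check which endpoints a given server actually exposes, the curl test can be repeated against both paths. The following is a minimal sketch using only the standard library; `post_status` and `probe_endpoints` are hypothetical helper names, not part of `provider-openai`, and the request bodies are the smallest payloads I would expect each API to accept.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict


def post_status(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST a JSON body and return the HTTP status code, including 4xx/5xx."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises on error statuses; the code is what we want here.
        return exc.code


def probe_endpoints(
    base_url: str,
    model: str,
    post: Callable[[str, dict], int] = post_status,
) -> Dict[str, int]:
    """Return the status code of each API path under base_url."""
    base = base_url.rstrip("/")
    chat_body = {
        "model": model,
        "messages": [{"role": "user", "content": "hello"}],
        "max_tokens": 10,
    }
    responses_body = {"model": model, "input": "hello", "max_output_tokens": 16}
    return {
        "/chat/completions": post(f"{base}/chat/completions", chat_body),
        "/responses": post(f"{base}/responses", responses_body),
    }
```

Calling `probe_endpoints("http://localhost:8080/v1", "<model name>")` against the server from the config above should show a 200 for `/chat/completions` and a 404 for `/responses` if the diagnosis in this issue is right.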

## Root Cause

In `provider-openai/__init__.py`, around lines 947-965:
```python
if self.use_streaming:
    async with self.client.responses.stream(**params) as stream:  # /v1/responses
        response = await stream.get_final_response()
else:
    return await self.client.responses.create(**params)  # also /v1/responses
```

Both streaming and non-streaming paths use the Responses API. There is no Chat Completions fallback.

## Proposed Fix

Add a config option like `use_responses_api: false` (or auto-detect based on `base_url` being non-default) that falls back to `self.client.chat.completions.create()` when targeting local/compatible servers.

This would restore the local-model use case that worked when `provider-openai` used Chat Completions.
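A rough sketch of what that could look like, covering only the non-streaming path. The helper names `should_use_responses_api` and `create_completion` are hypothetical, and the auto-detect rule (anything other than `api.openai.com` is treated as Chat Completions only) is an assumption that would misclassify proxies and servers that do implement `/v1/responses`, which is why the explicit option takes precedence.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

OPENAI_HOST = "api.openai.com"


def should_use_responses_api(
    base_url: Optional[str], use_responses_api: Optional[bool] = None
) -> bool:
    """An explicit config value wins; otherwise guess from base_url."""
    if use_responses_api is not None:
        return use_responses_api
    if not base_url:
        return True  # no override, so the SDK talks to OpenAI itself
    return urlparse(base_url).hostname == OPENAI_HOST


async def create_completion(
    client: Any,
    *,
    model: str,
    messages: List[Dict[str, Any]],
    use_responses: bool,
) -> Any:
    """Send one non-streaming request to whichever API was selected."""
    if use_responses:
        # Responses API: POST /v1/responses
        return await client.responses.create(model=model, input=messages)
    # Chat Completions API: POST /v1/chat/completions
    return await client.chat.completions.create(model=model, messages=messages)
```

The two APIs also differ in parameter names (for example `max_output_tokens` versus `max_tokens`) and in response shape, so a real fix would need to translate `params` and normalize the result rather than only switch the call.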

## Secondary issue: CLI -p flag doesn't support instance_id

Related: `amplifier run -p local` fails because the CLI resolves `-p` against the `module` field (finding `provider-openai`), not `instance_id`. When two providers share the same module (cloud OpenAI + local MLX), there's no way to target the second instance from the CLI.
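One possible resolution order for `-p`, sketched below: try `instance_id` first, then fall back to `module` only when that match is unambiguous. `resolve_provider` is a hypothetical function, not the CLI's actual code, and it assumes provider entries are dicts shaped like the `settings.yaml` entry above.

```python
from typing import Any, Dict, List


def resolve_provider(name: str, providers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Resolve a -p argument to exactly one configured provider entry."""
    by_instance = [p for p in providers if p.get("instance_id") == name]
    if len(by_instance) == 1:
        return by_instance[0]
    if len(by_instance) > 1:
        raise ValueError(f"duplicate instance_id {name!r} in settings")

    # No instance_id matched: fall back to the module name.
    by_module = [p for p in providers if p.get("module") == name]
    if len(by_module) == 1:
        return by_module[0]
    if not by_module:
        raise ValueError(f"no provider matches {name!r}")

    ids = ", ".join(str(p.get("instance_id")) for p in by_module)
    raise ValueError(
        f"{name!r} matches several providers; pass an instance_id instead ({ids})"
    )
```

With this ordering, `-p local` would select the local MLX instance, while `-p provider-openai` would report an ambiguity when both a cloud and a local instance share the module.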

## Impact

Any configuration that points `provider-openai` at a local MLX/vLLM/llama.cpp model via `base_url` is broken. Because the routing matrix's local fallback candidates can never succeed, the setup gives a false sense of offline resilience.
