[Bug] SDK effective_max_output_tokens exceeds actual model limits for non-Bedrock providers (moonshot, deepseek via custom base_url) #3317

## Problem

When models such as `moonshot/kimi-k2.5` or `deepseek/deepseek-v3.2` are used through a custom `base_url` (e.g., the Baidu Qianfan gateway), the SDK's auto-inferred `effective_max_output_tokens` exceeds the model's actual API limit, so every request fails with a `BadRequestError`:

```
MoonshotException - parameter check failed, max_tokens range is [1, 98304]
DeepseekException - parameter check failed, max_completion_tokens range is [1, 65536]
```
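The accepted upper bound is stated in the error text itself. As a minimal sketch, assuming the message format shown above stays stable, the limit could be read with a regular expression. `parse_token_limit` is a hypothetical helper, not part of the SDK or litellm.

```python
import re
from typing import Optional

# Matches the "range is [low, high]" fragment seen in the errors above.
_RANGE_RE = re.compile(r"range is \[(\d+),\s*(\d+)\]")


def parse_token_limit(message: str) -> Optional[int]:
    """Return the upper bound of the reported range, or None if absent."""
    match = _RANGE_RE.search(message)
    return int(match.group(2)) if match else None


print(parse_token_limit(
    "MoonshotException - parameter check failed, max_tokens range is [1, 98304]"
))
```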

## Root Cause

In `openhands/sdk/llm/llm.py` (lines 1249-1330), when the user does not set `max_output_tokens`, the SDK falls back to `self._model_info` from litellm's model database. For these models, the litellm metadata is inaccurate:

| Model | SDK inferred `effective_max_output_tokens` | Actual API limit | Result |
|---|---|---|---|
| `moonshot/kimi-k2.5` | 131072 | 98304 | **BadRequestError** |
| `deepseek/deepseek-v3.2` | 81920 | 65536 | **BadRequestError** |

This is the same class of bug as #2247 (Bedrock models with incorrect litellm metadata), but the guard added for it in #2264 only covers `bedrock/`-prefixed models. Non-Bedrock models routed through custom `base_url` gateways are still affected.

Additionally, the SDK's default `extended_thinking_budget=200000` is passed as `max_tokens` for models that do not support extended thinking. That value also exceeds the provider limit (e.g., 98304 for kimi-k2.5).
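To make the size of the mismatch concrete, the sketch below recomputes the overshoot from the numbers in the table. The dictionaries and the `overshoot` function are illustrative names only. The values are copied from this report, not queried from litellm.

```python
# Values copied from the table in this issue; names are illustrative only.
INFERRED = {
    "moonshot/kimi-k2.5": 131072,
    "deepseek/deepseek-v3.2": 81920,
}
ACTUAL = {
    "moonshot/kimi-k2.5": 98304,
    "deepseek/deepseek-v3.2": 65536,
}


def overshoot(model: str) -> int:
    """Tokens by which the inferred value exceeds the provider limit."""
    return INFERRED[model] - ACTUAL[model]


for model in INFERRED:
    print(f"{model}: inferred exceeds limit by {overshoot(model)} tokens")
```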

## Steps to Reproduce

```python
from openhands.sdk.llm.llm import LLM

llm = LLM(
    model="moonshot/kimi-k2.5",
    base_url="https://qianfan.baidubce.com/v2/coding",
    api_key="sk-...",
    reasoning_effort="medium",
)
print(llm.effective_max_output_tokens)
# Output: 131072, but the model's actual max_tokens limit is 98304
```

## Workaround

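Explicitly set `max_output_tokens` to a value within the provider's limit and set `extended_thinking_budget=None` when constructing `LLM` (the example below also sets `enable_encrypted_reasoning=False`):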

```python
llm = LLM(
    model="moonshot/kimi-k2.5",
    base_url="https://qianfan.baidubce.com/v2/coding",
    api_key="sk-...",
    reasoning_effort="medium",
    max_output_tokens=16384,
    extended_thinking_budget=None,
    enable_encrypted_reasoning=False,
)
```

## Suggested Fix

The guard added in #2264 for Bedrock should be generalized to all providers: when litellm's reported `max_output_tokens` cannot be verified as accurate for the actual endpoint, the SDK should either:

1. Apply the existing `DEFAULT_MAX_OUTPUT_TOKENS_CAP` (16384) as a universal safety cap, or
2. Omit `max_completion_tokens` / `max_tokens` entirely and let the provider use its default (as suggested in #2247).
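
The two options could be sketched roughly as follows. This is not the SDK's actual code: `resolve_max_output_tokens`, `metadata_verified`, and `omit_when_unverified` are hypothetical names, and how the SDK would decide that metadata is "verified" is left open. Only the cap value 16384 comes from this report.

```python
from typing import Optional

DEFAULT_MAX_OUTPUT_TOKENS_CAP = 16384  # value quoted in this issue


def resolve_max_output_tokens(
    user_value: Optional[int],
    litellm_value: Optional[int],
    metadata_verified: bool,
    omit_when_unverified: bool = False,
) -> Optional[int]:
    """Pick the value to send; None means "omit the parameter"."""
    if user_value is not None:
        return user_value  # an explicit user setting always wins
    if litellm_value is None:
        return None  # nothing to infer, let the provider decide
    if metadata_verified:
        return litellm_value  # trusted metadata is used unchanged
    if omit_when_unverified:
        return None  # option 2: let the provider use its default
    # Option 1: clamp unverified metadata to the safety cap.
    return min(litellm_value, DEFAULT_MAX_OUTPUT_TOKENS_CAP)


print(resolve_max_output_tokens(None, 131072, metadata_verified=False))
```

With option 1, both models in the table would be sent 16384, which is within their reported limits of 98304 and 65536.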

## Environment

- openhands-sdk version: 1.22.1
- Python: 3.12
- litellm: bundled with openhands-sdk 1.22.1
