Problem
When using models like moonshot/kimi-k2.5 or deepseek/deepseek-v3.2 through a custom base_url (e.g., Baidu Qianfan gateway), the SDK's auto-inferred effective_max_output_tokens exceeds the model's actual API limit, causing BadRequestError on every request.
MoonshotException - parameter check failed, max_tokens range is [1, 98304]
DeepseekException - parameter check failed, max_completion_tokens range is [1, 65536]
Root Cause
In openhands/sdk/llm/llm.py (lines 1249-1330), when the user does not set max_output_tokens, the SDK falls back to self._model_info from litellm's model database. For these models, the litellm metadata is inaccurate:
| Model | SDK-inferred effective_max_output_tokens | Actual API limit | Result |
|---|---|---|---|
| moonshot/kimi-k2.5 | 131072 | 98304 | BadRequestError |
| deepseek/deepseek-v3.2 | 81920 | 65536 | BadRequestError |
This is the same class of bug as #2247 (Bedrock models with incorrect litellm metadata), but that fix only guarded bedrock/-prefixed models. Non-Bedrock models routed through custom base_url gateways are still affected.
Additionally, the SDK's default extended_thinking_budget=200000 is passed as max_tokens for models that don't support extended thinking, which also exceeds limits (e.g., kimi-k2.5 max 98304).
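To make the mismatch concrete, here is a small standalone sketch that computes how far each requested value overshoots the provider limit. The numbers are copied from this report; the dictionary and function names are illustrative and are not part of the SDK.

```python
# Values copied from this report; all names here are illustrative only.
INFERRED = {"moonshot/kimi-k2.5": 131072, "deepseek/deepseek-v3.2": 81920}
ACTUAL_LIMIT = {"moonshot/kimi-k2.5": 98304, "deepseek/deepseek-v3.2": 65536}
DEFAULT_EXTENDED_THINKING_BUDGET = 200000  # SDK default quoted above


def overshoot(requested: int, limit: int) -> int:
    """Return how many tokens a request exceeds the limit by (0 if within range)."""
    return max(0, requested - limit)


for model, limit in ACTUAL_LIMIT.items():
    by_inferred = overshoot(INFERRED[model], limit)
    by_budget = overshoot(DEFAULT_EXTENDED_THINKING_BUDGET, limit)
    print(f"{model}: inferred value over by {by_inferred}, thinking budget over by {by_budget}")
```

Either overshoot alone is enough for the provider to reject the request, so both values have to be brought within range.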
Steps to Reproduce
from openhands.sdk.llm.llm import LLM

llm = LLM(
    model="moonshot/kimi-k2.5",
    base_url="https://qianfan.baidubce.com/v2/coding",
    api_key="sk-...",
    reasoning_effort="medium",
)
print(llm.effective_max_output_tokens)
# Output: 131072, but the model's actual max_tokens limit is 98304
Workaround
Explicitly set max_output_tokens and extended_thinking_budget when constructing LLM:
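A minimal sketch of that workaround, assuming the limits quoted in the provider error messages above are the right values to use. The helper `explicit_limit_kwargs` and the `PROVIDER_LIMITS` table are hypothetical names for illustration. Setting `extended_thinking_budget` equal to the limit is also an assumption; a smaller budget may be more appropriate.

```python
# Limits taken from the provider error messages quoted in this issue.
PROVIDER_LIMITS = {
    "moonshot/kimi-k2.5": 98304,
    "deepseek/deepseek-v3.2": 65536,
}


def explicit_limit_kwargs(model: str) -> dict[str, int]:
    """Build keyword arguments that keep both settings within the provider limit.

    Assumption: capping extended_thinking_budget at the same limit is sufficient.
    """
    limit = PROVIDER_LIMITS[model]
    return {"max_output_tokens": limit, "extended_thinking_budget": limit}


kwargs = explicit_limit_kwargs("moonshot/kimi-k2.5")
print(kwargs)

# With the SDK installed, the values would be passed through like this:
# from openhands.sdk.llm.llm import LLM
# llm = LLM(
#     model="moonshot/kimi-k2.5",
#     base_url="https://qianfan.baidubce.com/v2/coding",
#     api_key="sk-...",
#     reasoning_effort="medium",
#     **kwargs,
# )
```

The same two arguments can of course be written directly in the `LLM(...)` call; the helper only keeps the per-model limits in one place.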
Suggested Fix
The guard added in #2264 for Bedrock should be generalized to all providers: when litellm's reported max_output_tokens cannot be verified as accurate for the actual endpoint, the SDK should either:
Apply the existing DEFAULT_MAX_OUTPUT_TOKENS_CAP (16384) as a universal safety cap, or
Omit max_completion_tokens/max_tokens entirely and let the provider use its default (as suggested in #2247).
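One possible shape for a generalized guard is sketched below. It covers both capping at DEFAULT_MAX_OUTPUT_TOKENS_CAP and omitting the parameter entirely (the alternative proposed in #2247). This is not the SDK's actual code: the function name, its parameters, and the `metadata_trusted` flag are hypothetical, and how the SDK would decide whether metadata is trustworthy is left open.

```python
from typing import Optional

DEFAULT_MAX_OUTPUT_TOKENS_CAP = 16384  # value quoted in this issue


def resolve_max_output_tokens(
    user_value: Optional[int],
    litellm_value: Optional[int],
    metadata_trusted: bool,
    omit_when_untrusted: bool = False,
) -> Optional[int]:
    """Return the value to send to the provider, or None to omit the parameter.

    Hypothetical helper: an explicit user setting always wins; unverified
    litellm metadata is either capped or dropped.
    """
    if user_value is not None:
        return user_value
    if litellm_value is None:
        return None
    if metadata_trusted:
        return litellm_value
    if omit_when_untrusted:
        # Let the provider apply its own default.
        return None
    # Universal safety cap.
    return min(litellm_value, DEFAULT_MAX_OUTPUT_TOKENS_CAP)


# kimi-k2.5 through a custom gateway: litellm reports 131072, unverified.
print(resolve_max_output_tokens(None, 131072, metadata_trusted=False))
```

Capping keeps requests valid at the cost of shorter completions by default; omitting the parameter defers entirely to the provider. In both cases an explicit `max_output_tokens` from the user is left untouched.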
Environment