Allow setting custom inference timeout #656

Closed
pentschev wants to merge 3 commits into NVIDIA:main from pentschev:inference-timeout

@pentschev

Summary

Makes the inference routing timeout configurable via openshell inference set --timeout <secs> and openshell inference update --timeout <secs>, replacing the hardcoded 60-second default. Timeout changes propagate dynamically to running sandboxes within the route refresh interval (~5 seconds) without requiring sandbox recreation.
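A rough sketch of how the hot-reload could work, assuming the sandbox compares a revision value on each refresh tick and rebuilds its route cache when that value changes. The `RouteConfig` struct, its fields, and both helper functions are hypothetical names for illustration, not the PR's actual code; the only detail taken from the description is that the timeout participates in the revision hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the per-route settings carried in a bundle.
#[derive(Hash)]
struct RouteConfig {
    provider: String,
    model: String,
    /// Assumed to be an unsigned number of seconds.
    timeout_secs: u64,
}

/// Hash every field, including the timeout, into a single revision value.
fn revision(cfg: &RouteConfig) -> u64 {
    let mut hasher = DefaultHasher::new();
    cfg.hash(&mut hasher);
    hasher.finish()
}

/// A sandbox would call this on each refresh tick and rebuild its
/// cached routes only when the revision differs.
fn needs_refresh(cached_revision: u64, latest: &RouteConfig) -> bool {
    cached_revision != revision(latest)
}
```

Because `timeout_secs` is part of the hashed data, changing only the timeout produces a different revision, so a running sandbox would pick up the new value on its next refresh instead of keeping the stale one.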

The timeout was hit while running OpenCode on a complex build task on a DGX Spark serving nemotron-3-super:120b via Ollama. This feature allows longer-running tasks to succeed.

Related Issue

Closes #641

Changes

  • Add timeout_secs field to ClusterInferenceConfig, SetClusterInferenceRequest, SetClusterInferenceResponse, GetClusterInferenceResponse, and ResolvedRoute proto messages
  • Add timeout field (Duration) to the router's ResolvedRoute struct with a DEFAULT_ROUTE_TIMEOUT of 60 seconds
  • Remove the global reqwest::Client timeout; apply per-request .timeout(route.timeout) in backend.rs
  • Thread timeout_secs through server persistence (upsert_cluster_inference_route, build_cluster_inference_config, bundle resolution)
  • Map proto timeout_secs to router ResolvedRoute.timeout in the sandbox's bundle_to_resolved_routes()
  • Include timeout_secs in the bundle revision hash so timeout changes trigger route cache refreshes in running sandboxes
  • Add --timeout CLI flag to inference set (defaults to 0, which means the 60-second default applies) and inference update (optional)
  • Update docs/inference/configure.md with timeout usage and hot-reload behavior
  • Update architecture/inference-routing.md with per-request timeout semantics, proto field additions, and CLI surface
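The mapping from the proto field to the router's timeout can be sketched as follows. This is a minimal illustration, assuming `timeout_secs` is an unsigned integer and that 0 means "unset"; `route_timeout` is a hypothetical helper name, while `DEFAULT_ROUTE_TIMEOUT` and the 60-second value come from the change list above.

```rust
use std::time::Duration;

/// Fallback used when no timeout is configured (60 seconds per the description).
const DEFAULT_ROUTE_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical helper: convert the proto `timeout_secs` value into the
/// `Duration` stored on a resolved route. Assumes 0 means "not set".
fn route_timeout(timeout_secs: u64) -> Duration {
    if timeout_secs == 0 {
        DEFAULT_ROUTE_TIMEOUT
    } else {
        Duration::from_secs(timeout_secs)
    }
}
```

The resulting `Duration` is what would be handed to reqwest's per-request `RequestBuilder::timeout`, so each route carries its own limit rather than sharing one client-wide value.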

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

pentschev requested a review from a team as a code owner on March 29, 2026, 10:35
@github-actions

Thank you for your interest in contributing to OpenShell, @pentschev.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions

Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text:


I have read the DCO document and I hereby sign the DCO.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot.

github-actions bot closed this on Mar 29, 2026
@pentschev
Author

I have read the DCO document and I hereby sign the DCO.

Development

Successfully merging this pull request may close these issues.

bug(proxy): 60s reqwest total timeout kills streaming inference responses mid-generation

1 participant