23 changes: 14 additions & 9 deletions architecture/inference-routing.md
@@ -92,10 +92,10 @@ File: `proto/inference.proto`

Key messages:

- - `SetClusterInferenceRequest` -- `provider_name` + `model_id` + optional `no_verify` override, with verification enabled by default
- - `SetClusterInferenceResponse` -- `provider_name` + `model_id` + `version`
+ - `SetClusterInferenceRequest` -- `provider_name` + `model_id` + `timeout_secs` + optional `no_verify` override, with verification enabled by default
+ - `SetClusterInferenceResponse` -- `provider_name` + `model_id` + `timeout_secs` + `version`
- `GetInferenceBundleResponse` -- `repeated ResolvedRoute routes` + `revision` + `generated_at_ms`
- - `ResolvedRoute` -- `name`, `base_url`, `protocols`, `api_key`, `model_id`, `provider_type`
+ - `ResolvedRoute` -- `name`, `base_url`, `protocols`, `api_key`, `model_id`, `provider_type`, `timeout_secs`

## Data Plane (Sandbox)

@@ -106,7 +106,7 @@ Files:
- `crates/openshell-sandbox/src/lib.rs` -- inference context initialization, route refresh
- `crates/openshell-sandbox/src/grpc_client.rs` -- `fetch_inference_bundle()`

- In cluster mode, the sandbox starts a background refresh loop as soon as the inference context is created. The loop polls the gateway every 5 seconds by default (`OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS` override) and uses the bundle revision hash to skip no-op cache writes.
+ In cluster mode, the sandbox starts a background refresh loop as soon as the inference context is created. The loop polls the gateway every 5 seconds by default (`OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS` override) and uses the bundle revision hash to skip no-op cache writes. The revision hash covers all route fields including `timeout_secs`, so any configuration change (provider, model, or timeout) triggers a cache update on the next poll.

### Interception flow

@@ -143,7 +143,7 @@ If no pattern matches, the proxy returns `403 Forbidden` with `{"error": "connec
### Route cache

- `InferenceContext` holds a `Router`, the pattern list, and an `Arc<RwLock<Vec<ResolvedRoute>>>` route cache.
- - In cluster mode, `spawn_route_refresh()` polls `GetInferenceBundle` every 30 seconds (`ROUTE_REFRESH_INTERVAL_SECS`). On failure, stale routes are kept.
+ - In cluster mode, `spawn_route_refresh()` polls `GetInferenceBundle` every 5 seconds (`OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS`). On failure, stale routes are kept.
- In file mode (`--inference-routes`), routes load once at startup from YAML. No refresh task is spawned.
- In cluster mode, an empty initial bundle still enables the inference context so the refresh task can pick up later configuration.

@@ -209,9 +209,11 @@ File: `crates/openshell-router/src/mock.rs`

Routes with `mock://` scheme endpoints return canned responses without making HTTP requests. Mock responses are protocol-aware (OpenAI chat completion, OpenAI completion, Anthropic messages, or generic JSON). Mock routes include an `x-openshell-mock: true` response header.

- ### HTTP client
+ ### Per-request timeout

- The router uses a `reqwest::Client` with a 60-second timeout. Timeouts and connection failures map to `RouterError::UpstreamUnavailable`.
+ Each `ResolvedRoute` carries a `timeout` field (`Duration`). The `reqwest::Client` has no global timeout; instead, each outgoing request applies `.timeout(route.timeout)` on the request builder. When `timeout_secs` is `0` in the proto message, the default of 60 seconds is used (defined as `DEFAULT_ROUTE_TIMEOUT` in `config.rs`). Timeouts and connection failures map to `RouterError::UpstreamUnavailable`.

Timeout changes propagate dynamically to running sandboxes. The bundle revision hash includes `timeout_secs`, so when the timeout is updated via `openshell inference update --timeout`, the refresh loop detects the revision change and updates the route cache within one polling interval (5 seconds by default).

## Standalone Route File

@@ -297,13 +299,16 @@ The system route is stored as a separate `InferenceRoute` record in the gateway

Cluster inference commands:

- - `openshell inference set --provider <name> --model <id>` -- configures user-facing cluster inference
- - `openshell inference set --system --provider <name> --model <id>` -- configures system inference
+ - `openshell inference set --provider <name> --model <id> [--timeout <secs>]` -- configures user-facing cluster inference
+ - `openshell inference set --system --provider <name> --model <id> [--timeout <secs>]` -- configures system inference
+ - `openshell inference update [--provider <name>] [--model <id>] [--timeout <secs>]` -- updates individual fields without resetting others
- `openshell inference get` -- displays both user and system inference configuration
- `openshell inference get --system` -- displays only the system inference configuration

The `--provider` flag references a provider record name (not a provider type). The provider must already exist in the cluster and have a supported inference type (`openai`, `anthropic`, or `nvidia`).

The `--timeout` flag sets the per-request timeout in seconds for upstream inference calls. When omitted or set to `0`, the default of 60 seconds applies. Timeout changes propagate to running sandboxes within the route refresh interval (5 seconds by default).

Inference writes verify by default. `--no-verify` is the explicit opt-out for endpoints that are not up yet.

## Provider Discovery
14 changes: 13 additions & 1 deletion crates/openshell-cli/src/main.rs
@@ -937,6 +937,10 @@ enum InferenceCommands {
/// Skip endpoint verification before saving the route.
#[arg(long)]
no_verify: bool,

/// Request timeout in seconds for inference calls (0 = default 60s).
#[arg(long, default_value_t = 0)]
timeout: u64,
},

/// Update gateway-level inference configuration (partial update).
@@ -957,6 +961,10 @@ enum InferenceCommands {
/// Skip endpoint verification before saving the route.
#[arg(long)]
no_verify: bool,

/// Request timeout in seconds for inference calls (0 = default 60s, unchanged if omitted).
#[arg(long)]
timeout: Option<u64>,
},

/// Get gateway-level inference provider and model.
@@ -2026,10 +2034,12 @@ async fn main() -> Result<()> {
model,
system,
no_verify,
timeout,
} => {
let route_name = if system { "sandbox-system" } else { "" };
run::gateway_inference_set(
- endpoint, &provider, &model, route_name, no_verify, &tls,
+ endpoint, &provider, &model, route_name, no_verify, timeout,
+ &tls,
)
.await?;
}
@@ -2038,6 +2048,7 @@
model,
system,
no_verify,
timeout,
} => {
let route_name = if system { "sandbox-system" } else { "" };
run::gateway_inference_update(
@@ -2046,6 +2057,7 @@
model.as_deref(),
route_name,
no_verify,
timeout,
&tls,
)
.await?;
21 changes: 19 additions & 2 deletions crates/openshell-cli/src/run.rs
@@ -3481,6 +3481,7 @@ pub async fn gateway_inference_set(
model_id: &str,
route_name: &str,
no_verify: bool,
timeout_secs: u64,
tls: &TlsOptions,
) -> Result<()> {
let progress = if std::io::stdout().is_terminal() {
@@ -3504,6 +3505,7 @@
route_name: route_name.to_string(),
verify: false,
no_verify,
timeout_secs,
})
.await;

@@ -3525,6 +3527,7 @@
println!(" {} {}", "Provider:".dimmed(), configured.provider_name);
println!(" {} {}", "Model:".dimmed(), configured.model_id);
println!(" {} {}", "Version:".dimmed(), configured.version);
print_timeout(configured.timeout_secs);
if configured.validation_performed {
println!(" {}", "Validated Endpoints:".dimmed());
for endpoint in configured.validated_endpoints {
@@ -3540,11 +3543,12 @@ pub async fn gateway_inference_update(
model_id: Option<&str>,
route_name: &str,
no_verify: bool,
timeout_secs: Option<u64>,
tls: &TlsOptions,
) -> Result<()> {
-    if provider_name.is_none() && model_id.is_none() {
+    if provider_name.is_none() && model_id.is_none() && timeout_secs.is_none() {
return Err(miette::miette!(
"at least one of --provider or --model must be specified"
"at least one of --provider, --model, or --timeout must be specified"
));
}

@@ -3561,6 +3565,7 @@

let provider = provider_name.unwrap_or(&current.provider_name);
let model = model_id.unwrap_or(&current.model_id);
let timeout = timeout_secs.unwrap_or(current.timeout_secs);

let progress = if std::io::stdout().is_terminal() {
let spinner = ProgressBar::new_spinner();
@@ -3582,6 +3587,7 @@
route_name: route_name.to_string(),
verify: false,
no_verify,
timeout_secs: timeout,
})
.await;

@@ -3603,6 +3609,7 @@
println!(" {} {}", "Provider:".dimmed(), configured.provider_name);
println!(" {} {}", "Model:".dimmed(), configured.model_id);
println!(" {} {}", "Version:".dimmed(), configured.version);
print_timeout(configured.timeout_secs);
if configured.validation_performed {
println!(" {}", "Validated Endpoints:".dimmed());
for endpoint in configured.validated_endpoints {
@@ -3639,6 +3646,7 @@ pub async fn gateway_inference_get(
println!(" {} {}", "Provider:".dimmed(), configured.provider_name);
println!(" {} {}", "Model:".dimmed(), configured.model_id);
println!(" {} {}", "Version:".dimmed(), configured.version);
print_timeout(configured.timeout_secs);
} else {
// Show both routes by default.
print_inference_route(&mut client, "Gateway inference", "").await;
@@ -3666,6 +3674,7 @@ async fn print_inference_route(
println!(" {} {}", "Provider:".dimmed(), configured.provider_name);
println!(" {} {}", "Model:".dimmed(), configured.model_id);
println!(" {} {}", "Version:".dimmed(), configured.version);
print_timeout(configured.timeout_secs);
}
Err(e) if e.code() == Code::NotFound => {
println!("{}", format!("{label}:").cyan().bold());
@@ -3680,6 +3689,14 @@
}
}

fn print_timeout(timeout_secs: u64) {
if timeout_secs == 0 {
println!(" {} {}s (default)", "Timeout:".dimmed(), 60);
} else {
println!(" {} {}s", "Timeout:".dimmed(), timeout_secs);
}
}

fn format_inference_status(status: Status) -> miette::Report {
let message = status.message().trim();

3 changes: 2 additions & 1 deletion crates/openshell-router/src/backend.rs
@@ -149,7 +149,7 @@ async fn send_backend_request(
}
Err(_) => body,
};
-    builder = builder.body(body);
+    builder = builder.body(body).timeout(route.timeout);

builder.send().await.map_err(|e| {
if e.is_timeout() {
@@ -468,6 +468,7 @@ mod tests {
protocols: protocols.iter().map(|p| (*p).to_string()).collect(),
auth,
default_headers: vec![("anthropic-version".to_string(), "2023-06-01".to_string())],
timeout: crate::config::DEFAULT_ROUTE_TIMEOUT,
}
}

8 changes: 8 additions & 0 deletions crates/openshell-router/src/config.rs
@@ -3,11 +3,14 @@

use serde::Deserialize;
use std::path::Path;
use std::time::Duration;

pub use openshell_core::inference::AuthHeader;

use crate::RouterError;

pub const DEFAULT_ROUTE_TIMEOUT: Duration = Duration::from_secs(60);

#[derive(Debug, Clone, Deserialize)]
pub struct RouterConfig {
pub routes: Vec<RouteConfig>,
@@ -45,6 +48,8 @@ pub struct ResolvedRoute {
pub auth: AuthHeader,
/// Extra headers injected on every request (e.g. `anthropic-version`).
pub default_headers: Vec<(String, String)>,
/// Per-request timeout for proxied inference calls.
pub timeout: Duration,
}

impl std::fmt::Debug for ResolvedRoute {
@@ -57,6 +62,7 @@ impl std::fmt::Debug for ResolvedRoute {
.field("protocols", &self.protocols)
.field("auth", &self.auth)
.field("default_headers", &self.default_headers)
.field("timeout", &self.timeout)
.finish()
}
}
@@ -129,6 +135,7 @@ impl RouteConfig {
protocols,
auth,
default_headers,
timeout: DEFAULT_ROUTE_TIMEOUT,
})
}
}
@@ -256,6 +263,7 @@
protocols: vec!["openai_chat_completions".to_string()],
auth: AuthHeader::Bearer,
default_headers: Vec::new(),
timeout: DEFAULT_ROUTE_TIMEOUT,
};
let debug_output = format!("{route:?}");
assert!(
3 changes: 0 additions & 3 deletions crates/openshell-router/src/lib.rs
@@ -5,8 +5,6 @@ mod backend;
pub mod config;
mod mock;

- use std::time::Duration;

pub use backend::{
ProxyResponse, StreamingProxyResponse, ValidatedEndpoint, ValidationFailure,
ValidationFailureKind, verify_backend_endpoint,
@@ -39,7 +37,6 @@ pub struct Router {
impl Router {
pub fn new() -> Result<Self, RouterError> {
let client = reqwest::Client::builder()
-            .timeout(Duration::from_secs(60))
.build()
.map_err(|e| RouterError::Internal(format!("failed to build HTTP client: {e}")))?;
Ok(Self {
1 change: 1 addition & 0 deletions crates/openshell-router/src/mock.rs
@@ -131,6 +131,7 @@ mod tests {
protocols: protocols.iter().map(ToString::to_string).collect(),
auth: crate::config::AuthHeader::Bearer,
default_headers: Vec::new(),
timeout: crate::config::DEFAULT_ROUTE_TIMEOUT,
}
}

6 changes: 6 additions & 0 deletions crates/openshell-router/tests/backend_integration.rs
@@ -15,6 +15,7 @@ fn mock_candidates(base_url: &str) -> Vec<ResolvedRoute> {
protocols: vec!["openai_chat_completions".to_string()],
auth: AuthHeader::Bearer,
default_headers: Vec::new(),
timeout: openshell_router::config::DEFAULT_ROUTE_TIMEOUT,
}]
}

@@ -117,6 +118,7 @@ async fn proxy_no_compatible_route_returns_error() {
protocols: vec!["anthropic_messages".to_string()],
auth: AuthHeader::Custom("x-api-key"),
default_headers: Vec::new(),
timeout: openshell_router::config::DEFAULT_ROUTE_TIMEOUT,
}];

let err = router
@@ -178,6 +180,7 @@ async fn proxy_mock_route_returns_canned_response() {
protocols: vec!["openai_chat_completions".to_string()],
auth: AuthHeader::Bearer,
default_headers: Vec::new(),
timeout: openshell_router::config::DEFAULT_ROUTE_TIMEOUT,
}];

let body = serde_json::to_vec(&serde_json::json!({
@@ -312,6 +315,7 @@ async fn proxy_uses_x_api_key_for_anthropic_route() {
protocols: vec!["anthropic_messages".to_string()],
auth: AuthHeader::Custom("x-api-key"),
default_headers: vec![("anthropic-version".to_string(), "2023-06-01".to_string())],
timeout: openshell_router::config::DEFAULT_ROUTE_TIMEOUT,
}];

let body = serde_json::to_vec(&serde_json::json!({
@@ -370,6 +374,7 @@ async fn proxy_anthropic_does_not_send_bearer_auth() {
protocols: vec!["anthropic_messages".to_string()],
auth: AuthHeader::Custom("x-api-key"),
default_headers: vec![("anthropic-version".to_string(), "2023-06-01".to_string())],
timeout: openshell_router::config::DEFAULT_ROUTE_TIMEOUT,
}];

let response = router
@@ -414,6 +419,7 @@ async fn proxy_forwards_client_anthropic_version_header() {
protocols: vec!["anthropic_messages".to_string()],
auth: AuthHeader::Custom("x-api-key"),
default_headers: vec![("anthropic-version".to_string(), "2023-06-01".to_string())],
timeout: openshell_router::config::DEFAULT_ROUTE_TIMEOUT,
}];

let body = serde_json::to_vec(&serde_json::json!({