Conversation
Greptile Summary

This PR introduces an in-memory cache for the `/v1/models` provider model lists, warmed at startup and refreshed by a background task. Key issues found are detailed in the review comments below.
Confidence Score: 3/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ModelsRoute as /v1/models handler
    participant Cache as static_models_cache (RwLock)
    participant Provider as Upstream Provider(s)
    participant BgTask as Background Refresh Task
    Note over BgTask: Spawned at server start (task_tracker)
    BgTask->>Provider: list_models_for_config (parallel)
    Provider-->>BgTask: ModelsResponse / Err
    BgTask->>Cache: write lock → retain + insert
    Client->>ModelsRoute: GET /v1/models
    ModelsRoute->>Cache: read lock → check each provider
    alt All hits
        Cache-->>ModelsRoute: cached ModelsResponse × N
    else Some misses
        Cache-->>ModelsRoute: hits (cached)
        ModelsRoute->>Provider: live-fetch miss providers (parallel)
        Provider-->>ModelsRoute: ModelsResponse / Err (warn on err)
        Note over ModelsRoute: ⚠️ results NOT written back to Cache
    end
    ModelsRoute-->>Client: CombinedModelsResponse (enriched)
    loop Every refresh_interval_secs (default 300s)
        BgTask->>Provider: warm_static_models_cache
        Provider-->>BgTask: updated responses
        BgTask->>Cache: write lock → retain + insert
    end
```
Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/cli/server.rs
Line: 344-351
Comment:
**Infinite loop blocks graceful shutdown for full 30-second timeout**
The background refresh task runs an unconditional `loop { ... }` with no exit condition. The server's shutdown handler calls `task_tracker.close()` followed by `task_tracker.wait()` with a **30-second timeout** (`tokio::time::timeout(std::time::Duration::from_secs(30), task_tracker.wait())`). Since this task never exits on its own, every server shutdown will stall for the entire 30 seconds before timing out, regardless of whether the task is doing useful work at that moment.
Every other `task_tracker.spawn` usage in the codebase wraps short-lived, finite work (audit logging, usage flushing, etc.). This is the only infinite background loop registered with the tracker, and it breaks the graceful-shutdown contract the tracker is designed to enforce.
The fix is to drive the loop via a [`CancellationToken`](https://docs.rs/tokio-util/latest/tokio_util/sync/struct.CancellationToken.html) that is signalled during shutdown:
```rust
let cancel = state.cancellation_token.clone(); // or create a new one shared with shutdown
let state_ref = state.clone();
task_tracker.spawn(async move {
    let mut ticker = tokio::time::interval(interval);
    ticker.tick().await; // skip the immediate first tick (already warmed)
    loop {
        tokio::select! {
            _ = cancel.cancelled() => break,
            _ = ticker.tick() => {}
        }
        state_ref.warm_static_models_cache().await;
    }
});
```
How can I resolve this? If you propose a fix, please make it concise.
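To see why the cooperative-cancellation fix restores prompt shutdown, here is a stdlib-only sketch of the same pattern. The `AtomicBool` and `thread::join` are stand-ins (assumptions of this illustration) for the `CancellationToken` and `task_tracker.wait()`:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};
use std::thread;
use std::time::{Duration, Instant};

// Returns how long "shutdown" had to wait for the background loop to exit.
fn run_shutdown_demo() -> Duration {
    let cancel = Arc::new(AtomicBool::new(false)); // stand-in for CancellationToken
    let cancel_bg = Arc::clone(&cancel);

    // Background "refresh" loop: sleeps briefly, checks the flag, and exits
    // as soon as cancellation is observed. tokio::select! achieves the same
    // promptness without this polling.
    let handle = thread::spawn(move || {
        while !cancel_bg.load(Ordering::Relaxed) {
            thread::sleep(Duration::from_millis(10)); // simulated refresh tick
        }
    });

    thread::sleep(Duration::from_millis(50)); // let the loop run a few ticks
    let start = Instant::now();
    cancel.store(true, Ordering::Relaxed); // "shutdown" signalled
    handle.join().unwrap(); // stand-in for task_tracker.wait()
    start.elapsed()
}

fn main() {
    let waited = run_shutdown_demo();
    // The loop exits within one tick instead of stalling until a fixed timeout.
    assert!(waited < Duration::from_secs(1));
}
```

Without the cancellation check, the same `join` would block indefinitely, which is the analogue of the 30-second stall the comment describes.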
---
This is a comment left during a code review.
Path: src/routes/api/models.rs
Line: 76-81
Comment:
**Successful live-fetch results not written back to cache**
When a cache miss triggers a live-fetch and the fetch succeeds, the result is added to `hits` for this single request but is never stored in `static_models_cache`. The next request for the same provider will be another cache miss and trigger yet another live-fetch, until the background refresh (up to 5 minutes later by default) eventually populates the entry.
Under concurrent load this becomes a thundering-herd problem: if the initial cache warm fails for a provider, every concurrent request to `/v1/models` will independently live-fetch from that same upstream provider until the background refresh catches up. This negates the latency benefit the cache is meant to provide.
Consider writing successful live-fetch results back to the cache:
```rust
let mut live_fetched: Vec<(String, crate::providers::ModelsResponse)> = Vec::new();
for (name, result) in join_all(futures).await {
    match result {
        Ok(resp) => {
            live_fetched.push((name.clone(), resp.clone()));
            hits.push((name, resp));
        }
        Err(e) => tracing::warn!(provider = %name, error = %e, "Live-fetch fallback failed for cache-miss provider"),
    }
}
// Write successful live-fetches back to the cache so future requests benefit
if !live_fetched.is_empty() {
    let mut cache = state.static_models_cache.write().await;
    for (name, resp) in live_fetched {
        cache.insert(name, resp);
    }
}
```
How can I resolve this? If you propose a fix, please make it concise.
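The difference the write-back makes can be shown with a stdlib-only sketch; `Gateway`, `live_fetch`, and `models` here are hypothetical stand-ins for the real `AppState`, provider client, and `/v1/models` handler:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-in for AppState; counts how often the "upstream" is hit.
struct Gateway {
    cache: RwLock<HashMap<String, Vec<String>>>, // provider name -> model ids
    upstream_calls: RwLock<u32>,
}

impl Gateway {
    fn new() -> Self {
        Gateway {
            cache: RwLock::new(HashMap::new()),
            upstream_calls: RwLock::new(0),
        }
    }

    // Simulated live-fetch from the upstream provider.
    fn live_fetch(&self, provider: &str) -> Vec<String> {
        *self.upstream_calls.write().unwrap() += 1;
        vec![format!("{provider}-model-a"), format!("{provider}-model-b")]
    }

    // Read-through lookup: serve from cache, otherwise live-fetch AND write
    // the result back so the next request is a hit.
    fn models(&self, provider: &str) -> Vec<String> {
        if let Some(models) = self.cache.read().unwrap().get(provider) {
            return models.clone(); // cache hit
        }
        let fetched = self.live_fetch(provider); // cache miss: go upstream
        self.cache
            .write()
            .unwrap()
            .insert(provider.to_string(), fetched.clone()); // write-back
        fetched
    }
}

fn main() {
    let gw = Gateway::new();
    let first = gw.models("openai");
    let second = gw.models("openai");
    assert_eq!(first, second);
    // Only the first request reached the upstream; the second was a cache hit.
    assert_eq!(*gw.upstream_calls.read().unwrap(), 1);
}
```

With the write-back, only the first miss reaches the upstream; dropping the `insert` call reproduces the repeated live-fetches the comment warns about.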
---
This is a comment left during a code review.
Path: src/wasm.rs
Line: 159
Comment:
**WASM path: cache always empty despite feature being enabled by default**
The WASM `HadrianGateway` constructs `AppState` directly (bypassing `AppState::new`), so `warm_static_models_cache()` is never called. Since `StaticModelsCacheConfig` defaults to `refresh_interval_secs = 300`, `enabled()` returns `true`. Every call to `api_v1_models` from WASM will enter the `cache_enabled` branch, find all providers as cache misses, then live-fetch them — identical to the pre-PR behaviour but with extra overhead (acquiring a read lock and building a `misses` list before falling through to live-fetch).
If WASM is intentionally not expected to use this cache, consider either disabling caching by default in the WASM config, or calling the warm function from the WASM initializer:
```rust
// Option A: disable for WASM in config defaults
static_models_cache: StaticModelsCacheConfig { refresh_interval_secs: 0 },

// Option B: warm the cache in the WASM constructor (if async is available)
if state.config.features.static_models_cache.enabled() {
    state.warm_static_models_cache().await;
}
```
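Option A relies on the `enabled()` gate described above (a zero interval disables the cache). A minimal reconstruction of that gate, assuming `enabled()` is simply a non-zero check as the comment implies, not the actual source:

```rust
struct StaticModelsCacheConfig {
    refresh_interval_secs: u64,
}

impl Default for StaticModelsCacheConfig {
    fn default() -> Self {
        // Matches the default the comment describes (5-minute refresh).
        StaticModelsCacheConfig { refresh_interval_secs: 300 }
    }
}

impl StaticModelsCacheConfig {
    // Assumed semantics: the cache is enabled iff the interval is non-zero.
    fn enabled(&self) -> bool {
        self.refresh_interval_secs > 0
    }
}

fn main() {
    // Default config: cache enabled, so the WASM path hits the cache branch.
    assert!(StaticModelsCacheConfig::default().enabled());
    // Option A: zero interval disables the cache for WASM.
    let wasm = StaticModelsCacheConfig { refresh_interval_secs: 0 };
    assert!(!wasm.enabled());
}
```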
How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "Review fixes"
src/cli/server.rs (outdated)

```rust
task_tracker.spawn(async move {
    let mut ticker = tokio::time::interval(interval);
    ticker.tick().await; // skip the immediate first tick (already warmed)
    loop {
        ticker.tick().await;
        state_ref.warm_static_models_cache().await;
    }
});
```
src/routes/api/models.rs

```rust
for (name, result) in join_all(futures).await {
    match result {
        Ok(resp) => hits.push((name, resp)),
        Err(e) => tracing::warn!(provider = %name, error = %e, "Live-fetch fallback failed for cache-miss provider"),
    }
}
```
src/wasm.rs

```rust
default_org_id,
provider_metrics: Arc::new(services::ProviderMetricsService::new()),
model_catalog: catalog::ModelCatalogRegistry::new(),
static_models_cache: Arc::new(tokio::sync::RwLock::new(Default::default())),
```
No description provided.