Fix release branch#2372
Conversation
When multiple external references within the same collection tree point to the same SBOM, the global Moka cache may have evicted it between lookups — causing redundant database loads. The Collector now maintains a local cache of loaded graphs, shared across recursive calls, so the same SBOM is loaded at most once per collection. A race between concurrent branches is detected via the Entry API and tracked with the `collector_redundant_loads` OTel counter. Co-authored-by: Claude <noreply@anthropic.com>
In the case that SBOMs get evicted while loading, the DiscoveredTracker might miss those SBOMs (as seen) due to the fact that the pointer/handle is a new one, which it indeed didn't see before. Using the ID of the SBOM fixes this. Co-authored-by: Claude <noreply@anthropic.com>
Reviewer's GuideRefactors graph discovery and loading to be keyed by SBOM UUID instead of graph pointers, adds a per-collector in-memory cache for external SBOM graphs to avoid duplicate loads, instruments redundant loads, and introduces tests and fixtures to verify correct cycle detection and cache behavior under eviction pressure. Sequence diagram for cached external SBOM graph loadingsequenceDiagram
participant Collector
participant LoadedGraphs as LoadedGraphsCache
participant GraphLoader
participant Connection
Collector->>LoadedGraphs: load_external_graph(sbom_id)
LoadedGraphs-->>Collector: [sbom_id present]
alt graph in local cache
Collector-->>Collector: return Some(graph)
else graph not in local cache
Collector->>GraphLoader: load(connection, sbom_id)
GraphLoader->>Connection: query
Connection-->>GraphLoader: graph or None
alt graph is None
GraphLoader-->>Collector: Ok(None)
else graph loaded
Collector->>LoadedGraphs: entry(sbom_id)
alt Entry::Occupied
Collector-->>Collector: log::debug(...)
Collector->>GraphLoader: redundant_loads.add(1, [])
LoadedGraphs-->>Collector: existing graph
else Entry::Vacant
LoadedGraphs-->>Collector: insert(graph)
end
Collector-->>Collector: return Ok(Some(graph))
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new
loaded_graphscache is protected by a singleMutex<HashMap<Uuid, Arc<PackageGraph>>>, which may become a contention point under high concurrency; consider using a more granular structure (e.g., per-SBOM locks or a concurrent map) if you expect many concurrent graph loads. GraphLoader::newcreates aMeterand counter for every instance; ifGraphLoaderis constructed frequently, it may be worth reusing a shared meter/counter or injecting it to avoid repeated metric registration overhead.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `loaded_graphs` cache is protected by a single `Mutex<HashMap<Uuid, Arc<PackageGraph>>>`, which may become a contention point under high concurrency; consider using a more granular structure (e.g., per-SBOM locks or a concurrent map) if you expect many concurrent graph loads.
- `GraphLoader::new` creates a `Meter` and counter for every instance; if `GraphLoader` is constructed frequently, it may be worth reusing a shared meter/counter or injecting it to avoid repeated metric registration overhead.
## Individual Comments
### Comment 1
<location path="modules/analysis/src/service/test/cache_eviction.rs" line_range="23-32" />
<code_context>
+async fn test_cache_eviction_cross_sbom_cycle_detection(
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen the cycle-detection test by asserting the actual warning content and presence of at least one cycle warning, not just equal counts.
Currently this only asserts that `warnings_large.len()` == `warnings_tiny.len()` and `result_large.total == result_tiny.total`, which checks consistency under cache pressure but not that cycles are actually detected.
To better verify the intended behaviour, please:
- Assert that `warnings_large` (and/or `warnings_tiny`) is non-empty so at least one cycle is detected.
- Optionally assert that at least one warning includes a cycle-specific marker (e.g. the phrase your cycle warning uses).
- Optionally compare the warning messages themselves (not just their counts) between the large and tiny cache cases.
This will tie the test more directly to cross-SBOM cycle detection rather than just matching counts.
Suggested implementation:
```rust
// ensure we actually detected at least one cycle warning in both cache scenarios
assert!(
!warnings_large.is_empty(),
"expected at least one warning with large cache, but none were produced"
);
assert!(
!warnings_tiny.is_empty(),
"expected at least one warning with tiny cache, but none were produced"
);
// existing assertions comparing warnings and results under different cache sizes
assert_eq!(warnings_large.len(), warnings_tiny.len());
assert_eq!(result_large.total, result_tiny.total);
```
If your warning type carries structured data (e.g. a `kind` or `code` field for cycle warnings), you can further strengthen this test by:
1. Adding a predicate that checks for at least one warning tagged as a cycle warning in each collection, e.g.:
`assert!(warnings_large.iter().any(|w| w.kind == WarningKind::Cycle));`
2. Optionally asserting that the warning *contents* match between the large and tiny cache runs (e.g. by comparing IDs or messages).
These additional checks should be implemented using the actual fields and enums available on your warning type.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| async fn test_cache_eviction_cross_sbom_cycle_detection( | ||
| ctx: &TrustifyContext, | ||
| ) -> Result<(), anyhow::Error> { | ||
| ctx.ingest_documents([ | ||
| "cyclonedx/loop-external/a.json", | ||
| "cyclonedx/loop-external/b.json", | ||
| ]) | ||
| .await?; | ||
|
|
||
| let options = QueryOptions { |
There was a problem hiding this comment.
suggestion (testing): Strengthen the cycle-detection test by asserting the actual warning content and presence of at least one cycle warning, not just equal counts.
Currently this only asserts that warnings_large.len() == warnings_tiny.len() and result_large.total == result_tiny.total, which checks consistency under cache pressure but not that cycles are actually detected.
To better verify the intended behaviour, please:
- Assert that
warnings_large(and/orwarnings_tiny) is non-empty so at least one cycle is detected. - Optionally assert that at least one warning includes a cycle-specific marker (e.g. the phrase your cycle warning uses).
- Optionally compare the warning messages themselves (not just their counts) between the large and tiny cache cases.
This will tie the test more directly to cross-SBOM cycle detection rather than just matching counts.
Suggested implementation:
// ensure we actually detected at least one cycle warning in both cache scenarios
assert!(
!warnings_large.is_empty(),
"expected at least one warning with large cache, but none were produced"
);
assert!(
!warnings_tiny.is_empty(),
"expected at least one warning with tiny cache, but none were produced"
);
// existing assertions comparing warnings and results under different cache sizes
assert_eq!(warnings_large.len(), warnings_tiny.len());
assert_eq!(result_large.total, result_tiny.total);If your warning type carries structured data (e.g. a kind or code field for cycle warnings), you can further strengthen this test by:
- Adding a predicate that checks for at least one warning tagged as a cycle warning in each collection, e.g.:
assert!(warnings_large.iter().any(|w| w.kind == WarningKind::Cycle)); - Optionally asserting that the warning contents match between the large and tiny cache runs (e.g. by comparing IDs or messages).
These additional checks should be implemented using the actual fields and enums available on your warning type.
Summary by Sourcery
Ensure stable cross-SBOM graph traversal and cycle detection under cache eviction while avoiding redundant external SBOM loads.
New Features:
Enhancements:
Tests: