Fix release branch by JimFuller-RedHat · Pull Request #2372 · guacsec/trustify

JimFuller-RedHat · 2026-05-29T10:24:16Z

Summary by Sourcery

Ensure stable cross-SBOM graph traversal and cycle detection under cache eviction while avoiding redundant external SBOM loads.

New Features:

Introduce per-SBOM tracking in collectors so discovered-node state is keyed by SBOM identifiers rather than graph pointer identity.
Add an in-memory cache for external SBOM graphs in the collector with metrics to track redundant load attempts.

Enhancements:

Propagate SBOM identifiers through collector traversal to maintain correct visit tracking across external graph boundaries.
Instrument graph loading with tracing for better observability of analysis operations.

Tests:

Add integration tests to verify cross-SBOM cycle detection remains consistent under varying cache sizes.
Add tests to ensure repeated loading of the same SBOM returns the same shared graph handle even when cache eviction occurs.
Add CycloneDX test SBOMs covering cross-SBOM loops and fan-out external dependency scenarios to support new tests.

When multiple external references within the same collection tree point to the same SBOM, the global Moka cache may have evicted it between lookups — causing redundant database loads. The Collector now maintains a local cache of loaded graphs, shared across recursive calls, so the same SBOM is loaded at most once per collection. A race between concurrent branches is detected via the Entry API and tracked with the `collector_redundant_loads` OTel counter. Co-authored-by: Claude <noreply@anthropic.com>

In the case that SBOMs get evicted while loading, the DiscoveredTracker might miss those SBOMs (as seen) due to the fact that the pointer/handle is a new one, which it indeed didn't see before. Using the ID of the SBOM fixes this. Co-authored-by: Claude <noreply@anthropic.com>

sourcery-ai · 2026-05-29T10:24:23Z

Reviewer's Guide

Refactors graph discovery and loading to be keyed by SBOM UUID instead of graph pointers, adds a per-collector in-memory cache for external SBOM graphs to avoid duplicate loads, instruments redundant loads, and introduces tests and fixtures to verify correct cycle detection and cache behavior under eviction pressure.

Sequence diagram for cached external SBOM graph loading

sequenceDiagram
    participant Collector
    participant LoadedGraphs as LoadedGraphsCache
    participant GraphLoader
    participant Connection

    Collector->>LoadedGraphs: load_external_graph(sbom_id)
    LoadedGraphs-->>Collector: [sbom_id present]
    alt graph in local cache
        Collector-->>Collector: return Some(graph)
    else graph not in local cache
        Collector->>GraphLoader: load(connection, sbom_id)
        GraphLoader->>Connection: query
        Connection-->>GraphLoader: graph or None
        alt graph is None
            GraphLoader-->>Collector: Ok(None)
        else graph loaded
            Collector->>LoadedGraphs: entry(sbom_id)
            alt Entry::Occupied
                Collector-->>Collector: log::debug(...)
                Collector->>GraphLoader: redundant_loads.add(1, [])
                LoadedGraphs-->>Collector: existing graph
            else Entry::Vacant
                LoadedGraphs-->>Collector: insert(graph)
            end
            Collector-->>Collector: return Ok(Some(graph))
        end
    end

File-Level Changes

Change	Details	Files
Track discovered nodes per SBOM UUID instead of NodeGraph pointer and propagate SBOM context through Collector.	Replace DiscoveredTracker cache key from raw NodeGraph pointers to Uuid and update visit signature to accept sbom_id. Add sbom_id field to Collector, plumb it through constructors, cloning helpers, and with() method, and use it when calling DiscoveredTracker::visit. Update AnalysisService construction of Collectors to pass the node's sbom_id for both ancestor and descendant traversals.	`modules/analysis/src/service/collector.rs` `modules/analysis/src/service/mod.rs`
Introduce a per-Collector cache for external SBOM graphs and centralize loading with duplicate-load detection metrics.	Add loaded_graphs mutex map to Collector to cache loaded external PackageGraph instances keyed by SBOM UUID. Implement load_external_graph to check the local cache, load via GraphLoader if missing, and coalesce concurrent loads using HashMap::entry with a redundant_loads counter and debug logging. Replace direct GraphLoader::load calls in traversal code paths with load_external_graph, ensuring sbom_id is passed when recursing into external graphs via with(). Extend GraphLoader with a redundant_loads metric counter, initialize it from an OpenTelemetry meter, and instrument load() with a tracing span.	`modules/analysis/src/service/collector.rs`
Add regression tests and fixtures to validate cross-SBOM cycle detection and stable Arc handles under cache eviction.	Register a new cache_eviction test module in the analysis service test harness. Add test verifying cross-SBOM cycles are detected consistently with large and tiny cache sizes by comparing warning counts and totals for a looped CycloneDX pair. Add test ensuring loading the same SBOM twice under cache pressure returns the same Arc by comparing Arc::ptr_eq on the returned graphs. Introduce CycloneDX test SBOM fixtures for loop-external and fan-out-external scenarios used by the new tests.	`modules/analysis/src/service/test/mod.rs` `modules/analysis/src/service/test/cache_eviction.rs` `etc/test-data/cyclonedx/fan-out-external/container.json` `etc/test-data/cyclonedx/fan-out-external/target.json` `etc/test-data/cyclonedx/loop-external/a.json` `etc/test-data/cyclonedx/loop-external/b.json`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

The new loaded_graphs cache is protected by a single Mutex<HashMap<Uuid, Arc<PackageGraph>>>, which may become a contention point under high concurrency; consider using a more granular structure (e.g., per-SBOM locks or a concurrent map) if you expect many concurrent graph loads.
GraphLoader::new creates a Meter and counter for every instance; if GraphLoader is constructed frequently, it may be worth reusing a shared meter/counter or injecting it to avoid repeated metric registration overhead.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new `loaded_graphs` cache is protected by a single `Mutex<HashMap<Uuid, Arc<PackageGraph>>>`, which may become a contention point under high concurrency; consider using a more granular structure (e.g., per-SBOM locks or a concurrent map) if you expect many concurrent graph loads.
- `GraphLoader::new` creates a `Meter` and counter for every instance; if `GraphLoader` is constructed frequently, it may be worth reusing a shared meter/counter or injecting it to avoid repeated metric registration overhead.

## Individual Comments

### Comment 1
<location path="modules/analysis/src/service/test/cache_eviction.rs" line_range="23-32" />
<code_context>
+async fn test_cache_eviction_cross_sbom_cycle_detection(
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen the cycle-detection test by asserting the actual warning content and presence of at least one cycle warning, not just equal counts.

Currently this only asserts that `warnings_large.len()` == `warnings_tiny.len()` and `result_large.total == result_tiny.total`, which checks consistency under cache pressure but not that cycles are actually detected.

To better verify the intended behaviour, please:
- Assert that `warnings_large` (and/or `warnings_tiny`) is non-empty so at least one cycle is detected.
- Optionally assert that at least one warning includes a cycle-specific marker (e.g. the phrase your cycle warning uses).
- Optionally compare the warning messages themselves (not just their counts) between the large and tiny cache cases.

This will tie the test more directly to cross-SBOM cycle detection rather than just matching counts.

Suggested implementation:

```rust
    // ensure we actually detected at least one cycle warning in both cache scenarios
    assert!(
        !warnings_large.is_empty(),
        "expected at least one warning with large cache, but none were produced"
    );
    assert!(
        !warnings_tiny.is_empty(),
        "expected at least one warning with tiny cache, but none were produced"
    );

    // existing assertions comparing warnings and results under different cache sizes
    assert_eq!(warnings_large.len(), warnings_tiny.len());
    assert_eq!(result_large.total, result_tiny.total);

```

If your warning type carries structured data (e.g. a `kind` or `code` field for cycle warnings), you can further strengthen this test by:
1. Adding a predicate that checks for at least one warning tagged as a cycle warning in each collection, e.g.:
   `assert!(warnings_large.iter().any(|w| w.kind == WarningKind::Cycle));`
2. Optionally asserting that the warning *contents* match between the large and tiny cache runs (e.g. by comparing IDs or messages).
These additional checks should be implemented using the actual fields and enums available on your warning type.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-05-29T10:31:05Z

+async fn test_cache_eviction_cross_sbom_cycle_detection(
+    ctx: &TrustifyContext,
+) -> Result<(), anyhow::Error> {
+    ctx.ingest_documents([
+        "cyclonedx/loop-external/a.json",
+        "cyclonedx/loop-external/b.json",
+    ])
+    .await?;
+
+    let options = QueryOptions {


suggestion (testing): Strengthen the cycle-detection test by asserting the actual warning content and presence of at least one cycle warning, not just equal counts.

Currently this only asserts that warnings_large.len() == warnings_tiny.len() and result_large.total == result_tiny.total, which checks consistency under cache pressure but not that cycles are actually detected.

To better verify the intended behaviour, please:

Assert that warnings_large (and/or warnings_tiny) is non-empty so at least one cycle is detected.

Optionally assert that at least one warning includes a cycle-specific marker (e.g. the phrase your cycle warning uses).

Optionally compare the warning messages themselves (not just their counts) between the large and tiny cache cases.

This will tie the test more directly to cross-SBOM cycle detection rather than just matching counts.

Suggested implementation:

// ensure we actually detected at least one cycle warning in both cache scenarios assert!( !warnings_large.is_empty(), "expected at least one warning with large cache, but none were produced" ); assert!( !warnings_tiny.is_empty(), "expected at least one warning with tiny cache, but none were produced" ); // existing assertions comparing warnings and results under different cache sizes assert_eq!(warnings_large.len(), warnings_tiny.len()); assert_eq!(result_large.total, result_tiny.total);

If your warning type carries structured data (e.g. a kind or code field for cycle warnings), you can further strengthen this test by:

Adding a predicate that checks for at least one warning tagged as a cycle warning in each collection, e.g.:
assert!(warnings_large.iter().any(|w| w.kind == WarningKind::Cycle));

Optionally asserting that the warning contents match between the large and tiny cache runs (e.g. by comparing IDs or messages).
These additional checks should be implemented using the actual fields and enums available on your warning type.

ctron and others added 3 commits May 29, 2026 11:37

clippy: fix pagined total field

32d9e8c

JimFuller-RedHat self-assigned this May 29, 2026

github-project-automation Bot added this to Trustify May 29, 2026

sourcery-ai Bot reviewed May 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix release branch#2372

Fix release branch#2372
JimFuller-RedHat wants to merge 3 commits into
release/0.4.zfrom
fix-release-branch

JimFuller-RedHat commented May 29, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot commented May 29, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JimFuller-RedHat commented May 29, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for cached external SBOM graph loading

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JimFuller-RedHat commented May 29, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented May 29, 2026 •

edited

Loading