Skip to content

Fix release branch#2372

Open
JimFuller-RedHat wants to merge 3 commits into
release/0.4.zfrom
fix-release-branch
Open

Fix release branch#2372
JimFuller-RedHat wants to merge 3 commits into
release/0.4.zfrom
fix-release-branch

Conversation

@JimFuller-RedHat
Copy link
Copy Markdown
Contributor

@JimFuller-RedHat JimFuller-RedHat commented May 29, 2026

Summary by Sourcery

Ensure stable cross-SBOM graph traversal and cycle detection under cache eviction while avoiding redundant external SBOM loads.

New Features:

  • Introduce per-SBOM tracking in collectors so discovered-node state is keyed by SBOM identifiers rather than graph pointer identity.
  • Add an in-memory cache for external SBOM graphs in the collector with metrics to track redundant load attempts.

Enhancements:

  • Propagate SBOM identifiers through collector traversal to maintain correct visit tracking across external graph boundaries.
  • Instrument graph loading with tracing for better observability of analysis operations.

Tests:

  • Add integration tests to verify cross-SBOM cycle detection remains consistent under varying cache sizes.
  • Add tests to ensure repeated loading of the same SBOM returns the same shared graph handle even when cache eviction occurs.
  • Add CycloneDX test SBOMs covering cross-SBOM loops and fan-out external dependency scenarios to support new tests.

ctron and others added 3 commits May 29, 2026 11:37
When multiple external references within the same collection tree point
to the same SBOM, the global Moka cache may have evicted it between
lookups — causing redundant database loads. The Collector now maintains
a local cache of loaded graphs, shared across recursive calls, so the
same SBOM is loaded at most once per collection.

A race between concurrent branches is detected via the Entry API and
tracked with the `collector_redundant_loads` OTel counter.

Co-authored-by: Claude <noreply@anthropic.com>
In the case that SBOMs get evicted while loading, the DiscoveredTracker
might miss those SBOMs (as seen) due to the fact that the pointer/handle
is a new one, which it indeed didn't see before. Using the ID of the
SBOM fixes this.

Co-authored-by: Claude <noreply@anthropic.com>
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented May 29, 2026

Reviewer's Guide

Refactors graph discovery and loading to be keyed by SBOM UUID instead of graph pointers, adds a per-collector in-memory cache for external SBOM graphs to avoid duplicate loads, instruments redundant loads, and introduces tests and fixtures to verify correct cycle detection and cache behavior under eviction pressure.

Sequence diagram for cached external SBOM graph loading

sequenceDiagram
    participant Collector
    participant LoadedGraphs as LoadedGraphsCache
    participant GraphLoader
    participant Connection

    Collector->>LoadedGraphs: load_external_graph(sbom_id)
    LoadedGraphs-->>Collector: [sbom_id present]
    alt graph in local cache
        Collector-->>Collector: return Some(graph)
    else graph not in local cache
        Collector->>GraphLoader: load(connection, sbom_id)
        GraphLoader->>Connection: query
        Connection-->>GraphLoader: graph or None
        alt graph is None
            GraphLoader-->>Collector: Ok(None)
        else graph loaded
            Collector->>LoadedGraphs: entry(sbom_id)
            alt Entry::Occupied
                Collector-->>Collector: log::debug(...)
                Collector->>GraphLoader: redundant_loads.add(1, [])
                LoadedGraphs-->>Collector: existing graph
            else Entry::Vacant
                LoadedGraphs-->>Collector: insert(graph)
            end
            Collector-->>Collector: return Ok(Some(graph))
        end
    end
Loading

File-Level Changes

Change Details Files
Track discovered nodes per SBOM UUID instead of NodeGraph pointer and propagate SBOM context through Collector.
  • Replace DiscoveredTracker cache key from raw NodeGraph pointers to Uuid and update visit signature to accept sbom_id.
  • Add sbom_id field to Collector, plumb it through constructors, cloning helpers, and with() method, and use it when calling DiscoveredTracker::visit.
  • Update AnalysisService construction of Collectors to pass the node's sbom_id for both ancestor and descendant traversals.
modules/analysis/src/service/collector.rs
modules/analysis/src/service/mod.rs
Introduce a per-Collector cache for external SBOM graphs and centralize loading with duplicate-load detection metrics.
  • Add loaded_graphs mutex map to Collector to cache loaded external PackageGraph instances keyed by SBOM UUID.
  • Implement load_external_graph to check the local cache, load via GraphLoader if missing, and coalesce concurrent loads using HashMap::entry with a redundant_loads counter and debug logging.
  • Replace direct GraphLoader::load calls in traversal code paths with load_external_graph, ensuring sbom_id is passed when recursing into external graphs via with().
  • Extend GraphLoader with a redundant_loads metric counter, initialize it from an OpenTelemetry meter, and instrument load() with a tracing span.
modules/analysis/src/service/collector.rs
Add regression tests and fixtures to validate cross-SBOM cycle detection and stable Arc handles under cache eviction.
  • Register a new cache_eviction test module in the analysis service test harness.
  • Add test verifying cross-SBOM cycles are detected consistently with large and tiny cache sizes by comparing warning counts and totals for a looped CycloneDX pair.
  • Add test ensuring loading the same SBOM twice under cache pressure returns the same Arc by comparing Arc::ptr_eq on the returned graphs.
  • Introduce CycloneDX test SBOM fixtures for loop-external and fan-out-external scenarios used by the new tests.
modules/analysis/src/service/test/mod.rs
modules/analysis/src/service/test/cache_eviction.rs
etc/test-data/cyclonedx/fan-out-external/container.json
etc/test-data/cyclonedx/fan-out-external/target.json
etc/test-data/cyclonedx/loop-external/a.json
etc/test-data/cyclonedx/loop-external/b.json

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The new loaded_graphs cache is protected by a single Mutex<HashMap<Uuid, Arc<PackageGraph>>>, which may become a contention point under high concurrency; consider using a more granular structure (e.g., per-SBOM locks or a concurrent map) if you expect many concurrent graph loads.
  • GraphLoader::new creates a Meter and counter for every instance; if GraphLoader is constructed frequently, it may be worth reusing a shared meter/counter or injecting it to avoid repeated metric registration overhead.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `loaded_graphs` cache is protected by a single `Mutex<HashMap<Uuid, Arc<PackageGraph>>>`, which may become a contention point under high concurrency; consider using a more granular structure (e.g., per-SBOM locks or a concurrent map) if you expect many concurrent graph loads.
- `GraphLoader::new` creates a `Meter` and counter for every instance; if `GraphLoader` is constructed frequently, it may be worth reusing a shared meter/counter or injecting it to avoid repeated metric registration overhead.

## Individual Comments

### Comment 1
<location path="modules/analysis/src/service/test/cache_eviction.rs" line_range="23-32" />
<code_context>
+async fn test_cache_eviction_cross_sbom_cycle_detection(
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen the cycle-detection test by asserting the actual warning content and presence of at least one cycle warning, not just equal counts.

Currently this only asserts that `warnings_large.len()` == `warnings_tiny.len()` and `result_large.total == result_tiny.total`, which checks consistency under cache pressure but not that cycles are actually detected.

To better verify the intended behaviour, please:
- Assert that `warnings_large` (and/or `warnings_tiny`) is non-empty so at least one cycle is detected.
- Optionally assert that at least one warning includes a cycle-specific marker (e.g. the phrase your cycle warning uses).
- Optionally compare the warning messages themselves (not just their counts) between the large and tiny cache cases.

This will tie the test more directly to cross-SBOM cycle detection rather than just matching counts.

Suggested implementation:

```rust
    // ensure we actually detected at least one cycle warning in both cache scenarios
    assert!(
        !warnings_large.is_empty(),
        "expected at least one warning with large cache, but none were produced"
    );
    assert!(
        !warnings_tiny.is_empty(),
        "expected at least one warning with tiny cache, but none were produced"
    );

    // existing assertions comparing warnings and results under different cache sizes
    assert_eq!(warnings_large.len(), warnings_tiny.len());
    assert_eq!(result_large.total, result_tiny.total);

```

If your warning type carries structured data (e.g. a `kind` or `code` field for cycle warnings), you can further strengthen this test by:
1. Adding a predicate that checks for at least one warning tagged as a cycle warning in each collection, e.g.:
   `assert!(warnings_large.iter().any(|w| w.kind == WarningKind::Cycle));`
2. Optionally asserting that the warning *contents* match between the large and tiny cache runs (e.g. by comparing IDs or messages).
These additional checks should be implemented using the actual fields and enums available on your warning type.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +23 to +32
async fn test_cache_eviction_cross_sbom_cycle_detection(
ctx: &TrustifyContext,
) -> Result<(), anyhow::Error> {
ctx.ingest_documents([
"cyclonedx/loop-external/a.json",
"cyclonedx/loop-external/b.json",
])
.await?;

let options = QueryOptions {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Strengthen the cycle-detection test by asserting the actual warning content and presence of at least one cycle warning, not just equal counts.

Currently this only asserts that warnings_large.len() == warnings_tiny.len() and result_large.total == result_tiny.total, which checks consistency under cache pressure but not that cycles are actually detected.

To better verify the intended behaviour, please:

  • Assert that warnings_large (and/or warnings_tiny) is non-empty so at least one cycle is detected.
  • Optionally assert that at least one warning includes a cycle-specific marker (e.g. the phrase your cycle warning uses).
  • Optionally compare the warning messages themselves (not just their counts) between the large and tiny cache cases.

This will tie the test more directly to cross-SBOM cycle detection rather than just matching counts.

Suggested implementation:

    // ensure we actually detected at least one cycle warning in both cache scenarios
    assert!(
        !warnings_large.is_empty(),
        "expected at least one warning with large cache, but none were produced"
    );
    assert!(
        !warnings_tiny.is_empty(),
        "expected at least one warning with tiny cache, but none were produced"
    );

    // existing assertions comparing warnings and results under different cache sizes
    assert_eq!(warnings_large.len(), warnings_tiny.len());
    assert_eq!(result_large.total, result_tiny.total);

If your warning type carries structured data (e.g. a kind or code field for cycle warnings), you can further strengthen this test by:

  1. Adding a predicate that checks for at least one warning tagged as a cycle warning in each collection, e.g.:
    assert!(warnings_large.iter().any(|w| w.kind == WarningKind::Cycle));
  2. Optionally asserting that the warning contents match between the large and tiny cache runs (e.g. by comparing IDs or messages).
    These additional checks should be implemented using the actual fields and enums available on your warning type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants