Skip to content

test: collapse per-variant test clusters in ib-fabric, admin-cli, and libmlx#2710

Open
chet wants to merge 1 commit into
NVIDIA:mainfrom
chet:gh-issue-2696
Open

test: collapse per-variant test clusters in ib-fabric, admin-cli, and libmlx#2710
chet wants to merge 1 commit into
NVIDIA:mainfrom
chet:gh-issue-2696

Conversation

@chet

@chet chet commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Wave 1 of the carbide-test-support integration (#2692).

Three already-migrated crates still had one-function-per-variant test clusters and a few tests that asserted nothing. ib-fabric's 12 should_track_port_* + 3 ib_port_down_alert_* functions, admin-cli's 6 dpu_mode parse tests + the boot_override get/clear pair, and libmlx's four &[...]-vs-[...] trait-impl pairs all collapse into tables (each libmlx row monomorphizes its own container impl).

Drops tests that exercised nothing but the stdlib or a trivial setter (str::contains, the Command Debug string, construction-then-drop tests, two empty assert_dispatch::<T>() compile-checks, the no-assertion CLI routing tests). Rewrites admin-cli's health_override_templates so its variant-to-name map is exhaustive -- a newly-added template now fails to compile without a row, closing the silent _ => unreachable!() gap.

~290 fewer test lines, no production change. Verified per crate (cargo test: ib-fabric 21, admin-cli 346, libmlx 135 + 23), clippy-clean with -D warnings.

Addresses #2696.

@chet

chet commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai PTAL, thanks!

@copy-pr-bot

copy-pr-bot Bot commented Jun 20, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6f900bed-257e-419f-b0cd-6a88b2a795dd

📥 Commits

Reviewing files that changed from the base of the PR and between 99c89bb and 8286ba5.

📒 Files selected for processing (13)
  • crates/admin-cli/src/boot_override/tests.rs
  • crates/admin-cli/src/cfg/dispatch.rs
  • crates/admin-cli/src/dpu/tests.rs
  • crates/admin-cli/src/expected_machines/tests.rs
  • crates/admin-cli/src/firmware/tests.rs
  • crates/admin-cli/src/machine/tests.rs
  • crates/ib-fabric/src/lib.rs
  • crates/libmlx/tests/lockdown/test_lockdown.rs
  • crates/libmlx/tests/lockdown/test_runner.rs
  • crates/libmlx/tests/runner/command_builder_tests.rs
  • crates/libmlx/tests/runner/exec_options_tests.rs
  • crates/libmlx/tests/runner/runner_integration_tests.rs
  • crates/libmlx/tests/runner/traits_tests.rs
💤 Files with no reviewable changes (6)
  • crates/admin-cli/src/dpu/tests.rs
  • crates/libmlx/tests/runner/command_builder_tests.rs
  • crates/libmlx/tests/runner/exec_options_tests.rs
  • crates/libmlx/tests/runner/runner_integration_tests.rs
  • crates/libmlx/tests/lockdown/test_lockdown.rs
  • crates/libmlx/tests/lockdown/test_runner.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/admin-cli/src/boot_override/tests.rs
🚧 Files skipped from review as they are similar to previous changes (6)
  • crates/admin-cli/src/cfg/dispatch.rs
  • crates/admin-cli/src/expected_machines/tests.rs
  • crates/ib-fabric/src/lib.rs
  • crates/admin-cli/src/machine/tests.rs
  • crates/libmlx/tests/runner/traits_tests.rs
  • crates/admin-cli/src/firmware/tests.rs

Summary by CodeRabbit

  • Tests
    • Refactored multiple suites to use scenario-driven/parameterized tables for more consistent coverage.
    • Consolidated several CLI argument-parsing checks (including routing and --dpu-mode) into fewer scenario-based tests.
    • Strengthened health template variant round-trip/validation coverage and added a basic clap command-structure integrity check.
    • Removed multiple obsolete CLI parsing, runner/lockdown, and exec option/integration tests to streamline the test infrastructure.

Walkthrough

Test suites across admin-cli, ib-fabric, and libmlx are refactored by consolidating standalone per-variant test functions into table-driven scenarios! and value_scenarios! tests, removing construction-only and stdlib-only tests, expanding exhaustive match coverage in HealthReportTemplates, and replacing runtime Dispatch verification with compile-time validation.

Changes

Test Suite Consolidation into Scenario-Driven Tables

Layer / File(s) Summary
admin-cli CLI parsing test consolidation
crates/admin-cli/src/boot_override/tests.rs, crates/admin-cli/src/dpu/tests.rs, crates/admin-cli/src/expected_machines/tests.rs
parse_get and parse_clear merged into parse_get_and_clear scenario table routed over both variants; parse_status test removed; six per-variant dpu-mode tests (Add and Patch × three modes) consolidated into parse_dpu_mode_to_its_variant scenario table extracting and asserting args.dpu_mode across both command variants.
admin-cli compile-time and command validation
crates/admin-cli/src/cfg/dispatch.rs, crates/admin-cli/src/firmware/tests.rs
Runtime Dispatch derive assertions (assert_dispatch helper and two #[test] functions) replaced with compile-time validation documentation; CommandFactory import isolated and verify_cmd_structure test added executing Cmd::command().debug_assert() for clap command-structure baseline validation.
admin-cli machine HealthReportTemplates exhaustive coverage
crates/admin-cli/src/machine/tests.rs
template_name helper rewritten with fully exhaustive match over all HealthReportTemplates variants eliminating wildcard and "documented variants only" patterns; scenarios! table expanded to validate additional template variants including request-repair using dynamically computed template_name results instead of hard-coded string literals.
ib-fabric HealthProbeAlert and port-tracking scenario consolidation
crates/ib-fabric/src/lib.rs
Standalone ib_port_down alert tests replaced with PortDownAlertView projection validating alert metadata (id, count phrase, down GUIDs, prevent_allocations, tenant message) via value_scenarios! across single and multiple down ports; multiple should_track_port_as_down functions replaced with single TrackPortDown struct-driven value_scenarios! test covering SKU/instance precedence, case-insensitive GUID matching, and fail-open behaviour for unknown GUIDs.
libmlx construction and no-assertion test removal
crates/libmlx/tests/lockdown/test_lockdown.rs, crates/libmlx/tests/lockdown/test_runner.rs, crates/libmlx/tests/runner/command_builder_tests.rs, crates/libmlx/tests/runner/exec_options_tests.rs, crates/libmlx/tests/runner/runner_integration_tests.rs
Removed: LockdownManager with_runner and default construction tests; FlintRunner::with_path and test_runner_default plus entire output_parsing_tests module; test_verbose_logging and test_command_spec_to_command_conversion; test_sample1_config through test_sample5_config sample configurations; two set_temp_file_prefix integration tests verifying construction without assertion.
libmlx traits_tests scenario-driven refactor
crates/libmlx/tests/runner/traits_tests.rs
value_scenarios import added; to_config_values string-tuple tests consolidated into single scenario test using generic sorted (name, value) projection helper validating owned and borrowed array inputs; indexed-assignment tests collapsed preserving sparse array value coverage; to_variable_names scalar and array-expansion tests unified into value_scenarios!-driven tests sorting results and covering owned vs. borrowed container forms with consolidated array-expansion name list.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/infra-controller#2570: Both PRs modify crates/admin-cli/src/boot_override/tests.rs, converting parse_get and parse_clear tests into grouped, table-driven scenario forms.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the primary change: consolidating per-variant test clusters across three crates (ib-fabric, admin-cli, and libmlx) into table-driven structures.
Description check ✅ Passed The description comprehensively documents the refactoring objectives, implementation details across each crate, test coverage validation, and production neutrality. It directly addresses the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get PR details
gh pr view 2710 --repo NVIDIA/infra-controller --json title,body,files,state,additions,deletions

echo "---FILES---"
gh pr diff 2710 --repo NVIDIA/infra-controller

Length of output: 67935



@chet: Solid cleanup — the table-driven consolidation is a clear win across all three crates, the comments explaining each structural decision are unusually thorough, and closing the _ => unreachable!() gap in health_override_templates_value_enum with a fully exhaustive match is exactly the right move. A few observations:


crates/ib-fabric/src/lib.rs — duplicate rows in should_track_port_as_down_follows_sku_then_instance_precedence

The last two rows exercise an identical input set:

"neither SKU nor instance -> do not track" {
    TrackPortDown { guid: "guid1", guid_to_index: &[("guid1", 0)], expected_inactive: None, instance_guids: None }
} => false,

"SKU has no IB devices (None), no instance -> do not track" {
    TrackPortDown { guid: "guid1", guid_to_index: &[("guid1", 0)], expected_inactive: None, instance_guids: None }
} => false,

Both have expected_inactive: None, instance_guids: None, and identical GUID/index inputs. The old test names were semantically distinct (_no_sku_no_instance vs _sku_no_ib_devices_no_instance), but since both collapse to (None, None) in the function signature, they hit the same branch. One of the rows can be removed without any loss of coverage.


crates/libmlx/tests/runner/traits_tests.rsvariable_names_expand_full_arrays_to_indices is owned-only

variable_names_from_scalars_across_container_forms correctly exercises the &[T; N] impl of MlxConfigQueryable for scalar names. However, variable_names_expand_full_arrays_to_indices only has the owned [...] row — the &[...] coverage for the array-expansion path was removed with test_mlx_config_queryable_array_references. The &[T; N] impl for array variable expansion is no longer explicitly exercised. Adding a second row with names(&["GPIO_ENABLED", "THERMAL_SENSORS"], &registry) would close that gap with minimal overhead (same expected() closure, same shape as the existing config_values tables).


Everything else is well-executed. The removal of construction-then-drop, Debug-string, str::contains, and no-assertion routing tests is entirely justified, and the #[allow(clippy::needless_borrows_for_generic_args)] annotations with their load-bearing explanations are a nice touch.

@chet

chet commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai Thanks for the eyes. Updated -- dropped the duplicate (None, None) row and folded the "SKU with no IB devices" case into the kept row's comment, since both reach should_track_port_as_down as expected_inactive: None. PTAL. Resolve if good, thanks!

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the latest diff to verify the duplicate row fix
gh pr diff 2710 --repo NVIDIA/infra-controller | grep -A 60 "should_track_port_as_down_follows_sku_then_instance_precedence"

Length of output: 2555


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the full current file content for the relevant test function
gh pr diff 2710 --repo NVIDIA/infra-controller | grep -A 120 "should_track_port_as_down_follows_sku_then_instance_precedence" | tail -80

Length of output: 3293


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Look at the current file content around the test function to see the final state
gh api repos/NVIDIA/infra-controller/contents/crates/ib-fabric/src/lib.rs \
  --jq '.content' | base64 -d | grep -A 80 "should_track_port_as_down_follows_sku_then_instance_precedence" | grep -v "^--$" | head -90

Length of output: 568


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the PR branch name and fetch file from it
branch=$(gh pr view 2710 --repo NVIDIA/infra-controller --json headRefName --jq '.headRefName')
echo "Branch: $branch"

gh api "repos/NVIDIA/infra-controller/contents/crates/ib-fabric/src/lib.rs?ref=$branch" \
  --jq '.content' | base64 -d | grep -A 90 "should_track_port_as_down_follows_sku_then_instance_precedence" | head -95

Length of output: 751


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the PR head SHA
head_sha=$(gh pr view 2710 --repo NVIDIA/infra-controller --json headRefOid --jq '.headRefOid')
echo "Head SHA: $head_sha"

# Fetch via tree API to find the file
gh api "repos/NVIDIA/infra-controller/git/trees/$head_sha?recursive=1" \
  --jq '.tree[] | select(.path | contains("ib-fabric")) | .path' 2>/dev/null | head -20

Length of output: 963


🏁 Script executed:

#!/bin/bash
set -euo pipefail

head_sha=$(gh pr view 2710 --repo NVIDIA/infra-controller --json headRefOid --jq '.headRefOid')

# Get the blob SHA for the file
blob_sha=$(gh api "repos/NVIDIA/infra-controller/git/trees/$head_sha?recursive=1" \
  --jq '.tree[] | select(.path == "crates/ib-fabric/src/lib.rs") | .sha')
echo "Blob SHA: $blob_sha"

# Fetch the blob content
gh api "repos/NVIDIA/infra-controller/git/blobs/$blob_sha" \
  --jq '.content' | base64 -d | grep -n "neither SKU\|no IB devices\|None, None\|instance_guids: None\|expected_inactive: None" | head -20

Length of output: 1669


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List coderabbit's review comments to find the one about duplicate rows
gh api "repos/NVIDIA/infra-controller/pulls/2710/comments" \
  --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, body: (.body | .[0:120])}' | head -60

Length of output: 514



@chet: The ib-fabric fix is clean — the comment at lines 1608–1609 accurately captures both semantic cases (expected_inactive: None from no-SKU and from a SKU with no IB devices), and the table now has exactly one row for that branch. No concerns there.

The second observation from the prior review — variable_names_expand_full_arrays_to_indices in crates/libmlx/tests/runner/traits_tests.rs covering only the owned [...] form and leaving the &[T; N] array-expansion path unexercised — is still open. Would you like to add that &[...] row in this PR, or defer it? Happy to resolve once that's either addressed or explicitly deferred.

@chet chet marked this pull request as ready for review June 21, 2026 04:15
@chet chet requested a review from a team as a code owner June 21, 2026 04:15

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/admin-cli/src/cfg/dispatch.rs (1)

68-77: ⚡ Quick win

Remove #[allow(dead_code)] by adding explicit compile-time usage of derived enums.

Now that runtime scaffolding is removed, these #[allow(dead_code)] attributes can be replaced with a tiny type-assertion test (e.g., fn assert_dispatch<T: Dispatch>() {} and calls for both enums), preserving intent without suppressing lints.

As per coding guidelines, “Strongly avoid using #[allow(dead_code)] for Rust code… Dead code detection is important to catch mistakes and avoid unused code building up.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/cfg/dispatch.rs` around lines 68 - 77, Remove the
`#[allow(dead_code)]` attributes from both the AllRunCmd and MixedCmd enum
definitions. Instead, add a small generic assertion function at the bottom of
the test module (e.g., a function named assert_dispatch that takes a generic
type parameter constrained to the Dispatch trait with an empty body). Then call
this assertion function once for AllRunCmd and once for MixedCmd to provide
compile-time proof that these enums are being used, eliminating the need for the
dead code suppression lint.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/libmlx/tests/runner/traits_tests.rs`:
- Around line 350-365: The value_scenarios! table-driven test currently only
covers the owned array form in the call to names() with ["GPIO_ENABLED",
"THERMAL_SENSORS"], but is missing coverage for the borrowed form with
&["GPIO_ENABLED", "THERMAL_SENSORS"]. Add a second test row to the table that
calls names() with the borrowed form &["GPIO_ENABLED", "THERMAL_SENSORS"] and
the same registry parameter, using the identical expected expansion list with
all the indexed array elements. This ensures both the owned and borrowed
container forms are tested with the same expansion operation in the table.

---

Nitpick comments:
In `@crates/admin-cli/src/cfg/dispatch.rs`:
- Around line 68-77: Remove the `#[allow(dead_code)]` attributes from both the
AllRunCmd and MixedCmd enum definitions. Instead, add a small generic assertion
function at the bottom of the test module (e.g., a function named
assert_dispatch that takes a generic type parameter constrained to the Dispatch
trait with an empty body). Then call this assertion function once for AllRunCmd
and once for MixedCmd to provide compile-time proof that these enums are being
used, eliminating the need for the dead code suppression lint.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 42b76065-2345-4577-acf8-1118d4a6850b

📥 Commits

Reviewing files that changed from the base of the PR and between 79e54c0 and 377bf66.

📒 Files selected for processing (13)
  • crates/admin-cli/src/boot_override/tests.rs
  • crates/admin-cli/src/cfg/dispatch.rs
  • crates/admin-cli/src/dpu/tests.rs
  • crates/admin-cli/src/expected_machines/tests.rs
  • crates/admin-cli/src/firmware/tests.rs
  • crates/admin-cli/src/machine/tests.rs
  • crates/ib-fabric/src/lib.rs
  • crates/libmlx/tests/lockdown/test_lockdown.rs
  • crates/libmlx/tests/lockdown/test_runner.rs
  • crates/libmlx/tests/runner/command_builder_tests.rs
  • crates/libmlx/tests/runner/exec_options_tests.rs
  • crates/libmlx/tests/runner/runner_integration_tests.rs
  • crates/libmlx/tests/runner/traits_tests.rs
💤 Files with no reviewable changes (6)
  • crates/libmlx/tests/runner/exec_options_tests.rs
  • crates/admin-cli/src/dpu/tests.rs
  • crates/libmlx/tests/lockdown/test_lockdown.rs
  • crates/libmlx/tests/lockdown/test_runner.rs
  • crates/libmlx/tests/runner/command_builder_tests.rs
  • crates/libmlx/tests/runner/runner_integration_tests.rs

Comment thread crates/libmlx/tests/runner/traits_tests.rs
@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
boot-artifacts-aarch64 3 0 0 3 0 0
boot-artifacts-x86_64 3 0 0 3 0 0
forge-admin-cli-x86_64 264 6 23 99 6 130
machine-validation-runner 704 34 183 258 35 194
machine_validation 704 34 183 258 35 194
nvmetal-carbide 704 34 183 258 35 194
TOTAL 2382 108 572 879 111 712

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

… libmlx

Three already-migrated crates still carried one-function-per-variant clusters and
a handful of tests that asserted nothing. `ib-fabric`'s 12 `should_track_port_*`
functions and 3 `ib_port_down_alert_*` functions become two tables; `admin-cli`'s
six `dpu_mode` parse tests and the `boot_override` get/clear pair collapse into
tables; `libmlx`'s four `&[...]`-vs-`[...]` trait-impl pairs become four small
tables (each row monomorphizes its own container impl).

The same pass drops tests that exercised nothing but the stdlib or a trivial
setter -- `str::contains`, the `Command` `Debug` string, construction-then-drop
"creation"/"default" tests, two empty `assert_dispatch::<T>()` compile-checks,
and the no-assertion CLI routing tests. It also rewrites `admin-cli`'s
`health_override_templates` check so the variant-to-name map is exhaustive: a
newly-added template now fails to compile without a row, closing the silent
`_ => unreachable!()` gap the old test had.

About 290 fewer test lines, no production change. Verified per crate (ib-fabric
21, admin-cli 346, libmlx 135 + 23) and clippy-clean with `-D warnings`.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants