test: collapse per-variant test clusters in ib-fabric, admin-cli, and libmlx by chet · Pull Request #2710 · NVIDIA/infra-controller

chet · 2026-06-20T15:36:47Z

Wave 1 of the carbide-test-support integration (#2692).

Three already-migrated crates still had one-function-per-variant test clusters and a few tests that asserted nothing. ib-fabric's 12 should_track_port_* + 3 ib_port_down_alert_* functions, admin-cli's 6 dpu_mode parse tests + the boot_override get/clear pair, and libmlx's four &[...]-vs-[...] trait-impl pairs all collapse into tables (each libmlx row monomorphizes its own container impl).

Drops tests that exercised nothing but the stdlib or a trivial setter (str::contains, the Command Debug string, construction-then-drop tests, two empty assert_dispatch::<T>() compile-checks, the no-assertion CLI routing tests). Rewrites admin-cli's health_override_templates so its variant-to-name map is exhaustive -- a newly-added template now fails to compile without a row, closing the silent _ => unreachable!() gap.

~290 fewer test lines, no production change. Verified per crate (cargo test: ib-fabric 21, admin-cli 346, libmlx 135 + 23), clippy-clean with -D warnings.

Addresses #2696.

chet · 2026-06-20T15:36:49Z

@coderabbitai PTAL, thanks!

copy-pr-bot · 2026-06-20T15:36:50Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-20T15:36:54Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6f900bed-257e-419f-b0cd-6a88b2a795dd

📥 Commits

Reviewing files that changed from the base of the PR and between 99c89bb and 8286ba5.

📒 Files selected for processing (13)

crates/admin-cli/src/boot_override/tests.rs
crates/admin-cli/src/cfg/dispatch.rs
crates/admin-cli/src/dpu/tests.rs
crates/admin-cli/src/expected_machines/tests.rs
crates/admin-cli/src/firmware/tests.rs
crates/admin-cli/src/machine/tests.rs
crates/ib-fabric/src/lib.rs
crates/libmlx/tests/lockdown/test_lockdown.rs
crates/libmlx/tests/lockdown/test_runner.rs
crates/libmlx/tests/runner/command_builder_tests.rs
crates/libmlx/tests/runner/exec_options_tests.rs
crates/libmlx/tests/runner/runner_integration_tests.rs
crates/libmlx/tests/runner/traits_tests.rs

💤 Files with no reviewable changes (6)

crates/admin-cli/src/dpu/tests.rs
crates/libmlx/tests/runner/command_builder_tests.rs
crates/libmlx/tests/runner/exec_options_tests.rs
crates/libmlx/tests/runner/runner_integration_tests.rs
crates/libmlx/tests/lockdown/test_lockdown.rs
crates/libmlx/tests/lockdown/test_runner.rs

✅ Files skipped from review due to trivial changes (1)

crates/admin-cli/src/boot_override/tests.rs

🚧 Files skipped from review as they are similar to previous changes (6)

crates/admin-cli/src/cfg/dispatch.rs
crates/admin-cli/src/expected_machines/tests.rs
crates/ib-fabric/src/lib.rs
crates/admin-cli/src/machine/tests.rs
crates/libmlx/tests/runner/traits_tests.rs
crates/admin-cli/src/firmware/tests.rs

Summary by CodeRabbit

Tests
- Refactored multiple suites to use scenario-driven/parameterized tables for more consistent coverage.
- Consolidated several CLI argument-parsing checks (including routing and --dpu-mode) into fewer scenario-based tests.
- Strengthened health template variant round-trip/validation coverage and added a basic clap command-structure integrity check.
- Removed multiple obsolete CLI parsing, runner/lockdown, and exec option/integration tests to streamline the test infrastructure.

Walkthrough

Test suites across admin-cli, ib-fabric, and libmlx are refactored by consolidating standalone per-variant test functions into table-driven scenarios! and value_scenarios! tests, removing construction-only and stdlib-only tests, expanding exhaustive match coverage in HealthReportTemplates, and replacing runtime Dispatch verification with compile-time validation.

Changes

Test Suite Consolidation into Scenario-Driven Tables

Layer / File(s)	Summary
admin-cli CLI parsing test consolidation `crates/admin-cli/src/boot_override/tests.rs`, `crates/admin-cli/src/dpu/tests.rs`, `crates/admin-cli/src/expected_machines/tests.rs`	`parse_get` and `parse_clear` merged into `parse_get_and_clear` scenario table routed over both variants; `parse_status` test removed; six per-variant `dpu-mode` tests (`Add` and `Patch` × three modes) consolidated into `parse_dpu_mode_to_its_variant` scenario table extracting and asserting `args.dpu_mode` across both command variants.
admin-cli compile-time and command validation `crates/admin-cli/src/cfg/dispatch.rs`, `crates/admin-cli/src/firmware/tests.rs`	Runtime `Dispatch` derive assertions (`assert_dispatch` helper and two `#[test]` functions) replaced with compile-time validation documentation; `CommandFactory` import isolated and `verify_cmd_structure` test added executing `Cmd::command().debug_assert()` for clap command-structure baseline validation.
admin-cli machine HealthReportTemplates exhaustive coverage `crates/admin-cli/src/machine/tests.rs`	`template_name` helper rewritten with fully exhaustive match over all `HealthReportTemplates` variants eliminating wildcard and "documented variants only" patterns; `scenarios!` table expanded to validate additional template variants including `request-repair` using dynamically computed `template_name` results instead of hard-coded string literals.
ib-fabric HealthProbeAlert and port-tracking scenario consolidation `crates/ib-fabric/src/lib.rs`	Standalone `ib_port_down` alert tests replaced with `PortDownAlertView` projection validating alert metadata (id, count phrase, down GUIDs, prevent_allocations, tenant message) via `value_scenarios!` across single and multiple down ports; multiple `should_track_port_as_down` functions replaced with single `TrackPortDown` struct-driven `value_scenarios!` test covering SKU/instance precedence, case-insensitive GUID matching, and fail-open behaviour for unknown GUIDs.
libmlx construction and no-assertion test removal `crates/libmlx/tests/lockdown/test_lockdown.rs`, `crates/libmlx/tests/lockdown/test_runner.rs`, `crates/libmlx/tests/runner/command_builder_tests.rs`, `crates/libmlx/tests/runner/exec_options_tests.rs`, `crates/libmlx/tests/runner/runner_integration_tests.rs`	Removed: `LockdownManager` with_runner and default construction tests; `FlintRunner::with_path` and `test_runner_default` plus entire `output_parsing_tests` module; `test_verbose_logging` and `test_command_spec_to_command_conversion`; `test_sample1_config` through `test_sample5_config` sample configurations; two `set_temp_file_prefix` integration tests verifying construction without assertion.
libmlx traits_tests scenario-driven refactor `crates/libmlx/tests/runner/traits_tests.rs`	`value_scenarios` import added; `to_config_values` string-tuple tests consolidated into single scenario test using generic sorted `(name, value)` projection helper validating owned and borrowed array inputs; indexed-assignment tests collapsed preserving sparse array value coverage; `to_variable_names` scalar and array-expansion tests unified into `value_scenarios!`-driven tests sorting results and covering owned vs. borrowed container forms with consolidated array-expansion name list.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

carbide-test-support integration -- finish the migration, extend the coverage #2692: The PR directly implements the Wave 1 test cleanup described in this issue—collapsing per-variant test clusters into scenarios! and value_scenarios! tables across admin-cli, ib-fabric, and libmlx crates.
Collapse per-variant test clusters in admin-cli, libmlx, and ib-fabric #2696: The scope of this PR—merging boot_override parse tests, consolidating dpu_mode variants, rewriting health_override_templates exhaustive coverage, consolidating ib-fabric port-tracking tests, and refactoring libmlx traits tests—precisely matches the refactoring objectives outlined in this issue.

Possibly related PRs

NVIDIA/infra-controller#2570: Both PRs modify crates/admin-cli/src/boot_override/tests.rs, converting parse_get and parse_clear tests into grouped, table-driven scenario forms.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the primary change: consolidating per-variant test clusters across three crates (ib-fabric, admin-cli, and libmlx) into table-driven structures.
Description check	✅ Passed	The description comprehensively documents the refactoring objectives, implementation details across each crate, test coverage validation, and production neutrality. It directly addresses the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai · 2026-06-20T15:38:30Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get PR details
gh pr view 2710 --repo NVIDIA/infra-controller --json title,body,files,state,additions,deletions

echo "---FILES---"
gh pr diff 2710 --repo NVIDIA/infra-controller

Length of output: 67935

@chet: Solid cleanup — the table-driven consolidation is a clear win across all three crates, the comments explaining each structural decision are unusually thorough, and closing the _ => unreachable!() gap in health_override_templates_value_enum with a fully exhaustive match is exactly the right move. A few observations:

crates/ib-fabric/src/lib.rs — duplicate rows in should_track_port_as_down_follows_sku_then_instance_precedence

The last two rows exercise an identical input set:

"neither SKU nor instance -> do not track" {
    TrackPortDown { guid: "guid1", guid_to_index: &[("guid1", 0)], expected_inactive: None, instance_guids: None }
} => false,

"SKU has no IB devices (None), no instance -> do not track" {
    TrackPortDown { guid: "guid1", guid_to_index: &[("guid1", 0)], expected_inactive: None, instance_guids: None }
} => false,

Both have expected_inactive: None, instance_guids: None, and identical GUID/index inputs. The old test names were semantically distinct (_no_sku_no_instance vs _sku_no_ib_devices_no_instance), but since both collapse to (None, None) in the function signature, they hit the same branch. One of the rows can be removed without any loss of coverage.

crates/libmlx/tests/runner/traits_tests.rs — variable_names_expand_full_arrays_to_indices is owned-only

variable_names_from_scalars_across_container_forms correctly exercises the &[T; N] impl of MlxConfigQueryable for scalar names. However, variable_names_expand_full_arrays_to_indices only has the owned [...] row — the &[...] coverage for the array-expansion path was removed with test_mlx_config_queryable_array_references. The &[T; N] impl for array variable expansion is no longer explicitly exercised. Adding a second row with names(&["GPIO_ENABLED", "THERMAL_SENSORS"], &registry) would close that gap with minimal overhead (same expected() closure, same shape as the existing config_values tables).

Everything else is well-executed. The removal of construction-then-drop, Debug-string, str::contains, and no-assertion routing tests is entirely justified, and the #[allow(clippy::needless_borrows_for_generic_args)] annotations with their load-bearing explanations are a nice touch.

chet · 2026-06-20T15:54:49Z

@coderabbitai Thanks for the eyes. Updated -- dropped the duplicate (None, None) row and folded the "SKU with no IB devices" case into the kept row's comment, since both reach should_track_port_as_down as expected_inactive: None. PTAL. Resolve if good, thanks!

coderabbitai · 2026-06-20T15:56:17Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the latest diff to verify the duplicate row fix
gh pr diff 2710 --repo NVIDIA/infra-controller | grep -A 60 "should_track_port_as_down_follows_sku_then_instance_precedence"

Length of output: 2555

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the full current file content for the relevant test function
gh pr diff 2710 --repo NVIDIA/infra-controller | grep -A 120 "should_track_port_as_down_follows_sku_then_instance_precedence" | tail -80

Length of output: 3293

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Look at the current file content around the test function to see the final state
gh api repos/NVIDIA/infra-controller/contents/crates/ib-fabric/src/lib.rs \
  --jq '.content' | base64 -d | grep -A 80 "should_track_port_as_down_follows_sku_then_instance_precedence" | grep -v "^--$" | head -90

Length of output: 568

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the PR branch name and fetch file from it
branch=$(gh pr view 2710 --repo NVIDIA/infra-controller --json headRefName --jq '.headRefName')
echo "Branch: $branch"

gh api "repos/NVIDIA/infra-controller/contents/crates/ib-fabric/src/lib.rs?ref=$branch" \
  --jq '.content' | base64 -d | grep -A 90 "should_track_port_as_down_follows_sku_then_instance_precedence" | head -95

Length of output: 751

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the PR head SHA
head_sha=$(gh pr view 2710 --repo NVIDIA/infra-controller --json headRefOid --jq '.headRefOid')
echo "Head SHA: $head_sha"

# Fetch via tree API to find the file
gh api "repos/NVIDIA/infra-controller/git/trees/$head_sha?recursive=1" \
  --jq '.tree[] | select(.path | contains("ib-fabric")) | .path' 2>/dev/null | head -20

Length of output: 963

🏁 Script executed:

#!/bin/bash
set -euo pipefail

head_sha=$(gh pr view 2710 --repo NVIDIA/infra-controller --json headRefOid --jq '.headRefOid')

# Get the blob SHA for the file
blob_sha=$(gh api "repos/NVIDIA/infra-controller/git/trees/$head_sha?recursive=1" \
  --jq '.tree[] | select(.path == "crates/ib-fabric/src/lib.rs") | .sha')
echo "Blob SHA: $blob_sha"

# Fetch the blob content
gh api "repos/NVIDIA/infra-controller/git/blobs/$blob_sha" \
  --jq '.content' | base64 -d | grep -n "neither SKU\|no IB devices\|None, None\|instance_guids: None\|expected_inactive: None" | head -20

Length of output: 1669

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List coderabbit's review comments to find the one about duplicate rows
gh api "repos/NVIDIA/infra-controller/pulls/2710/comments" \
  --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, body: (.body | .[0:120])}' | head -60

Length of output: 514

@chet: The ib-fabric fix is clean — the comment at lines 1608–1609 accurately captures both semantic cases (expected_inactive: None from no-SKU and from a SKU with no IB devices), and the table now has exactly one row for that branch. No concerns there.

The second observation from the prior review — variable_names_expand_full_arrays_to_indices in crates/libmlx/tests/runner/traits_tests.rs covering only the owned [...] form and leaving the &[T; N] array-expansion path unexercised — is still open. Would you like to add that &[...] row in this PR, or defer it? Happy to resolve once that's either addressed or explicitly deferred.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/admin-cli/src/cfg/dispatch.rs (1)
68-77: ⚡ Quick win

Remove #[allow(dead_code)] by adding explicit compile-time usage of derived enums.

Now that runtime scaffolding is removed, these #[allow(dead_code)] attributes can be replaced with a tiny type-assertion test (e.g., fn assert_dispatch<T: Dispatch>() {} and calls for both enums), preserving intent without suppressing lints.

As per coding guidelines, “Strongly avoid using #[allow(dead_code)] for Rust code… Dead code detection is important to catch mistakes and avoid unused code building up.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/cfg/dispatch.rs` around lines 68 - 77, Remove the
`#[allow(dead_code)]` attributes from both the AllRunCmd and MixedCmd enum
definitions. Instead, add a small generic assertion function at the bottom of
the test module (e.g., a function named assert_dispatch that takes a generic
type parameter constrained to the Dispatch trait with an empty body). Then call
this assertion function once for AllRunCmd and once for MixedCmd to provide
compile-time proof that these enums are being used, eliminating the need for the
dead code suppression lint.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/libmlx/tests/runner/traits_tests.rs`:
- Around line 350-365: The value_scenarios! table-driven test currently only
covers the owned array form in the call to names() with ["GPIO_ENABLED",
"THERMAL_SENSORS"], but is missing coverage for the borrowed form with
&["GPIO_ENABLED", "THERMAL_SENSORS"]. Add a second test row to the table that
calls names() with the borrowed form &["GPIO_ENABLED", "THERMAL_SENSORS"] and
the same registry parameter, using the identical expected expansion list with
all the indexed array elements. This ensures both the owned and borrowed
container forms are tested with the same expansion operation in the table.

---

Nitpick comments:
In `@crates/admin-cli/src/cfg/dispatch.rs`:
- Around line 68-77: Remove the `#[allow(dead_code)]` attributes from both the
AllRunCmd and MixedCmd enum definitions. Instead, add a small generic assertion
function at the bottom of the test module (e.g., a function named
assert_dispatch that takes a generic type parameter constrained to the Dispatch
trait with an empty body). Then call this assertion function once for AllRunCmd
and once for MixedCmd to provide compile-time proof that these enums are being
used, eliminating the need for the dead code suppression lint.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 42b76065-2345-4577-acf8-1118d4a6850b

📥 Commits

Reviewing files that changed from the base of the PR and between 79e54c0 and 377bf66.

📒 Files selected for processing (13)

crates/admin-cli/src/boot_override/tests.rs
crates/admin-cli/src/cfg/dispatch.rs
crates/admin-cli/src/dpu/tests.rs
crates/admin-cli/src/expected_machines/tests.rs
crates/admin-cli/src/firmware/tests.rs
crates/admin-cli/src/machine/tests.rs
crates/ib-fabric/src/lib.rs
crates/libmlx/tests/lockdown/test_lockdown.rs
crates/libmlx/tests/lockdown/test_runner.rs
crates/libmlx/tests/runner/command_builder_tests.rs
crates/libmlx/tests/runner/exec_options_tests.rs
crates/libmlx/tests/runner/runner_integration_tests.rs
crates/libmlx/tests/runner/traits_tests.rs

💤 Files with no reviewable changes (6)

crates/libmlx/tests/runner/exec_options_tests.rs
crates/admin-cli/src/dpu/tests.rs
crates/libmlx/tests/lockdown/test_lockdown.rs
crates/libmlx/tests/lockdown/test_runner.rs
crates/libmlx/tests/runner/command_builder_tests.rs
crates/libmlx/tests/runner/runner_integration_tests.rs

github-actions · 2026-06-21T05:38:26Z

🔍 Container Scan Summary

Service	Total	Critical	High	Medium	Low	Other
boot-artifacts-aarch64	3	0	0	3	0	0
boot-artifacts-x86_64	3	0	0	3	0	0
forge-admin-cli-x86_64	264	6	23	99	6	130
machine-validation-runner	704	34	183	258	35	194
machine_validation	704	34	183	258	35	194
nvmetal-carbide	704	34	183	258	35	194
TOTAL	2382	108	572	879	111	712

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

… libmlx Three already-migrated crates still carried one-function-per-variant clusters and a handful of tests that asserted nothing. `ib-fabric`'s 12 `should_track_port_*` functions and 3 `ib_port_down_alert_*` functions become two tables; `admin-cli`'s six `dpu_mode` parse tests and the `boot_override` get/clear pair collapse into tables; `libmlx`'s four `&[...]`-vs-`[...]` trait-impl pairs become four small tables (each row monomorphizes its own container impl). The same pass drops tests that exercised nothing but the stdlib or a trivial setter -- `str::contains`, the `Command` `Debug` string, construction-then-drop "creation"/"default" tests, two empty `assert_dispatch::<T>()` compile-checks, and the no-assertion CLI routing tests. It also rewrites `admin-cli`'s `health_override_templates` check so the variant-to-name map is exhaustive: a newly-added template now fails to compile without a row, closing the silent `_ => unreachable!()` gap the old test had. About 290 fewer test lines, no production change. Verified per crate (ib-fabric 21, admin-cli 346, libmlx 135 + 23) and clippy-clean with `-D warnings`. Signed-off-by: Chet Nichols III <chetn@nvidia.com>

chet force-pushed the gh-issue-2696 branch from 83034aa to 377bf66 Compare June 20, 2026 15:54

rwthompsonii approved these changes Jun 20, 2026

View reviewed changes

chet marked this pull request as ready for review June 21, 2026 04:15

chet requested a review from a team as a code owner June 21, 2026 04:15

rwthompsonii approved these changes Jun 21, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 21, 2026

View reviewed changes

Comment thread crates/libmlx/tests/runner/traits_tests.rs

chet force-pushed the gh-issue-2696 branch from 377bf66 to 99c89bb Compare June 21, 2026 05:07

chet force-pushed the gh-issue-2696 branch from 99c89bb to 8286ba5 Compare June 21, 2026 05:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: collapse per-variant test clusters in ib-fabric, admin-cli, and libmlx#2710

test: collapse per-variant test clusters in ib-fabric, admin-cli, and libmlx#2710
chet wants to merge 1 commit into
NVIDIA:mainfrom
chet:gh-issue-2696

chet commented Jun 20, 2026

Uh oh!

chet commented Jun 20, 2026

Uh oh!

copy-pr-bot Bot commented Jun 20, 2026

Uh oh!

coderabbitai Bot commented Jun 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 20, 2026

Uh oh!

chet commented Jun 20, 2026

Uh oh!

coderabbitai Bot commented Jun 20, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chet commented Jun 20, 2026

Uh oh!

chet commented Jun 20, 2026

Uh oh!

copy-pr-bot Bot commented Jun 20, 2026

Uh oh!

coderabbitai Bot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Uh oh!

coderabbitai Bot commented Jun 20, 2026

Uh oh!

chet commented Jun 20, 2026

Uh oh!

coderabbitai Bot commented Jun 20, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Container Scan Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 20, 2026 •

edited

Loading

github-actions Bot commented Jun 21, 2026 •

edited

Loading