fix(scout): skip TPM2_Clear on hosts with no TPM device by s3rj1k · Pull Request #2703 · NVIDIA/infra-controller

s3rj1k · 2026-06-19T15:10:54Z

clear_tpm ran TPM2_Clear unconditionally during discovery deprovision (run_no_api -> clear_tpm), which hard-fails on a host with no TPM (/dev/tpmrm0 absent): the scout logs it non-fatally and re-polls, so discovery never reports DiscoverMachine and the machine loops in WaitingForDiscovery. register::run already tolerates no-TPM hosts via is_dpu (absent EK cert gates all attestation), so clear_tpm was the only unconditional TPM call blocking discovery. Map the TCTI to its device file and skip the clear when that device is absent; a present TPM that fails to clear stays a hard error.

Type of Change

Add - New feature or capability
Change - Changes in existing functionality
Fix - Bug fixes
Remove - Removed features or deprecated functionality
Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

This PR contains breaking changes

Testing

Unit tests added/updated
Integration tests added/updated
Manual testing performed
No testing required (docs, internal refactor, etc.)

Additional Notes

clear_tpm ran TPM2_Clear unconditionally during discovery deprovision (run_no_api -> clear_tpm), which hard-fails on a host with no TPM (/dev/tpmrm0 absent): the scout logs it non-fatally and re-polls, so discovery never reports DiscoverMachine and the machine loops in WaitingForDiscovery. register::run already tolerates no-TPM hosts via is_dpu (absent EK cert gates all attestation), so clear_tpm was the only unconditional TPM call blocking discovery. Map the TCTI to its device file and skip the clear when that device is absent; a present TPM that fails to clear stays a hard error. Signed-off-by: s3rj1k <evasive.gyron@gmail.com>

copy-pr-bot · 2026-06-19T15:10:58Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-19T15:11:12Z

Summary by CodeRabbit

Bug Fixes
- Improved TPM clearing logic to gracefully handle systems without TPM device files. The operation now skips clearing on systems lacking a TPM device and logs a warning, preventing unnecessary errors.

Walkthrough

crates/scout/src/tpm.rs gains a new internal helper tpm_device_path that parses a device:/... TCTI string into a filesystem path, returning None for socket-based TCTIs. clear_tpm is updated to use this helper and skip the TPM2_Clear operation with an Ok(()) early-return when the device file is not present on the host.

Changes

TPM Device Existence Guard

Layer / File(s)	Summary
`tpm_device_path` helper and `clear_tpm` early-return guard `crates/scout/src/tpm.rs`	Adds `tpm_device_path` to extract a filesystem path from `device:` TCTI strings. Updates `clear_tpm` to check for the device file's existence; logs a warning and returns `Ok(())` immediately when the file is absent, bypassing `set_tpm_max_auth_fail` and `TPM2_Clear`. Existing execution path is preserved when the device is present or when the TCTI is socket-based.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the primary change—skipping TPM2_Clear on hosts without a TPM device—directly aligned with the changeset.
Description check	✅ Passed	The description is well-structured, clearly relating the problem (unconditional TPM2_Clear failure), the solution (conditional device check), and context (alignment with existing register::run behavior).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

crates/scout/src/tpm.rs (2)
74-74: ⚡ Quick win

Use structured tracing fields in the warning log.

Prefer key/value fields (e.g., device=%dev) instead of interpolation to keep logs logfmt-friendly and queryable.

As per coding guidelines, "All services should emit logs in 'logfmt' syntax" and "prefer placing common fields as attributes passed to tracing functions instead of using string interpolation."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/scout/src/tpm.rs` at line 74, The tracing::warn! call in the clear_tpm
function is using string interpolation to embed the dev variable into the log
message, which is not logfmt-friendly. Replace the string interpolation with
structured tracing fields by moving the dev variable outside the message string
and passing it as a key/value field argument to tracing::warn! using the format
device=%dev, keeping only descriptive text in the message string itself.
Source: Coding guidelines

60-65: ⚡ Quick win

Add table-driven tests for TCTI parsing and skip/clear branching.

tpm_device_path and the new early-return guard are input-mapping logic and should be covered with table-driven cases (device:/dev/tpmrm0, plain /dev/tpmrm0, socket TCTIs, malformed strings, absent/present device outcomes).

As per coding guidelines, "Prefer table-driven tests for any function that maps inputs to outputs, errors, or other observable results" and "Reach for a table whenever two or more tests call the same operation with different inputs."

Also applies to: 67-76
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/scout/src/tpm.rs` around lines 60 - 65, The tpm_device_path function
lacks comprehensive test coverage for its input-to-output mapping logic. Add
table-driven tests that cover multiple input scenarios including device TCTI
paths like device:/dev/tpmrm0, plain /dev/tpmrm0 paths, socket TCTIs that should
return None, malformed strings, and cases with absent/present device outcomes.
Apply the same table-driven testing approach to the early-return guard logic
(lines 67-76) as these are also input-mapping functions that benefit from
systematic case coverage per coding guidelines.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/scout/src/tpm.rs`:
- Around line 71-75: In the TPM device check within the function around line
71-75, replace the `Path::exists()` call with `try_exists()` to properly
distinguish between a missing device (Ok(false)) and actual IO/permission errors
(Err), then propagate any errors instead of silently returning Ok. Additionally,
refactor the tracing::warn! call to use structured logging conventions by moving
the device path from string interpolation {dev} into a tracing field using the
format device=%dev. Finally, add comprehensive table-driven test coverage for
the tpm_device_path() function to validate its input-output mapping behavior
according to the project's STYLE_GUIDE.md testing conventions.

---

Nitpick comments:
In `@crates/scout/src/tpm.rs`:
- Line 74: The tracing::warn! call in the clear_tpm function is using string
interpolation to embed the dev variable into the log message, which is not
logfmt-friendly. Replace the string interpolation with structured tracing fields
by moving the dev variable outside the message string and passing it as a
key/value field argument to tracing::warn! using the format device=%dev, keeping
only descriptive text in the message string itself.
- Around line 60-65: The tpm_device_path function lacks comprehensive test
coverage for its input-to-output mapping logic. Add table-driven tests that
cover multiple input scenarios including device TCTI paths like
device:/dev/tpmrm0, plain /dev/tpmrm0 paths, socket TCTIs that should return
None, malformed strings, and cases with absent/present device outcomes. Apply
the same table-driven testing approach to the early-return guard logic (lines
67-76) as these are also input-mapping functions that benefit from systematic
case coverage per coding guidelines.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 15fc4475-46a0-41d5-b440-70605589fa0c

📥 Commits

Reviewing files that changed from the base of the PR and between d5e5b61 and a13ff6c.

📒 Files selected for processing (1)

crates/scout/src/tpm.rs

coderabbitai · 2026-06-19T15:14:45Z

+    if let Some(dev) = tpm_device_path(tpm_path)
+        && !std::path::Path::new(dev).exists()
+    {
+        tracing::warn!("clear_tpm: TPM device {dev} absent; skipping TPM2_Clear (host has no TPM)");
+        return Ok(());


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "tpm.rs" -type f 2>/dev/null | head -20

Repository: NVIDIA/infra-controller

Length of output: 49

🏁 Script executed:

# Locate the file in the repository find . -path "*/crates/scout/src/tpm.rs" -type f 2>/dev/null

Repository: NVIDIA/infra-controller

Length of output: 93

🏁 Script executed:

# Check repository structure ls -la crates/ 2>/dev/null | head -20

Repository: NVIDIA/infra-controller

Length of output: 1245

🏁 Script executed:

# Find the scout crate fd -i scout --type d

Repository: NVIDIA/infra-controller

Length of output: 323

🏁 Script executed:

#!/bin/bash # Search for the scout crate and tpm.rs file fd -i "tpm\.rs" -type f

Repository: NVIDIA/infra-controller

Length of output: 238

🏁 Script executed:

#!/bin/bash # List crates directory ls -1 crates/ 2>/dev/null || echo "crates dir not found"

Repository: NVIDIA/infra-controller

Length of output: 1316

🏁 Script executed:

#!/bin/bash # Search for all Rust files containing tpm_device_path or clear_tpm rg "tpm_device_path|fn clear_tpm" -l --type rust

Repository: NVIDIA/infra-controller

Length of output: 137

🏁 Script executed:

cat -n crates/scout/src/tpm.rs | head -100

Repository: NVIDIA/infra-controller

Length of output: 4662

🏁 Script executed:

# Get more context around the function cat -n crates/scout/src/tpm.rs | sed -n '50,90p'

Repository: NVIDIA/infra-controller

Length of output: 2199

🏁 Script executed:

# Check function signature and error types rg "fn clear_tpm|type.*Error|enum.*Error" crates/scout/src/tpm.rs -A 3

Repository: NVIDIA/infra-controller

Length of output: 245

🏁 Script executed:

# Check what Rust version is pinned cat rust-toolchain.toml

Repository: NVIDIA/infra-controller

Length of output: 860

🏁 Script executed:

# Check if try_exists is used elsewhere in the codebase rg "try_exists" -l

Repository: NVIDIA/infra-controller

Length of output: 160

🏁 Script executed:

# Check if Path::exists() vs try_exists() is documented or discussed rg "Path::exists|try_exists" crates/scout/src/ -B 2 -A 2

Repository: NVIDIA/infra-controller

Length of output: 49

🏁 Script executed:

# Look for tests in the tpm.rs file rg "#\[test\]|#\[cfg\(test\)\]" crates/scout/src/tpm.rs

Repository: NVIDIA/infra-controller

Length of output: 49

🏁 Script executed:

# Check if there are tests for tpm_device_path elsewhere rg "tpm_device_path" --type rust

Repository: NVIDIA/infra-controller

Length of output: 217

🏁 Script executed:

# Check the logging guidelines again from STYLE_GUIDE.md cat STYLE_GUIDE.md | grep -A 20 "## Logging"

Repository: NVIDIA/infra-controller

Length of output: 749

🏁 Script executed:

# Check if there's a specific guideline on structured fields rg "structured.*logfmt|logfmt.*structured" STYLE_GUIDE.md -B 2 -A 2

Repository: NVIDIA/infra-controller

Length of output: 251

🏁 Script executed:

# Look at how other warn logs are structured in this file rg "tracing::warn" crates/scout/src/tpm.rs -B 1 -A 1

Repository: NVIDIA/infra-controller

Length of output: 197

🏁 Script executed:

# Examine other scout files for logging patterns rg "tracing::(warn|error|info)" crates/scout/src/ -A 1 | head -40

Repository: NVIDIA/infra-controller

Length of output: 2059

Address filesystem metadata error handling and structured logging in TPM device check.

Path::exists() conflates "device not present" with IO/permission errors; on transient access failures, it returns false, causing TPM2_Clear to skip silently in deprovision. Use try_exists() to distinguish Ok(false) (absent) from Err(_) (inaccessible), and propagate errors.

Additionally, line 74 violates structured logging conventions; move the device path to a tracing field (device=%dev) rather than string interpolation for proper logfmt formatting.

The tpm_device_path() function is a pure input–output mapper (parser) and should have comprehensive table-driven test coverage per STYLE_GUIDE.md guidance.

Suggested fix

- if let Some(dev) = tpm_device_path(tpm_path) - && !std::path::Path::new(dev).exists() - { - tracing::warn!("clear_tpm: TPM device {dev} absent; skipping TPM2_Clear (host has no TPM)"); - return Ok(()); - } + if let Some(dev) = tpm_device_path(tpm_path) { + match std::path::Path::new(dev).try_exists() { + Ok(false) => { + tracing::warn!(device=%dev, "clear_tpm: TPM device absent; skipping TPM2_Clear"); + return Ok(()); + } + Ok(true) => {} + Err(e) => { + return Err(CarbideClientError::TpmError(format!( + "Failed to check TPM device path {dev}: {e}" + ))); + } + } + }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/scout/src/tpm.rs` around lines 71 - 75, In the TPM device check within the function around line 71-75, replace the `Path::exists()` call with `try_exists()` to properly distinguish between a missing device (Ok(false)) and actual IO/permission errors (Err), then propagate any errors instead of silently returning Ok. Additionally, refactor the tracing::warn! call to use structured logging conventions by moving the device path from string interpolation {dev} into a tracing field using the format device=%dev. Finally, add comprehensive table-driven test coverage for the tpm_device_path() function to validate its input-output mapping behavior according to the project's STYLE_GUIDE.md testing conventions.

s3rj1k requested a review from a team as a code owner June 19, 2026 15:10

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(scout): skip TPM2_Clear on hosts with no TPM device#2703

fix(scout): skip TPM2_Clear on hosts with no TPM device#2703
s3rj1k wants to merge 1 commit into
NVIDIA:mainfrom
s3rj1k:tpm

s3rj1k commented Jun 19, 2026

Uh oh!

copy-pr-bot Bot commented Jun 19, 2026

Uh oh!

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

s3rj1k commented Jun 19, 2026

Type of Change

Breaking Changes

Testing

Additional Notes

Uh oh!

copy-pr-bot Bot commented Jun 19, 2026

Uh oh!

coderabbitai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading