fix(scout): skip TPM2_Clear on hosts with no TPM device#2703
Conversation
clear_tpm ran TPM2_Clear unconditionally during discovery deprovision (run_no_api -> clear_tpm), which hard-fails on a host with no TPM (/dev/tpmrm0 absent): the scout logs it non-fatally and re-polls, so discovery never reports DiscoverMachine and the machine loops in WaitingForDiscovery. register::run already tolerates no-TPM hosts via is_dpu (absent EK cert gates all attestation), so clear_tpm was the only unconditional TPM call blocking discovery. Map the TCTI to its device file and skip the clear when that device is absent; a present TPM that fails to clear stays a hard error. Signed-off-by: s3rj1k <evasive.gyron@gmail.com>
Summary by CodeRabbit
Walkthrough
ChangesTPM Device Existence Guard
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/scout/src/tpm.rs (2)
74-74: ⚡ Quick winUse structured tracing fields in the warning log.
Prefer key/value fields (e.g.,
device=%dev) instead of interpolation to keep logs logfmt-friendly and queryable.As per coding guidelines, "All services should emit logs in 'logfmt' syntax" and "prefer placing common fields as attributes passed to tracing functions instead of using string interpolation."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/scout/src/tpm.rs` at line 74, The tracing::warn! call in the clear_tpm function is using string interpolation to embed the dev variable into the log message, which is not logfmt-friendly. Replace the string interpolation with structured tracing fields by moving the dev variable outside the message string and passing it as a key/value field argument to tracing::warn! using the format device=%dev, keeping only descriptive text in the message string itself.Source: Coding guidelines
60-65: ⚡ Quick winAdd table-driven tests for TCTI parsing and skip/clear branching.
tpm_device_pathand the new early-return guard are input-mapping logic and should be covered with table-driven cases (device:/dev/tpmrm0, plain/dev/tpmrm0, socket TCTIs, malformed strings, absent/present device outcomes).As per coding guidelines, "Prefer table-driven tests for any function that maps inputs to outputs, errors, or other observable results" and "Reach for a table whenever two or more tests call the same operation with different inputs."
Also applies to: 67-76
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/scout/src/tpm.rs` around lines 60 - 65, The tpm_device_path function lacks comprehensive test coverage for its input-to-output mapping logic. Add table-driven tests that cover multiple input scenarios including device TCTI paths like device:/dev/tpmrm0, plain /dev/tpmrm0 paths, socket TCTIs that should return None, malformed strings, and cases with absent/present device outcomes. Apply the same table-driven testing approach to the early-return guard logic (lines 67-76) as these are also input-mapping functions that benefit from systematic case coverage per coding guidelines.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/scout/src/tpm.rs`:
- Around line 71-75: In the TPM device check within the function around line
71-75, replace the `Path::exists()` call with `try_exists()` to properly
distinguish between a missing device (Ok(false)) and actual IO/permission errors
(Err), then propagate any errors instead of silently returning Ok. Additionally,
refactor the tracing::warn! call to use structured logging conventions by moving
the device path from string interpolation {dev} into a tracing field using the
format device=%dev. Finally, add comprehensive table-driven test coverage for
the tpm_device_path() function to validate its input-output mapping behavior
according to the project's STYLE_GUIDE.md testing conventions.
---
Nitpick comments:
In `@crates/scout/src/tpm.rs`:
- Line 74: The tracing::warn! call in the clear_tpm function is using string
interpolation to embed the dev variable into the log message, which is not
logfmt-friendly. Replace the string interpolation with structured tracing fields
by moving the dev variable outside the message string and passing it as a
key/value field argument to tracing::warn! using the format device=%dev, keeping
only descriptive text in the message string itself.
- Around line 60-65: The tpm_device_path function lacks comprehensive test
coverage for its input-to-output mapping logic. Add table-driven tests that
cover multiple input scenarios including device TCTI paths like
device:/dev/tpmrm0, plain /dev/tpmrm0 paths, socket TCTIs that should return
None, malformed strings, and cases with absent/present device outcomes. Apply
the same table-driven testing approach to the early-return guard logic (lines
67-76) as these are also input-mapping functions that benefit from systematic
case coverage per coding guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 15fc4475-46a0-41d5-b440-70605589fa0c
📒 Files selected for processing (1)
crates/scout/src/tpm.rs
| if let Some(dev) = tpm_device_path(tpm_path) | ||
| && !std::path::Path::new(dev).exists() | ||
| { | ||
| tracing::warn!("clear_tpm: TPM device {dev} absent; skipping TPM2_Clear (host has no TPM)"); | ||
| return Ok(()); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd /tmp && find . -name "tpm.rs" -type f 2>/dev/null | head -20Repository: NVIDIA/infra-controller
Length of output: 49
🏁 Script executed:
# Locate the file in the repository
find . -path "*/crates/scout/src/tpm.rs" -type f 2>/dev/nullRepository: NVIDIA/infra-controller
Length of output: 93
🏁 Script executed:
# Check repository structure
ls -la crates/ 2>/dev/null | head -20Repository: NVIDIA/infra-controller
Length of output: 1245
🏁 Script executed:
# Find the scout crate
fd -i scout --type dRepository: NVIDIA/infra-controller
Length of output: 323
🏁 Script executed:
#!/bin/bash
# Search for the scout crate and tpm.rs file
fd -i "tpm\.rs" -type fRepository: NVIDIA/infra-controller
Length of output: 238
🏁 Script executed:
#!/bin/bash
# List crates directory
ls -1 crates/ 2>/dev/null || echo "crates dir not found"Repository: NVIDIA/infra-controller
Length of output: 1316
🏁 Script executed:
#!/bin/bash
# Search for all Rust files containing tpm_device_path or clear_tpm
rg "tpm_device_path|fn clear_tpm" -l --type rustRepository: NVIDIA/infra-controller
Length of output: 137
🏁 Script executed:
cat -n crates/scout/src/tpm.rs | head -100Repository: NVIDIA/infra-controller
Length of output: 4662
🏁 Script executed:
# Get more context around the function
cat -n crates/scout/src/tpm.rs | sed -n '50,90p'Repository: NVIDIA/infra-controller
Length of output: 2199
🏁 Script executed:
# Check function signature and error types
rg "fn clear_tpm|type.*Error|enum.*Error" crates/scout/src/tpm.rs -A 3Repository: NVIDIA/infra-controller
Length of output: 245
🏁 Script executed:
# Check what Rust version is pinned
cat rust-toolchain.tomlRepository: NVIDIA/infra-controller
Length of output: 860
🏁 Script executed:
# Check if try_exists is used elsewhere in the codebase
rg "try_exists" -lRepository: NVIDIA/infra-controller
Length of output: 160
🏁 Script executed:
# Check if Path::exists() vs try_exists() is documented or discussed
rg "Path::exists|try_exists" crates/scout/src/ -B 2 -A 2Repository: NVIDIA/infra-controller
Length of output: 49
🏁 Script executed:
# Look for tests in the tpm.rs file
rg "#\[test\]|#\[cfg\(test\)\]" crates/scout/src/tpm.rsRepository: NVIDIA/infra-controller
Length of output: 49
🏁 Script executed:
# Check if there are tests for tpm_device_path elsewhere
rg "tpm_device_path" --type rustRepository: NVIDIA/infra-controller
Length of output: 217
🏁 Script executed:
# Check the logging guidelines again from STYLE_GUIDE.md
cat STYLE_GUIDE.md | grep -A 20 "## Logging"Repository: NVIDIA/infra-controller
Length of output: 749
🏁 Script executed:
# Check if there's a specific guideline on structured fields
rg "structured.*logfmt|logfmt.*structured" STYLE_GUIDE.md -B 2 -A 2Repository: NVIDIA/infra-controller
Length of output: 251
🏁 Script executed:
# Look at how other warn logs are structured in this file
rg "tracing::warn" crates/scout/src/tpm.rs -B 1 -A 1Repository: NVIDIA/infra-controller
Length of output: 197
🏁 Script executed:
# Examine other scout files for logging patterns
rg "tracing::(warn|error|info)" crates/scout/src/ -A 1 | head -40Repository: NVIDIA/infra-controller
Length of output: 2059
Address filesystem metadata error handling and structured logging in TPM device check.
Path::exists() conflates "device not present" with IO/permission errors; on transient access failures, it returns false, causing TPM2_Clear to skip silently in deprovision. Use try_exists() to distinguish Ok(false) (absent) from Err(_) (inaccessible), and propagate errors.
Additionally, line 74 violates structured logging conventions; move the device path to a tracing field (device=%dev) rather than string interpolation for proper logfmt formatting.
The tpm_device_path() function is a pure input–output mapper (parser) and should have comprehensive table-driven test coverage per STYLE_GUIDE.md guidance.
Suggested fix
- if let Some(dev) = tpm_device_path(tpm_path)
- && !std::path::Path::new(dev).exists()
- {
- tracing::warn!("clear_tpm: TPM device {dev} absent; skipping TPM2_Clear (host has no TPM)");
- return Ok(());
- }
+ if let Some(dev) = tpm_device_path(tpm_path) {
+ match std::path::Path::new(dev).try_exists() {
+ Ok(false) => {
+ tracing::warn!(device=%dev, "clear_tpm: TPM device absent; skipping TPM2_Clear");
+ return Ok(());
+ }
+ Ok(true) => {}
+ Err(e) => {
+ return Err(CarbideClientError::TpmError(format!(
+ "Failed to check TPM device path {dev}: {e}"
+ )));
+ }
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/scout/src/tpm.rs` around lines 71 - 75, In the TPM device check within
the function around line 71-75, replace the `Path::exists()` call with
`try_exists()` to properly distinguish between a missing device (Ok(false)) and
actual IO/permission errors (Err), then propagate any errors instead of silently
returning Ok. Additionally, refactor the tracing::warn! call to use structured
logging conventions by moving the device path from string interpolation {dev}
into a tracing field using the format device=%dev. Finally, add comprehensive
table-driven test coverage for the tpm_device_path() function to validate its
input-output mapping behavior according to the project's STYLE_GUIDE.md testing
conventions.
clear_tpm ran TPM2_Clear unconditionally during discovery deprovision (run_no_api -> clear_tpm), which hard-fails on a host with no TPM (/dev/tpmrm0 absent): the scout logs it non-fatally and re-polls, so discovery never reports DiscoverMachine and the machine loops in WaitingForDiscovery. register::run already tolerates no-TPM hosts via is_dpu (absent EK cert gates all attestation), so clear_tpm was the only unconditional TPM call blocking discovery. Map the TCTI to its device file and skip the clear when that device is absent; a present TPM that fails to clear stays a hard error.
Type of Change
Breaking Changes
Testing
Additional Notes