
Update API specifications with fern api update #1

Open
github-actions[bot] wants to merge 1 commit into main from update-api-2026-04-24T17-02-28-060Z


@github-actions


Update API specifications by running fern api update.

krish-nvidia pushed a commit that referenced this pull request May 7, 2026
NVIDIA#1160)

## Description

mTLS connection failures from `forge_tls_client` were logged as `'Unknown error', "client error (Connect)"` with no indication that TLS was involved. Root cause: `tonic::Status` wraps the underlying transport error in its `source()` chain, but an error's `Display` impl, by convention, describes only its own level and does not recurse into `source()`. Logging with `{}` (or `to_string()`) therefore drops the rustls/hyper detail underneath.
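A minimal, std-only sketch of why `{}` loses the detail. `ConnectError` and `CertError` are hypothetical stand-ins for the hyper and rustls layers (not the real tonic/hyper/rustls types); the messages are copied from the log lines in this PR. The outer error exposes the inner one only through `source()`, so `to_string()` shows just the top-level message.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for the innermost rustls-level error.
#[derive(Debug)]
struct CertError;

impl fmt::Display for CertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid peer certificate: UnknownIssuer")
    }
}

impl Error for CertError {}

// Hypothetical stand-in for the hyper-level connect error that wraps it.
#[derive(Debug)]
struct ConnectError {
    inner: CertError,
}

impl fmt::Display for ConnectError {
    // By convention, Display describes only this level, not the cause.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "client error (Connect)")
    }
}

impl Error for ConnectError {
    // The underlying cause is reachable only through source().
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.inner)
    }
}

/// Returns (what `{}` prints, what the first `source()` level prints).
fn display_vs_source(err: &ConnectError) -> (String, Option<String>) {
    let shown = err.to_string();
    let hidden = err.source().map(|s| s.to_string());
    (shown, hidden)
}
```

Calling `display_vs_source` on a `ConnectError` yields only `"client error (Connect)"` as the displayed text, while the certificate message is still present one level down in `source()`.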

This change adds a small private `format_error_chain` helper that walks `std::error::Error::source()` and joins each level with `: `, then uses it at the four log/wrap sites in `crates/rpc/src/forge_tls_client.rs` (per-attempt log + `Connection(String)` wrapping, in both `retry_build` and `retry_build_nmx_c`).
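A minimal sketch of what such a helper might look like, assuming it takes a `&dyn std::error::Error`; the actual signature in `forge_tls_client.rs` may differ. `ClientError` and `wrap_connect_failure` are hypothetical names used only to illustrate a wrap site that builds a `Connection(String)` value.

```rust
use std::error::Error;

/// Walks `err` and every `source()` beneath it, joining each level's
/// `Display` output with ": " (outermost level first).
fn format_error_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

// Hypothetical error enum with a `Connection(String)` variant like the one
// named in this PR; the real type in forge_tls_client may differ.
#[derive(Debug)]
enum ClientError {
    Connection(String),
}

// Hypothetical wrap site: store the whole chain, not just `e.to_string()`.
fn wrap_connect_failure(e: &dyn Error) -> ClientError {
    ClientError::Connection(format_error_chain(e))
}
```

In this sketch, flattening to a `String` fits the `Connection(String)` variant, at the cost that callers can no longer downcast to the original rustls error type.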

To exercise the change locally, we created a CA mismatch as a representative mTLS failure mode. Here is the same client log line, before vs. after:

Before:
... will retry: status: 'Unknown error', self: "client error (Connect)"

After:
... will retry: status: 'Unknown error', self: "client error (Connect)": client error (Connect): invalid peer certificate: UnknownIssuer

The same code path is hit by every mTLS client of carbide-api (DHCP, machine-a-tron, etc.): they all funnel through `ForgeTlsClient::retry_build`, so all of them benefit.

## Type of Change
- [ ] **Add** - New feature or capability
- [x] **Change** - Changes in existing functionality
- [ ] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [ ] **Internal** - Internal changes (refactoring, tests, docs, etc.)

## Related Issues (Optional)

Closes NVIDIA#1088

## Breaking Changes
- [ ] This PR contains breaking changes

## Testing
- [x] Unit tests added/updated
- [ ] Integration tests added/updated
- [x] Manual testing performed
- [ ] No testing required (docs, internal refactor, etc.)

Two new unit tests verify that `format_error_chain` walks a chain and handles the no-source case. Manual end-to-end testing: reproduced the original failure mode against `machine-a-tron` on a local kind cluster and captured the before/after log lines shown above.
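The two unit tests described above could look roughly like this. This is a hypothetical reconstruction, not the PR's actual test code: the `Layer` test error and the test names are invented for illustration, and the helper is repeated so the block compiles on its own.

```rust
use std::error::Error;
use std::fmt;

// Copy of a chain-walking helper so this block is self-contained.
fn format_error_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

// Hypothetical test error: a fixed message plus an optional boxed cause.
#[derive(Debug)]
struct Layer {
    msg: &'static str,
    source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn walks_multi_level_chain() {
        let err = Layer {
            msg: "client error (Connect)",
            source: Some(Box::new(Layer {
                msg: "invalid peer certificate: UnknownIssuer",
                source: None,
            })),
        };
        assert_eq!(
            format_error_chain(&err),
            "client error (Connect): invalid peer certificate: UnknownIssuer"
        );
    }

    #[test]
    fn no_source_is_just_display() {
        let err = Layer { msg: "plain failure", source: None };
        assert_eq!(format_error_chain(&err), "plain failure");
    }
}
```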

## Additional Notes

The fix is generic: it surfaces whatever the underlying error chain contains, not just certificate errors. Other mTLS failures (handshake errors, expired certs, hostname mismatches) get the same treatment.

---------

Signed-off-by: rpowers <rpowers@nvidia.com>
Co-authored-by: Alexander Korobkov <akorobkov@nvidia.com>
