bug: cert-manager Helm install does not create sandbox JWT secret #1691

@delgadof

Agent Diagnostic

  • Loaded debug-openshell-cluster because the gateway pod was stuck on a Kubernetes Helm deployment failure.
  • Reproduced the failure path from the reported install command:
    helm upgrade --install openshell oci://ghcr.io/nvidia/openshell/helm-chart --version v0.0.54 --namespace openshell --set certManager.enabled=true --set pkiInitJob.enabled=false
  • Observed the pod event:
    MountVolume.SetUp failed for volume "sandbox-jwt" : secret "openshell-jwt-keys" not found
  • Inspected deploy/helm/openshell/templates/statefulset.yaml and found the gateway always mounts the sandbox-jwt Secret, defaulting to <fullname>-jwt-keys.
  • Inspected deploy/helm/openshell/templates/certgen.yaml and found openshell-gateway generate-certs creates the JWT signing Secret only when pkiInitJob.enabled=true.
  • Inspected deploy/helm/openshell/templates/cert-manager-pki.yaml and found cert-manager creates TLS certificate Secrets, but does not create the sandbox JWT signing Secret.
  • Inspected docs/kubernetes/managing-certificates.mdx and found the documented cert-manager install command sets certManager.enabled=true and pkiInitJob.enabled=false, which disables the only chart path that creates openshell-jwt-keys.
  • Searched existing issues for openshell-jwt-keys, pkiInitJob certManager, FailedMount sandbox-jwt, cert-manager JWT secret helm, and related terms. No exact existing report was found. The closest result was feat: support online gateway sandbox JWT key rotation #1510, a broader feature request for online sandbox JWT key rotation.

Description

Actual behavior: Installing the Helm chart with cert-manager enabled and the PKI init job disabled can leave the gateway StatefulSet unable to start because the required openshell-jwt-keys Secret is not created.

The gateway pod stays Pending and reports this event:

Warning  FailedMount  kubelet  MountVolume.SetUp failed for volume "sandbox-jwt" : secret "openshell-jwt-keys" not found

Expected behavior: The documented cert-manager installation path should produce all required Secrets, including the sandbox JWT signing Secret, or fail at Helm render/install time with a clear message and instructions.

The chart currently uses pkiInitJob for two separate jobs: generating the TLS PKI and generating the sandbox JWT signing key. cert-manager replaces the TLS part, but nothing replaces the JWT signing key part.

Reproduction Steps

  1. Install OpenShell with the documented cert-manager override:

    helm upgrade --install openshell \
      oci://ghcr.io/nvidia/openshell/helm-chart \
      --version v0.0.54 \
      --namespace openshell \
      --set certManager.enabled=true \
      --set pkiInitJob.enabled=false

  2. Inspect the gateway pod:

    kubectl -n openshell describe pod openshell-0

  3. Observe the FailedMount event for sandbox-jwt:

    MountVolume.SetUp failed for volume "sandbox-jwt" : secret "openshell-jwt-keys" not found

  4. Confirm the JWT secret is absent:

    kubectl -n openshell get secret openshell-jwt-keys
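
The check in step 4 can be wrapped in a small helper for repeated use while testing installs. This is a minimal sketch: the function names `secret_exists` and `report_jwt_secret` are hypothetical, and the defaults assume the `openshell` namespace and the `openshell-jwt-keys` Secret name from this report.

```shell
# Hypothetical helper: succeeds only if `kubectl get secret` succeeds.
# Note: a non-zero exit can also mean no cluster access or an RBAC denial,
# not only that the Secret is absent.
secret_exists() {
  ns="$1"
  name="$2"
  kubectl -n "$ns" get secret "$name" >/dev/null 2>&1
}

# Print whether the JWT Secret mounted by the gateway is present.
# Defaults are taken from this issue and may differ in other installs.
report_jwt_secret() {
  ns="${1:-openshell}"
  name="${2:-openshell-jwt-keys}"
  if secret_exists "$ns" "$name"; then
    echo "present: $ns/$name"
  else
    echo "missing: $ns/$name"
    return 1
  fi
}
```

Running `report_jwt_secret` after the install in step 1 should print the `missing:` line if the bug reproduces.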

Environment

  • OpenShell Helm chart: v0.0.54
  • Deployment mode: Kubernetes Helm install
  • Certificate mode: certManager.enabled=true, pkiInitJob.enabled=false
  • Namespace: openshell
  • Kubernetes distribution: not yet confirmed

Logs

Normal   Scheduled    default-scheduler  Successfully assigned openshell/openshell-0 to brev-bw1x8gq9y
Warning  FailedMount  kubelet            MountVolume.SetUp failed for volume "sandbox-jwt" : secret "openshell-jwt-keys" not found
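
To pull the missing Secret names out of event output like the above, a filter along these lines may help. It is a sketch that assumes the kubelet message wording shown in this report (`secret "<name>" not found`); the function name `missing_secrets` is hypothetical.

```shell
# Read pod events on stdin and print each Secret name that a FailedMount
# event reports as not found, one per line, de-duplicated.
missing_secrets() {
  grep 'FailedMount' |
    sed -n 's/.*secret "\([^"]*\)" not found.*/\1/p' |
    sort -u
}
```

One way to use it is `kubectl -n openshell describe pod openshell-0 | missing_secrets`.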

Suggested Fix

Separate sandbox JWT key provisioning from TLS certificate provisioning. Possible approaches:

  • Add a dedicated Helm hook/job for openshell-jwt-keys that runs even when certManager.enabled=true and pkiInitJob.enabled=false.
  • Add a chart value that lets users supply a pre-created JWT Secret, and validate that value explicitly instead of silently assuming the Secret exists.
  • Add a Helm render-time failure when cert-manager mode is selected without a JWT Secret provisioning path.
  • Update docs/kubernetes/managing-certificates.mdx to include the required JWT secret behavior or manual secret creation step until the chart is fixed.
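
The render-time failure idea comes down to a simple rule, sketched here as a shell pre-install guard rather than as chart code. The function name `jwt_provisioning_path` and its three arguments are hypothetical; the rule itself follows the diagnostic above, where the JWT Secret is created only when pkiInitJob.enabled=true. A chart-side version would presumably express the same condition in a template, for example with Helm's `fail` function.

```shell
# Hypothetical pre-install guard.
#   $1: value of certManager.enabled ("true" or "false")
#   $2: value of pkiInitJob.enabled  ("true" or "false")
#   $3: "true" if the JWT Secret already exists in the namespace
# Prints the provisioning path on stdout, or an error on stderr and
# returns 1 when nothing would create the Secret.
jwt_provisioning_path() {
  cert_manager="$1"
  pki_init_job="$2"
  secret_present="$3"
  if [ "$pki_init_job" = "true" ]; then
    echo "ok: pkiInitJob creates the JWT Secret"
  elif [ "$secret_present" = "true" ]; then
    echo "ok: using pre-created JWT Secret"
  else
    echo "error: pkiInitJob.enabled=false and no JWT Secret exists (certManager.enabled=$cert_manager)" >&2
    return 1
  fi
}
```

With the values from the reproduction steps and no pre-created Secret, `jwt_provisioning_path true false false` returns non-zero, which is the case the chart currently lets through.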

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why

Labels

area:cluster, area:docs
