diff --git a/.agents/skills/debug-openshell-cluster/SKILL.md b/.agents/skills/debug-openshell-cluster/SKILL.md index aeaa503f7..b65bf26d2 100644 --- a/.agents/skills/debug-openshell-cluster/SKILL.md +++ b/.agents/skills/debug-openshell-cluster/SKILL.md @@ -157,6 +157,15 @@ kubectl -n openshell get secret \ openshell-jwt-keys ``` +In cert-manager installs, `certManager.enabled=true` makes cert-manager own TLS +generation. The Helm chart should still render the `openshell-certgen` +pre-install/pre-upgrade hook in JWT-only mode to create `openshell-jwt-keys`, +even if `pkiInitJob.enabled` remains true. +If the gateway pod is pending with `MountVolume.SetUp failed for volume +"sandbox-jwt"` and `openshell-jwt-keys` is absent, inspect the rendered +`templates/certgen.yaml` output and the hook Job logs; cert-manager creates TLS +Secrets but does not create the sandbox JWT signing Secret. + If the gateway exits with `failed to read sandbox JWT signing key from /etc/openshell-jwt/signing.pem`, verify that `openshell-jwt-keys` contains `signing.pem`, `public.pem`, and `kid`, and that the StatefulSet mounts the diff --git a/architecture/gateway.md b/architecture/gateway.md index 533ce04a5..5bb81e5d3 100644 --- a/architecture/gateway.md +++ b/architecture/gateway.md @@ -363,32 +363,39 @@ requested for that relay. ## PKI Bootstrap -`openshell-gateway generate-certs` is the one place mTLS materials are -created. Both deployment paths use it: +`openshell-gateway generate-certs` is the one place local mTLS materials and +sandbox JWT signing material are created. Deployment paths use it as follows: | Output mode | Selector | Layout | |---|---|---| -| Kubernetes Secrets | (default) `--namespace`, `--server-secret-name`, `--client-secret-name` | Two `kubernetes.io/tls` Secrets with `tls.crt` / `tls.key` / `ca.crt`. | -| Filesystem | `--output-dir ` | `/{ca.crt, ca.key, server/tls.{crt,key}, client/tls.{crt,key}}`. Also copies client materials to `$XDG_CONFIG_HOME/openshell/gateways/openshell/mtls/` for CLI auto-discovery. | +| Kubernetes Secrets | (default) `--namespace`, `--server-secret-name`, `--client-secret-name`, `--jwt-secret-name` | Two `kubernetes.io/tls` Secrets with `tls.crt` / `tls.key` / `ca.crt` plus one Opaque sandbox JWT Secret with `signing.pem` / `public.pem` / `kid`. | +| Kubernetes JWT-only Secret | `--namespace`, `--jwt-only`, `--jwt-secret-name` | One Opaque sandbox JWT Secret with `signing.pem` / `public.pem` / `kid`. | +| Filesystem | `--output-dir ` | `/{ca.crt, ca.key, server/tls.{crt,key}, client/tls.{crt,key}, jwt/{signing.pem,public.pem,kid}}`. Also copies client materials to `$XDG_CONFIG_HOME/openshell/gateways/openshell/mtls/` for CLI auto-discovery. | On Kubernetes, the Helm chart runs the command via a pre-install/pre-upgrade hook Job using the gateway image itself -- no separate cert-generation image, -no extra mirror burden in air-gapped environments. On package-managed local -gateways, the same command runs from the systemd unit's `ExecStartPre` to -bootstrap PKI into the configured local TLS directory on first start. The -Linux package unit defaults that directory to `~/.local/state/openshell/tls` -through `OPENSHELL_LOCAL_TLS_DIR` so certificate generation and runtime -auto-detection use the same path across systemd versions. - -Both modes share the same idempotency contract: all targets present -> skip; -partial state -> fail with a recovery hint; nothing present -> generate and -write. This guards mTLS continuity across restarts and upgrades while still -recovering cleanly if an operator deletes everything and starts over. - -Operators who manage PKI externally (cert-manager, an enterprise CA, or -pre-created Secrets) disable the Helm hook via `pkiInitJob.enabled=false`. -The chart also ships a `certManager.*` path that produces equivalent Secrets -through cert-manager `Issuer`/`Certificate` resources. +no extra mirror burden in air-gapped environments. In the default built-in PKI +path the hook creates TLS and sandbox JWT Secrets. When cert-manager is enabled, +cert-manager owns TLS Secrets and the hook runs with `--jwt-only` so the +required sandbox JWT Secret still exists before the gateway StatefulSet mounts +it, even if `pkiInitJob.enabled` remains true. On package-managed local +gateways, the same command runs from the systemd +unit's `ExecStartPre` to bootstrap PKI into the configured local TLS directory +on first start. The Linux package unit defaults that directory to +`~/.local/state/openshell/tls` through `OPENSHELL_LOCAL_TLS_DIR` so certificate +generation and runtime auto-detection use the same path across systemd +versions. + +The bootstrap paths share the same idempotency contract: all requested targets +present -> skip; partial requested state -> fail with a recovery hint; nothing +requested present -> generate and write. This guards continuity across restarts +and upgrades while still recovering cleanly if an operator deletes everything +and starts over. + +Operators who manage TLS PKI with cert-manager enable `certManager.enabled`; +cert-manager takes precedence over built-in TLS generation and the chart still +renders the JWT-only hook. Operators who pre-create all TLS and JWT Secrets can +disable both `pkiInitJob.enabled` and `certManager.enabled`. ## Configuration diff --git a/crates/openshell-server/src/certgen.rs b/crates/openshell-server/src/certgen.rs index b7ce0421c..683ed180f 100644 --- a/crates/openshell-server/src/certgen.rs +++ b/crates/openshell-server/src/certgen.rs @@ -6,8 +6,12 @@ //! Two output modes, dispatched by the presence of `--output-dir`: //! //! - **Kubernetes mode** (default): create two `kubernetes.io/tls` Secrets -//! in the supplied namespace. Used by the Helm pre-install hook. Requires -//! `--namespace`, `--server-secret-name`, `--client-secret-name`. +//! and one sandbox-JWT signing Secret in the supplied namespace. Used by +//! the Helm pre-install hook. Requires `--namespace`, +//! `--server-secret-name`, `--client-secret-name`, and `--jwt-secret-name`. +//! - **Kubernetes JWT-only mode** (`--jwt-only`): create only the +//! sandbox-JWT signing Secret. Used when another controller, such as +//! cert-manager, owns the TLS Secrets. //! - **Local mode** (`--output-dir `): write PEMs to the local package //! filesystem layout. Used by systemd units' `ExecStartPre`. Also copies //! client materials to @@ -47,11 +51,11 @@ pub struct CertgenArgs { namespace: Option, /// Name of the server TLS Secret (`kubernetes.io/tls`) to create. - #[arg(long, required_unless_present = "output_dir")] + #[arg(long, required_unless_present_any = ["output_dir", "jwt_only"])] server_secret_name: Option, /// Name of the client TLS Secret (`kubernetes.io/tls`) to create. - #[arg(long, required_unless_present = "output_dir")] + #[arg(long, required_unless_present_any = ["output_dir", "jwt_only"])] client_secret_name: Option, /// Name of the sandbox-JWT signing-key Secret (`Opaque`) to create. @@ -60,6 +64,11 @@ pub struct CertgenArgs { #[arg(long, required_unless_present = "output_dir")] jwt_secret_name: Option, + /// Create only the sandbox-JWT signing-key Secret in Kubernetes mode. + /// This is used when another controller owns TLS Secret provisioning. + #[arg(long, conflicts_with = "output_dir")] + jwt_only: bool, + /// Extra Subject Alternative Name for the server certificate. Repeatable. /// Auto-detected as an IP address or DNS name. #[arg(long = "server-san", value_name = "SAN")] @@ -116,6 +125,51 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> { .namespace .as_deref() .ok_or_else(|| miette::miette!("--namespace is required (or set POD_NAMESPACE)"))?; + + let client = Client::try_default() + .await + .into_diagnostic() + .wrap_err("failed to construct in-cluster Kubernetes client")?; + let api: Api = Api::namespaced(client, namespace); + + if args.jwt_only { + let jwt_name = args + .jwt_secret_name + .as_deref() + .ok_or_else(|| miette::miette!("--jwt-secret-name is required"))?; + let jwt_exists = api + .get_opt(jwt_name) + .await + .into_diagnostic() + .wrap_err_with(|| format!("failed to read secret {jwt_name}"))? + .is_some(); + if jwt_exists { + info!( + namespace = %namespace, + jwt = %jwt_name, + "JWT signing secret already exists, skipping." + ); + return Ok(()); + } + + let jwt_secret = jwt_signing_secret( + jwt_name, + &bundle.jwt_signing_key_pem, + &bundle.jwt_public_key_pem, + &bundle.jwt_key_id, + ); + api.create(&PostParams::default(), &jwt_secret) + .await + .into_diagnostic() + .wrap_err_with(|| format!("failed to create secret {jwt_name}"))?; + info!( + namespace = %namespace, + jwt = %jwt_name, + "JWT signing secret created." + ); + return Ok(()); + } + let server_name = args .server_secret_name .as_deref() @@ -124,17 +178,6 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> { .client_secret_name .as_deref() .ok_or_else(|| miette::miette!("--client-secret-name is required"))?; - let jwt_name = args - .jwt_secret_name - .as_deref() - .ok_or_else(|| miette::miette!("--jwt-secret-name is required"))?; - - let client = Client::try_default() - .await - .into_diagnostic() - .wrap_err("failed to construct in-cluster Kubernetes client")?; - let api: Api = Api::namespaced(client, namespace); - let server_exists = api .get_opt(server_name) .await @@ -147,6 +190,11 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> { .into_diagnostic() .wrap_err_with(|| format!("failed to read secret {client_name}"))? .is_some(); + + let jwt_name = args + .jwt_secret_name + .as_deref() + .ok_or_else(|| miette::miette!("--jwt-secret-name is required"))?; let jwt_exists = api .get_opt(jwt_name) .await @@ -193,6 +241,34 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> { K8sAction::CreateAll => {} } + create_tls_secrets(&api, server_name, client_name, bundle).await?; + let jwt_secret = jwt_signing_secret( + jwt_name, + &bundle.jwt_signing_key_pem, + &bundle.jwt_public_key_pem, + &bundle.jwt_key_id, + ); + api.create(&PostParams::default(), &jwt_secret) + .await + .into_diagnostic() + .wrap_err_with(|| format!("failed to create secret {jwt_name}"))?; + + info!( + namespace = %namespace, + server = %server_name, + client = %client_name, + jwt = %jwt_name, + "PKI secrets created." + ); + Ok(()) +} + +async fn create_tls_secrets( + api: &Api, + server_name: &str, + client_name: &str, + bundle: &PkiBundle, +) -> Result<()> { let server_secret = tls_secret( server_name, &bundle.server_cert_pem, @@ -205,12 +281,6 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> { &bundle.client_key_pem, &bundle.ca_cert_pem, ); - let jwt_secret = jwt_signing_secret( - jwt_name, - &bundle.jwt_signing_key_pem, - &bundle.jwt_public_key_pem, - &bundle.jwt_key_id, - ); api.create(&PostParams::default(), &server_secret) .await @@ -220,18 +290,6 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> { .await .into_diagnostic() .wrap_err_with(|| format!("failed to create secret {client_name}"))?; - api.create(&PostParams::default(), &jwt_secret) - .await - .into_diagnostic() - .wrap_err_with(|| format!("failed to create secret {jwt_name}"))?; - - info!( - namespace = %namespace, - server = %server_name, - client = %client_name, - jwt = %jwt_name, - "PKI secrets created." - ); Ok(()) } diff --git a/crates/openshell-server/src/cli.rs b/crates/openshell-server/src/cli.rs index b8d345f9e..9fb0ba339 100644 --- a/crates/openshell-server/src/cli.rs +++ b/crates/openshell-server/src/cli.rs @@ -979,6 +979,31 @@ mod tests { )); } + #[test] + fn generate_certs_jwt_only_parses_without_tls_secret_names() { + let _lock = ENV_LOCK + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner); + let _g1 = EnvVarGuard::remove("OPENSHELL_DB_URL"); + let _g2 = EnvVarGuard::remove("POD_NAMESPACE"); + + let cli = Cli::try_parse_from([ + "openshell-gateway", + "generate-certs", + "--namespace", + "openshell", + "--jwt-only", + "--jwt-secret-name", + "openshell-jwt-keys", + ]) + .expect("--jwt-only should make TLS secret-name flags optional"); + + assert!(matches!( + cli.command, + Some(super::Commands::GenerateCerts(_)) + )); + } + #[test] fn bare_invocation_with_no_db_url_parses_for_runtime_defaults() { // db_url is Option at the clap level so subcommand parsing diff --git a/deploy/helm/openshell/README.md b/deploy/helm/openshell/README.md index 15c9f7f85..62b313e0b 100644 --- a/deploy/helm/openshell/README.md +++ b/deploy/helm/openshell/README.md @@ -109,22 +109,19 @@ Append these flags to any of the PostgreSQL commands above for OpenShift: --set securityContext.runAsUser=null ``` -## PKI bootstrap +## Secret bootstrap By default, a pre-install/pre-upgrade hook Job runs `openshell-gateway generate-certs` -to create the gateway's server and client mTLS Secrets. The Job uses the gateway image -itself, so air-gapped environments only need to mirror that one image (no separate -openssl/alpine sidecar). +to create the gateway's server/client mTLS Secrets and sandbox JWT signing Secret. +The Job uses the gateway image itself, so air-gapped environments only need to +mirror that one image (no separate openssl/alpine sidecar). -The Job is idempotent: - -- Both target Secrets exist: log and exit 0. -- Exactly one exists: fail with `kubectl delete secret -n ` recovery hint. -- Neither exists: generate a CA, server cert, and client cert; POST both `kubernetes.io/tls` Secrets (`tls.crt`, `tls.key`, `ca.crt`). - -Disable with `--set pkiInitJob.enabled=false` when bringing your own PKI (cert-manager, -external CA, or pre-created Secrets). See `certManager.*` in `values.yaml` for the -cert-manager alternative. +When `certManager.enabled=true`, cert-manager owns the TLS Secrets and the chart +runs the same hook in JWT-only mode because cert-manager does not create the +sandbox JWT signing Secret. This precedence applies even if +`pkiInitJob.enabled` remains true. Set `pkiInitJob.enabled=false` only when an +external non-cert-manager TLS source manages TLS and you pre-create the sandbox +JWT signing Secret. ## Values @@ -135,7 +132,7 @@ cert-manager alternative. | certManager.certificateDuration | string | `"8760h"` | Duration for cert-manager-issued certificates. | | certManager.certificateRenewBefore | string | `"720h"` | Renewal window for cert-manager-issued certificates. | | certManager.clientCaFromServerTlsSecret | bool | `true` | Mount gateway client CA from the server TLS secret's ca.crt (populated by cert-manager for certs issued by a CA Issuer). Avoids a separate openshell-server-client-ca Secret. | -| certManager.enabled | bool | `false` | Create cert-manager Issuer and Certificate resources instead of using the PKI bootstrap Job. | +| certManager.enabled | bool | `false` | Create cert-manager Issuer and Certificate resources. When enabled, cert-manager owns TLS and the chart runs a JWT-only certgen hook to create the sandbox JWT signing Secret that cert-manager does not manage. | | certManager.serverDnsNames | list | `["openshell","openshell.openshell.svc","openshell.openshell.svc.cluster.local","localhost","openshell.localhost","*.openshell.localhost","host.docker.internal"]` | DNS SANs on the cert-manager-issued server certificate. | | certManager.serverIpAddresses | list | `["127.0.0.1"]` | IP SANs on the cert-manager-issued server certificate. | | fullnameOverride | string | `""` | Override the full generated resource name. | @@ -155,7 +152,7 @@ cert-manager alternative. | nameOverride | string | `"openshell"` | Override the chart name used in generated resource names. | | networkPolicy.enabled | bool | `true` | Create a NetworkPolicy restricting SSH ingress on sandbox pods to the gateway. | | nodeSelector | object | `{}` | Node selector for the gateway pod. | -| pkiInitJob.enabled | bool | `true` | Run a pre-install/pre-upgrade Job that creates gateway and client mTLS Secrets. | +| pkiInitJob.enabled | bool | `true` | Run a pre-install/pre-upgrade Job that creates gateway and client mTLS Secrets. When certManager.enabled=true, cert-manager owns TLS and this same hook runs in JWT-only mode even if pkiInitJob.enabled remains true. | | pkiInitJob.serverDnsNames | list | `[]` | Extra DNS SANs to append to the server certificate. | | pkiInitJob.serverIpAddresses | list | `[]` | Extra IP SANs to append to the server certificate. | | podAnnotations | object | `{}` | Extra annotations to add to the gateway pod. | diff --git a/deploy/helm/openshell/README.md.gotmpl b/deploy/helm/openshell/README.md.gotmpl index fc391a416..5fc4019e8 100644 --- a/deploy/helm/openshell/README.md.gotmpl +++ b/deploy/helm/openshell/README.md.gotmpl @@ -109,22 +109,19 @@ Append these flags to any of the PostgreSQL commands above for OpenShift: --set securityContext.runAsUser=null ``` -## PKI bootstrap +## Secret bootstrap By default, a pre-install/pre-upgrade hook Job runs `openshell-gateway generate-certs` -to create the gateway's server and client mTLS Secrets. The Job uses the gateway image -itself, so air-gapped environments only need to mirror that one image (no separate -openssl/alpine sidecar). - -The Job is idempotent: - -- Both target Secrets exist: log and exit 0. -- Exactly one exists: fail with `kubectl delete secret -n ` recovery hint. -- Neither exists: generate a CA, server cert, and client cert; POST both `kubernetes.io/tls` Secrets (`tls.crt`, `tls.key`, `ca.crt`). - -Disable with `--set pkiInitJob.enabled=false` when bringing your own PKI (cert-manager, -external CA, or pre-created Secrets). See `certManager.*` in `values.yaml` for the -cert-manager alternative. +to create the gateway's server/client mTLS Secrets and sandbox JWT signing Secret. +The Job uses the gateway image itself, so air-gapped environments only need to +mirror that one image (no separate openssl/alpine sidecar). + +When `certManager.enabled=true`, cert-manager owns the TLS Secrets and the chart +runs the same hook in JWT-only mode because cert-manager does not create the +sandbox JWT signing Secret. This precedence applies even if +`pkiInitJob.enabled` remains true. Set `pkiInitJob.enabled=false` only when an +external non-cert-manager TLS source manages TLS and you pre-create the sandbox +JWT signing Secret. {{ template "chart.valuesSection" . }} {{ template "helm-docs.versionFooter" . }} diff --git a/deploy/helm/openshell/ci/values-cert-manager.yaml b/deploy/helm/openshell/ci/values-cert-manager.yaml index ed99c8b46..2d159c176 100644 --- a/deploy/helm/openshell/ci/values-cert-manager.yaml +++ b/deploy/helm/openshell/ci/values-cert-manager.yaml @@ -7,8 +7,5 @@ server: disableTls: false -pkiInitJob: - enabled: false - certManager: enabled: true diff --git a/deploy/helm/openshell/skaffold.yaml b/deploy/helm/openshell/skaffold.yaml index dcf578d22..9a056238a 100644 --- a/deploy/helm/openshell/skaffold.yaml +++ b/deploy/helm/openshell/skaffold.yaml @@ -60,7 +60,7 @@ deploy: helm: releases: # cert-manager — comment this in and add values-cert-manager.yaml below - # when you want cert-manager to manage the PKI instead of pkiInitJob. + # when you want cert-manager to manage TLS while certgen handles JWT. # Requires cert-manager CRDs to be installed in the cluster first. #- name: cert-manager # repo: https://charts.jetstack.io @@ -89,7 +89,7 @@ deploy: - values.yaml - ci/values-skaffold.yaml # Add ci/values-cert-manager.yaml here (and uncomment the cert-manager - # release above) to switch from pkiInitJob to cert-manager for PKI. + # release above) to switch TLS generation to cert-manager. #- ci/values-cert-manager.yaml # To enable OIDC with a local Keycloak instance, run the one-time # setup task first, then uncomment the line below: diff --git a/deploy/helm/openshell/templates/_helpers.tpl b/deploy/helm/openshell/templates/_helpers.tpl index c40be9b80..a8e7ac721 100644 --- a/deploy/helm/openshell/templates/_helpers.tpl +++ b/deploy/helm/openshell/templates/_helpers.tpl @@ -133,6 +133,13 @@ Name of the Secret holding the PostgreSQL connection URI. {{- end -}} {{- end }} +{{/* +Name of the Secret holding gateway-minted sandbox JWT signing material. +*/}} +{{- define "openshell.sandboxJwtSecretName" -}} +{{- .Values.server.sandboxJwt.signingSecretName | default (printf "%s-jwt-keys" (include "openshell.fullname" .)) -}} +{{- end }} + {{/* gRPC endpoint sandbox pods use to call back into the gateway. An explicit .Values.server.grpcEndpoint is used verbatim. Otherwise it is derived from diff --git a/deploy/helm/openshell/templates/certgen.yaml b/deploy/helm/openshell/templates/certgen.yaml index 61203760b..3651e188f 100644 --- a/deploy/helm/openshell/templates/certgen.yaml +++ b/deploy/helm/openshell/templates/certgen.yaml @@ -1,10 +1,7 @@ # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 -{{- if and .Values.pkiInitJob.enabled .Values.certManager.enabled }} -{{- fail "pkiInitJob.enabled and certManager.enabled cannot both be true; disable one to avoid conflicting PKI sources." }} -{{- end }} -{{- if .Values.pkiInitJob.enabled }} +{{- if or .Values.pkiInitJob.enabled .Values.certManager.enabled }} {{- $hookName := printf "%s-certgen" (include "openshell.fullname" .) }} {{- $ns := .Release.Namespace }} apiVersion: v1 @@ -98,13 +95,19 @@ spec: command: ["/usr/local/bin/openshell-gateway"] args: - generate-certs + {{- if .Values.certManager.enabled }} + - --jwt-only + {{- else }} - --server-secret-name={{ .Values.server.tls.certSecretName }} - --client-secret-name={{ .Values.server.tls.clientTlsSecretName }} - - --jwt-secret-name={{ .Values.server.sandboxJwt.signingSecretName | default (printf "%s-jwt-keys" (include "openshell.fullname" .)) }} + {{- end }} + - --jwt-secret-name={{ include "openshell.sandboxJwtSecretName" . }} + {{- if and .Values.pkiInitJob.enabled (not .Values.certManager.enabled) }} {{- range .Values.pkiInitJob.serverDnsNames }} - --server-san={{ . }} {{- end }} {{- range .Values.pkiInitJob.serverIpAddresses }} - --server-san={{ . }} {{- end }} + {{- end }} {{- end }} diff --git a/deploy/helm/openshell/templates/statefulset.yaml b/deploy/helm/openshell/templates/statefulset.yaml index 087748d38..d56c06355 100644 --- a/deploy/helm/openshell/templates/statefulset.yaml +++ b/deploy/helm/openshell/templates/statefulset.yaml @@ -93,7 +93,7 @@ spec: - name: tls-cert mountPath: /etc/openshell-tls/server readOnly: true - {{- if or .Values.server.tls.clientCaSecretName .Values.pkiInitJob.enabled (and .Values.certManager.enabled .Values.certManager.clientCaFromServerTlsSecret) }} + {{- if or .Values.server.tls.clientCaSecretName (and .Values.pkiInitJob.enabled (not .Values.certManager.enabled)) (and .Values.certManager.enabled .Values.certManager.clientCaFromServerTlsSecret) }} - name: tls-client-ca mountPath: /etc/openshell-tls/client-ca readOnly: true @@ -147,16 +147,16 @@ spec: name: {{ include "openshell.fullname" . }}-config - name: sandbox-jwt secret: - secretName: {{ .Values.server.sandboxJwt.signingSecretName | default (printf "%s-jwt-keys" (include "openshell.fullname" .)) }} + secretName: {{ include "openshell.sandboxJwtSecretName" . }} defaultMode: {{ .Values.server.sandboxJwt.secretDefaultMode | default 0400 }} {{- if not .Values.server.disableTls }} - name: tls-cert secret: secretName: {{ .Values.server.tls.certSecretName }} - {{- if or .Values.server.tls.clientCaSecretName .Values.pkiInitJob.enabled (and .Values.certManager.enabled .Values.certManager.clientCaFromServerTlsSecret) }} + {{- if or .Values.server.tls.clientCaSecretName (and .Values.pkiInitJob.enabled (not .Values.certManager.enabled)) (and .Values.certManager.enabled .Values.certManager.clientCaFromServerTlsSecret) }} - name: tls-client-ca secret: - {{- if or .Values.pkiInitJob.enabled (and .Values.certManager.enabled .Values.certManager.clientCaFromServerTlsSecret) }} + {{- if or (and .Values.pkiInitJob.enabled (not .Values.certManager.enabled)) (and .Values.certManager.enabled .Values.certManager.clientCaFromServerTlsSecret) }} secretName: {{ .Values.server.tls.certSecretName }} items: - key: ca.crt diff --git a/deploy/helm/openshell/tests/certgen_test.yaml b/deploy/helm/openshell/tests/certgen_test.yaml new file mode 100644 index 000000000..cd88b60e9 --- /dev/null +++ b/deploy/helm/openshell/tests/certgen_test.yaml @@ -0,0 +1,116 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +suite: certgen hook +templates: + - templates/certgen.yaml +release: + name: openshell + namespace: my-namespace + +tests: + - it: renders combined TLS and JWT certgen hook by default + template: templates/certgen.yaml + asserts: + - hasDocuments: + count: 4 + - equal: + path: kind + value: Job + documentIndex: 3 + - contains: + path: spec.template.spec.containers[0].args + content: "--server-secret-name=openshell-server-tls" + documentIndex: 3 + - contains: + path: spec.template.spec.containers[0].args + content: "--client-secret-name=openshell-client-tls" + documentIndex: 3 + - contains: + path: spec.template.spec.containers[0].args + content: "--jwt-secret-name=openshell-jwt-keys" + documentIndex: 3 + - notContains: + path: spec.template.spec.containers[0].args + content: "--jwt-only" + documentIndex: 3 + + - it: renders JWT-only certgen hook when cert-manager owns TLS + template: templates/certgen.yaml + set: + certManager.enabled: true + pkiInitJob.enabled: false + asserts: + - hasDocuments: + count: 4 + - equal: + path: kind + value: Job + documentIndex: 3 + - contains: + path: spec.template.spec.containers[0].args + content: "--jwt-only" + documentIndex: 3 + - contains: + path: spec.template.spec.containers[0].args + content: "--jwt-secret-name=openshell-jwt-keys" + documentIndex: 3 + - notContains: + path: spec.template.spec.containers[0].args + content: "--server-secret-name=openshell-server-tls" + documentIndex: 3 + - notContains: + path: spec.template.spec.containers[0].args + content: "--client-secret-name=openshell-client-tls" + documentIndex: 3 + + - it: uses the configured sandbox JWT secret name + template: templates/certgen.yaml + set: + server.sandboxJwt.signingSecretName: custom-jwt-keys + asserts: + - contains: + path: spec.template.spec.containers[0].args + content: "--jwt-secret-name=custom-jwt-keys" + documentIndex: 3 + + - it: renders JWT-only hook when cert-manager is enabled even if pkiInitJob remains enabled + template: templates/certgen.yaml + set: + certManager.enabled: true + pkiInitJob.enabled: true + pkiInitJob.serverDnsNames: + - extra.example.test + pkiInitJob.serverIpAddresses: + - 192.0.2.10 + asserts: + - hasDocuments: + count: 4 + - equal: + path: kind + value: Job + documentIndex: 3 + - contains: + path: spec.template.spec.containers[0].args + content: "--jwt-only" + documentIndex: 3 + - contains: + path: spec.template.spec.containers[0].args + content: "--jwt-secret-name=openshell-jwt-keys" + documentIndex: 3 + - notContains: + path: spec.template.spec.containers[0].args + content: "--server-secret-name=openshell-server-tls" + documentIndex: 3 + - notContains: + path: spec.template.spec.containers[0].args + content: "--client-secret-name=openshell-client-tls" + documentIndex: 3 + - notContains: + path: spec.template.spec.containers[0].args + content: "--server-san=extra.example.test" + documentIndex: 3 + - notContains: + path: spec.template.spec.containers[0].args + content: "--server-san=192.0.2.10" + documentIndex: 3 diff --git a/deploy/helm/openshell/tests/statefulset_client_ca_test.yaml b/deploy/helm/openshell/tests/statefulset_client_ca_test.yaml new file mode 100644 index 000000000..a7b02310c --- /dev/null +++ b/deploy/helm/openshell/tests/statefulset_client_ca_test.yaml @@ -0,0 +1,85 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +suite: statefulset client CA volume +templates: + - templates/gateway-config.yaml + - templates/statefulset.yaml +release: + name: openshell + namespace: my-namespace + +tests: + - it: mounts the server TLS secret ca.crt as client CA for built-in PKI + template: templates/statefulset.yaml + set: + pkiInitJob.enabled: true + certManager.enabled: false + asserts: + - equal: + path: spec.template.spec.volumes[3].name + value: tls-client-ca + - equal: + path: spec.template.spec.volumes[3].secret.secretName + value: openshell-server-tls + - equal: + path: spec.template.spec.volumes[3].secret.items[0].key + value: ca.crt + + - it: shares the cert-manager server TLS ca.crt when clientCaFromServerTlsSecret is true + template: templates/statefulset.yaml + set: + certManager.enabled: true + certManager.clientCaFromServerTlsSecret: true + asserts: + - equal: + path: spec.template.spec.volumes[3].name + value: tls-client-ca + - equal: + path: spec.template.spec.volumes[3].secret.secretName + value: openshell-server-tls + - equal: + path: spec.template.spec.volumes[3].secret.items[0].key + value: ca.crt + + # Regression: with cert-manager enabled and pkiInitJob left at its default + # `true`, the client CA condition must honor certManager precedence and NOT + # treat pkiInitJob.enabled as built-in TLS. With clientCaFromServerTlsSecret=false + # the gateway must mount the separate clientCaSecretName, not the server TLS + # cert secret. + - it: uses clientCaSecretName under cert-manager even when pkiInitJob stays enabled + template: templates/statefulset.yaml + set: + certManager.enabled: true + certManager.clientCaFromServerTlsSecret: false + pkiInitJob.enabled: true + asserts: + - equal: + path: spec.template.spec.volumes[3].name + value: tls-client-ca + - equal: + path: spec.template.spec.volumes[3].secret.secretName + value: openshell-server-client-ca + - notExists: + path: spec.template.spec.volumes[3].secret.items + + # When cert-manager owns TLS, does not share its CA, and no separate client CA + # secret is configured, there is no client CA to mount: the volume must not + # render rather than mounting an empty secret name. + - it: omits the client CA volume when cert-manager owns TLS and no client CA secret is set + template: templates/statefulset.yaml + set: + certManager.enabled: true + certManager.clientCaFromServerTlsSecret: false + pkiInitJob.enabled: true + server.tls.clientCaSecretName: "" + asserts: + - lengthEqual: + path: spec.template.spec.volumes + count: 3 + - notContains: + path: spec.template.spec.containers[0].volumeMounts + content: + name: tls-client-ca + mountPath: /etc/openshell-tls/client-ca + readOnly: true diff --git a/deploy/helm/openshell/values.yaml b/deploy/helm/openshell/values.yaml index 4d03b86d7..26b1a2486 100644 --- a/deploy/helm/openshell/values.yaml +++ b/deploy/helm/openshell/values.yaml @@ -196,9 +196,11 @@ server: clientCaSecretName: openshell-server-client-ca # -- K8s secret mounted into sandbox pods for mTLS to the server. clientTlsSecretName: openshell-client-tls - # Gateway-minted sandbox JWT signing keys. The pre-install certgen hook - # generates an Ed25519 keypair and writes it to a secret containing - # signing.pem (PKCS#8), public.pem (SPKI), and kid (plain text). + # Gateway-minted sandbox JWT signing keys. The certgen hook generates an + # Ed25519 keypair and writes it to a secret containing signing.pem (PKCS#8), + # public.pem (SPKI), and kid (plain text). The hook runs in full PKI mode when + # pkiInitJob.enabled=true unless certManager.enabled=true, which takes + # precedence and runs the hook in JWT-only mode. sandboxJwt: # -- Name of the Opaque Secret holding the signing key material. Empty # falls back to the chart fullname with "-jwt-keys" appended. @@ -270,7 +272,7 @@ networkPolicy: # -- Create a NetworkPolicy restricting SSH ingress on sandbox pods to the gateway. enabled: true -# PKI bootstrap via a pre-install/pre-upgrade hook Job. +# Built-in TLS PKI bootstrap via a pre-install/pre-upgrade hook Job. # Runs `openshell-gateway generate-certs` to create the server and client TLS # Secrets in-cluster. Key material is written directly to K8s Secrets and # never appears in Helm release history. Idempotent: existing secrets are @@ -285,7 +287,9 @@ networkPolicy: # that domain, for example `*.apps.example.com` enables # `--.apps.example.com`. pkiInitJob: - # -- Run a pre-install/pre-upgrade Job that creates gateway and client mTLS Secrets. + # -- Run a pre-install/pre-upgrade Job that creates gateway and client mTLS + # Secrets. When certManager.enabled=true, cert-manager owns TLS and this same + # hook runs in JWT-only mode even if pkiInitJob.enabled remains true. enabled: true # -- Extra DNS SANs to append to the server certificate. serverDnsNames: [] @@ -295,7 +299,9 @@ pkiInitJob: # cert-manager Certificate/Issuer resources (requires cert-manager CRDs in-cluster). # Uses namespaced Issuers only (no ClusterIssuer). Does not install cert-manager itself. certManager: - # -- Create cert-manager Issuer and Certificate resources instead of using the PKI bootstrap Job. + # -- Create cert-manager Issuer and Certificate resources. When enabled, + # cert-manager owns TLS and the chart runs a JWT-only certgen hook to create + # the sandbox JWT signing Secret that cert-manager does not manage. enabled: false # -- Secret created for the intermediate CA (Certificate with isCA: true). caSecretName: openshell-ca-tls diff --git a/docs/kubernetes/managing-certificates.mdx b/docs/kubernetes/managing-certificates.mdx index 42536cb9d..a445f77e8 100644 --- a/docs/kubernetes/managing-certificates.mdx +++ b/docs/kubernetes/managing-certificates.mdx @@ -18,7 +18,10 @@ The OpenShell gateway uses mTLS certificates for transport between the gateway a The rest of this page covers switching to cert-manager. The built-in mode requires no configuration. -cert-manager and `pkiInitJob` are mutually exclusive. The chart fails if both are enabled at the same time. +When `certManager.enabled=true`, cert-manager owns TLS certificate generation. +The chart still runs a JWT-only initialization hook because cert-manager does +not create the sandbox JWT signing Secret required by the gateway. This +cert-manager precedence applies even if `pkiInitJob.enabled` remains true. ## Install cert-manager @@ -50,11 +53,13 @@ helm upgrade --install openshell \ oci://ghcr.io/nvidia/openshell/helm-chart \ --version \ --namespace openshell \ - --set certManager.enabled=true \ - --set pkiInitJob.enabled=false + --set certManager.enabled=true ``` The chart creates a self-signed CA, issues server and client certificates from it, and cert-manager handles renewal before expiry. +The chart also runs a pre-install hook in JWT-only mode to create the gateway's +sandbox JWT signing Secret. That Secret is separate from the cert-manager TLS +certificate Secrets and is mounted at `/etc/openshell-jwt`. ## Next Steps diff --git a/docs/kubernetes/openshift.mdx b/docs/kubernetes/openshift.mdx index acad5ff79..e56fc37db 100644 --- a/docs/kubernetes/openshift.mdx +++ b/docs/kubernetes/openshift.mdx @@ -54,10 +54,14 @@ helm install openshell oci://ghcr.io/nvidia/openshell/helm-chart \ | Override | Reason | |---|---| -| `pkiInitJob.enabled=false` | The PKI init Job runs as a non-root user with a fixed UID, which the SCC admission rewrites or rejects. Disabling it skips the Job; TLS must also be disabled. | +| `pkiInitJob.enabled=false` | Skips the built-in TLS PKI Job. TLS must also be disabled unless you provide TLS Secrets another way. | | `server.disableTls=true` | The gateway has no certificates without `pkiInitJob`, so it must run plaintext. | | `podSecurityContext.fsGroup=null` / `securityContext.runAsUser=null` | Clear the chart's hardcoded UID and fsGroup so OpenShift's SCC admission can assign them. | +The gateway still needs the sandbox JWT signing Secret. When disabling +`pkiInitJob` without enabling cert-manager, pre-create that Secret before +installing the chart. + ## Wait for the gateway to be ready ```shell