9 changes: 9 additions & 0 deletions .agents/skills/debug-openshell-cluster/SKILL.md
@@ -157,6 +157,15 @@ kubectl -n openshell get secret \
openshell-jwt-keys
```

In cert-manager installs, `certManager.enabled=true` makes cert-manager own TLS
generation. The Helm chart should still render the `openshell-certgen`
pre-install/pre-upgrade hook in JWT-only mode to create `openshell-jwt-keys`,
even if `pkiInitJob.enabled` remains true.
If the gateway pod is pending with `MountVolume.SetUp failed for volume
"sandbox-jwt"` and `openshell-jwt-keys` is absent, inspect the rendered
`templates/certgen.yaml` output and the hook Job logs; cert-manager creates TLS
Secrets but does not create the sandbox JWT signing Secret.

If the gateway exits with `failed to read sandbox JWT signing key from
/etc/openshell-jwt/signing.pem`, verify that `openshell-jwt-keys` contains
`signing.pem`, `public.pem`, and `kid`, and that the StatefulSet mounts the
47 changes: 27 additions & 20 deletions architecture/gateway.md
@@ -363,32 +363,39 @@ requested for that relay.

## PKI Bootstrap

`openshell-gateway generate-certs` is the one place mTLS materials are
created. Both deployment paths use it:
`openshell-gateway generate-certs` is the one place local mTLS materials and
sandbox JWT signing material are created. Deployment paths use it as follows:

| Output mode | Selector | Layout |
|---|---|---|
| Kubernetes Secrets | (default) `--namespace`, `--server-secret-name`, `--client-secret-name` | Two `kubernetes.io/tls` Secrets with `tls.crt` / `tls.key` / `ca.crt`. |
| Filesystem | `--output-dir <DIR>` | `<dir>/{ca.crt, ca.key, server/tls.{crt,key}, client/tls.{crt,key}}`. Also copies client materials to `$XDG_CONFIG_HOME/openshell/gateways/openshell/mtls/` for CLI auto-discovery. |
| Kubernetes Secrets | (default) `--namespace`, `--server-secret-name`, `--client-secret-name`, `--jwt-secret-name` | Two `kubernetes.io/tls` Secrets with `tls.crt` / `tls.key` / `ca.crt` plus one Opaque sandbox JWT Secret with `signing.pem` / `public.pem` / `kid`. |
| Kubernetes JWT-only Secret | `--namespace`, `--jwt-only`, `--jwt-secret-name` | One Opaque sandbox JWT Secret with `signing.pem` / `public.pem` / `kid`. |
| Filesystem | `--output-dir <DIR>` | `<dir>/{ca.crt, ca.key, server/tls.{crt,key}, client/tls.{crt,key}, jwt/{signing.pem,public.pem,kid}}`. Also copies client materials to `$XDG_CONFIG_HOME/openshell/gateways/openshell/mtls/` for CLI auto-discovery. |
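The selectors in the table can be read as a small dispatch rule. The following is a minimal sketch of that rule, not the code in `certgen.rs`: `OutputMode`, `Flags`, `select_mode`, and `require` are hypothetical names introduced for illustration, and the `--namespace` handling is left out.

```rust
/// Hypothetical names for the three output modes listed in the table.
#[derive(Debug, PartialEq, Eq)]
pub enum OutputMode {
    KubernetesFull,
    KubernetesJwtOnly,
    Filesystem,
}

/// Illustrative stand-in for the parsed flags; this is not the real `CertgenArgs`.
#[derive(Debug, Default)]
pub struct Flags<'a> {
    pub output_dir: Option<&'a str>,
    pub jwt_only: bool,
    pub server_secret_name: Option<&'a str>,
    pub client_secret_name: Option<&'a str>,
    pub jwt_secret_name: Option<&'a str>,
}

/// Picks the output mode and checks the flags that mode needs.
pub fn select_mode(flags: &Flags) -> Result<OutputMode, String> {
    match (flags.output_dir, flags.jwt_only) {
        // `--jwt-only` is a Kubernetes-only selector.
        (Some(_), true) => Err("--jwt-only conflicts with --output-dir".to_string()),
        // `--output-dir` alone selects the filesystem layout.
        (Some(_), false) => Ok(OutputMode::Filesystem),
        // JWT-only mode needs just the JWT Secret name.
        (None, true) => {
            require(flags.jwt_secret_name, "--jwt-secret-name")?;
            Ok(OutputMode::KubernetesJwtOnly)
        }
        // Default Kubernetes mode needs all three Secret names.
        (None, false) => {
            require(flags.server_secret_name, "--server-secret-name")?;
            require(flags.client_secret_name, "--client-secret-name")?;
            require(flags.jwt_secret_name, "--jwt-secret-name")?;
            Ok(OutputMode::KubernetesFull)
        }
    }
}

/// Returns an error naming the flag when its value is absent.
fn require(value: Option<&str>, flag: &str) -> Result<(), String> {
    value.map(|_| ()).ok_or_else(|| format!("{flag} is required"))
}
```

Under these assumptions, passing only `--jwt-only` and `--jwt-secret-name` selects the JWT-only mode without any TLS Secret names, which is the case the cert-manager path relies on.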

On Kubernetes, the Helm chart runs the command via a pre-install/pre-upgrade
hook Job using the gateway image itself -- no separate cert-generation image,
no extra mirror burden in air-gapped environments. On package-managed local
gateways, the same command runs from the systemd unit's `ExecStartPre` to
bootstrap PKI into the configured local TLS directory on first start. The
Linux package unit defaults that directory to `~/.local/state/openshell/tls`
through `OPENSHELL_LOCAL_TLS_DIR` so certificate generation and runtime
auto-detection use the same path across systemd versions.

Both modes share the same idempotency contract: all targets present -> skip;
partial state -> fail with a recovery hint; nothing present -> generate and
write. This guards mTLS continuity across restarts and upgrades while still
recovering cleanly if an operator deletes everything and starts over.

Operators who manage PKI externally (cert-manager, an enterprise CA, or
pre-created Secrets) disable the Helm hook via `pkiInitJob.enabled=false`.
The chart also ships a `certManager.*` path that produces equivalent Secrets
through cert-manager `Issuer`/`Certificate` resources.
no extra mirror burden in air-gapped environments. In the default built-in PKI
path the hook creates TLS and sandbox JWT Secrets. When cert-manager is enabled,
cert-manager owns TLS Secrets and the hook runs with `--jwt-only` so the
required sandbox JWT Secret still exists before the gateway StatefulSet mounts
it, even if `pkiInitJob.enabled` remains true. On package-managed local
gateways, the same command runs from the systemd unit's `ExecStartPre` to
bootstrap PKI into the configured local TLS directory on first start. The Linux
package unit defaults that directory to `~/.local/state/openshell/tls` through
`OPENSHELL_LOCAL_TLS_DIR` so certificate generation and runtime auto-detection
use the same path across systemd versions.

The bootstrap paths share the same idempotency contract: all requested targets
present -> skip; only some requested targets present -> fail with a recovery
hint; no requested targets present -> generate and write. This guards
continuity across restarts and upgrades while still recovering cleanly if an
operator deletes everything and starts over.
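
The three-way contract can be sketched as a pure function over the requested targets. This is an illustration only: `BootstrapAction` and `decide` are hypothetical names rather than the types used in `certgen.rs`, and the existence checks are passed in as booleans instead of being read from the cluster or filesystem.

```rust
/// Hypothetical outcome of the idempotency check.
#[derive(Debug, PartialEq, Eq)]
pub enum BootstrapAction {
    /// Every requested target already exists.
    Skip,
    /// Some targets exist and some do not; the operator must clean up.
    FailPartial { missing: Vec<String> },
    /// No requested target exists yet.
    CreateAll,
}

/// Decides what to do given `(target name, exists)` pairs.
/// An empty request has nothing missing, so it is treated as `Skip`.
pub fn decide(targets: &[(&str, bool)]) -> BootstrapAction {
    let missing: Vec<String> = targets
        .iter()
        .filter(|(_, exists)| !*exists)
        .map(|(name, _)| (*name).to_string())
        .collect();

    if missing.is_empty() {
        BootstrapAction::Skip
    } else if missing.len() == targets.len() {
        BootstrapAction::CreateAll
    } else {
        BootstrapAction::FailPartial { missing }
    }
}
```

With a single requested target, as in JWT-only mode, the partial state cannot occur, so the outcome reduces to skip or create.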

Operators who manage TLS PKI with cert-manager enable `certManager.enabled`;
cert-manager takes precedence over built-in TLS generation and the chart still
renders the JWT-only hook. Operators who pre-create all TLS and JWT Secrets can
disable both `pkiInitJob.enabled` and `certManager.enabled`.

## Configuration

124 changes: 91 additions & 33 deletions crates/openshell-server/src/certgen.rs
@@ -6,8 +6,12 @@
//! Two output modes, dispatched by the presence of `--output-dir`:
//!
//! - **Kubernetes mode** (default): create two `kubernetes.io/tls` Secrets
//! in the supplied namespace. Used by the Helm pre-install hook. Requires
//! `--namespace`, `--server-secret-name`, `--client-secret-name`.
//! and one sandbox-JWT signing Secret in the supplied namespace. Used by
//! the Helm pre-install hook. Requires `--namespace`,
//! `--server-secret-name`, `--client-secret-name`, and `--jwt-secret-name`.
//! - **Kubernetes JWT-only mode** (`--jwt-only`): create only the
//! sandbox-JWT signing Secret. Used when another controller, such as
//! cert-manager, owns the TLS Secrets.
//! - **Local mode** (`--output-dir <DIR>`): write PEMs to the local package
//! filesystem layout. Used by systemd units' `ExecStartPre`. Also copies
//! client materials to
@@ -47,11 +51,11 @@ pub struct CertgenArgs {
namespace: Option<String>,

/// Name of the server TLS Secret (`kubernetes.io/tls`) to create.
#[arg(long, required_unless_present = "output_dir")]
#[arg(long, required_unless_present_any = ["output_dir", "jwt_only"])]
server_secret_name: Option<String>,

/// Name of the client TLS Secret (`kubernetes.io/tls`) to create.
#[arg(long, required_unless_present = "output_dir")]
#[arg(long, required_unless_present_any = ["output_dir", "jwt_only"])]
client_secret_name: Option<String>,

/// Name of the sandbox-JWT signing-key Secret (`Opaque`) to create.
@@ -60,6 +64,11 @@ pub struct CertgenArgs {
#[arg(long, required_unless_present = "output_dir")]
jwt_secret_name: Option<String>,

/// Create only the sandbox-JWT signing-key Secret in Kubernetes mode.
/// This is used when another controller owns TLS Secret provisioning.
#[arg(long, conflicts_with = "output_dir")]
jwt_only: bool,

/// Extra Subject Alternative Name for the server certificate. Repeatable.
/// Auto-detected as an IP address or DNS name.
#[arg(long = "server-san", value_name = "SAN")]
@@ -116,6 +125,51 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> {
.namespace
.as_deref()
.ok_or_else(|| miette::miette!("--namespace is required (or set POD_NAMESPACE)"))?;

let client = Client::try_default()
.await
.into_diagnostic()
.wrap_err("failed to construct in-cluster Kubernetes client")?;
let api: Api<Secret> = Api::namespaced(client, namespace);

if args.jwt_only {
let jwt_name = args
.jwt_secret_name
.as_deref()
.ok_or_else(|| miette::miette!("--jwt-secret-name is required"))?;
let jwt_exists = api
.get_opt(jwt_name)
.await
.into_diagnostic()
.wrap_err_with(|| format!("failed to read secret {jwt_name}"))?
.is_some();
if jwt_exists {
info!(
namespace = %namespace,
jwt = %jwt_name,
"JWT signing secret already exists, skipping."
);
return Ok(());
}

let jwt_secret = jwt_signing_secret(
jwt_name,
&bundle.jwt_signing_key_pem,
&bundle.jwt_public_key_pem,
&bundle.jwt_key_id,
);
api.create(&PostParams::default(), &jwt_secret)
.await
.into_diagnostic()
.wrap_err_with(|| format!("failed to create secret {jwt_name}"))?;
info!(
namespace = %namespace,
jwt = %jwt_name,
"JWT signing secret created."
);
return Ok(());
}

let server_name = args
.server_secret_name
.as_deref()
@@ -124,17 +178,6 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> {
.client_secret_name
.as_deref()
.ok_or_else(|| miette::miette!("--client-secret-name is required"))?;
let jwt_name = args
.jwt_secret_name
.as_deref()
.ok_or_else(|| miette::miette!("--jwt-secret-name is required"))?;

let client = Client::try_default()
.await
.into_diagnostic()
.wrap_err("failed to construct in-cluster Kubernetes client")?;
let api: Api<Secret> = Api::namespaced(client, namespace);

let server_exists = api
.get_opt(server_name)
.await
@@ -147,6 +190,11 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> {
.into_diagnostic()
.wrap_err_with(|| format!("failed to read secret {client_name}"))?
.is_some();

let jwt_name = args
.jwt_secret_name
.as_deref()
.ok_or_else(|| miette::miette!("--jwt-secret-name is required"))?;
let jwt_exists = api
.get_opt(jwt_name)
.await
@@ -193,6 +241,34 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> {
K8sAction::CreateAll => {}
}

create_tls_secrets(&api, server_name, client_name, bundle).await?;
let jwt_secret = jwt_signing_secret(
jwt_name,
&bundle.jwt_signing_key_pem,
&bundle.jwt_public_key_pem,
&bundle.jwt_key_id,
);
api.create(&PostParams::default(), &jwt_secret)
.await
.into_diagnostic()
.wrap_err_with(|| format!("failed to create secret {jwt_name}"))?;

info!(
namespace = %namespace,
server = %server_name,
client = %client_name,
jwt = %jwt_name,
"PKI secrets created."
);
Ok(())
}

async fn create_tls_secrets(
api: &Api<Secret>,
server_name: &str,
client_name: &str,
bundle: &PkiBundle,
) -> Result<()> {
let server_secret = tls_secret(
server_name,
&bundle.server_cert_pem,
@@ -205,12 +281,6 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> {
&bundle.client_key_pem,
&bundle.ca_cert_pem,
);
let jwt_secret = jwt_signing_secret(
jwt_name,
&bundle.jwt_signing_key_pem,
&bundle.jwt_public_key_pem,
&bundle.jwt_key_id,
);

api.create(&PostParams::default(), &server_secret)
.await
@@ -220,18 +290,6 @@ async fn run_kubernetes(args: &CertgenArgs, bundle: &PkiBundle) -> Result<()> {
.await
.into_diagnostic()
.wrap_err_with(|| format!("failed to create secret {client_name}"))?;
api.create(&PostParams::default(), &jwt_secret)
.await
.into_diagnostic()
.wrap_err_with(|| format!("failed to create secret {jwt_name}"))?;

info!(
namespace = %namespace,
server = %server_name,
client = %client_name,
jwt = %jwt_name,
"PKI secrets created."
);
Ok(())
}

25 changes: 25 additions & 0 deletions crates/openshell-server/src/cli.rs
@@ -979,6 +979,31 @@ mod tests {
));
}

#[test]
fn generate_certs_jwt_only_parses_without_tls_secret_names() {
let _lock = ENV_LOCK
.lock()
.unwrap_or_else(std::sync::PoisonError::into_inner);
let _g1 = EnvVarGuard::remove("OPENSHELL_DB_URL");
let _g2 = EnvVarGuard::remove("POD_NAMESPACE");

let cli = Cli::try_parse_from([
"openshell-gateway",
"generate-certs",
"--namespace",
"openshell",
"--jwt-only",
"--jwt-secret-name",
"openshell-jwt-keys",
])
.expect("--jwt-only should make TLS secret-name flags optional");

assert!(matches!(
cli.command,
Some(super::Commands::GenerateCerts(_))
));
}

#[test]
fn bare_invocation_with_no_db_url_parses_for_runtime_defaults() {
// db_url is Option<String> at the clap level so subcommand parsing
27 changes: 12 additions & 15 deletions deploy/helm/openshell/README.md
@@ -109,22 +109,19 @@ Append these flags to any of the PostgreSQL commands above for OpenShift:
--set securityContext.runAsUser=null
```

## PKI bootstrap
## Secret bootstrap

By default, a pre-install/pre-upgrade hook Job runs `openshell-gateway generate-certs`
to create the gateway's server and client mTLS Secrets. The Job uses the gateway image
itself, so air-gapped environments only need to mirror that one image (no separate
openssl/alpine sidecar).
to create the gateway's server/client mTLS Secrets and sandbox JWT signing Secret.
The Job uses the gateway image itself, so air-gapped environments only need to
mirror that one image (no separate openssl/alpine sidecar).

The Job is idempotent:

- Both target Secrets exist: log and exit 0.
- Exactly one exists: fail with `kubectl delete secret -n <ns> <server> <client>` recovery hint.
- Neither exists: generate a CA, server cert, and client cert; POST both `kubernetes.io/tls` Secrets (`tls.crt`, `tls.key`, `ca.crt`).

Disable with `--set pkiInitJob.enabled=false` when bringing your own PKI (cert-manager,
external CA, or pre-created Secrets). See `certManager.*` in `values.yaml` for the
cert-manager alternative.
When `certManager.enabled=true`, cert-manager owns the TLS Secrets and the chart
runs the same hook in JWT-only mode because cert-manager does not create the
sandbox JWT signing Secret. This precedence applies even if
`pkiInitJob.enabled` remains true. Set `pkiInitJob.enabled=false` only when an
external non-cert-manager TLS source manages TLS and you pre-create the sandbox
JWT signing Secret.
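
The precedence between the two values can be summarized as a small decision function. This is a sketch of how the text above reads, not the chart's template logic: `HookMode` and `hook_mode` are hypothetical names, and the case `certManager.enabled=true` with `pkiInitJob.enabled=false` is assumed to still yield the JWT-only hook, which the rendered `templates/certgen.yaml` should be checked against.

```rust
/// Hypothetical summary of what the certgen hook does for a given values combination.
#[derive(Debug, PartialEq, Eq)]
pub enum HookMode {
    /// Hook creates the server/client mTLS Secrets and the sandbox JWT Secret.
    TlsAndJwt,
    /// cert-manager owns TLS; the hook creates only the sandbox JWT Secret.
    JwtOnly,
    /// No hook; all Secrets must be pre-created.
    Disabled,
}

/// Maps `certManager.enabled` and `pkiInitJob.enabled` to a hook mode.
/// Assumption: cert-manager takes precedence regardless of `pkiInitJob.enabled`.
pub fn hook_mode(cert_manager_enabled: bool, pki_init_job_enabled: bool) -> HookMode {
    if cert_manager_enabled {
        HookMode::JwtOnly
    } else if pki_init_job_enabled {
        HookMode::TlsAndJwt
    } else {
        HookMode::Disabled
    }
}
```

The default values (`certManager.enabled=false`, `pkiInitJob.enabled=true`) map to the full TLS-and-JWT hook in this sketch.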

## Values

@@ -135,7 +132,7 @@ cert-manager alternative.
| certManager.certificateDuration | string | `"8760h"` | Duration for cert-manager-issued certificates. |
| certManager.certificateRenewBefore | string | `"720h"` | Renewal window for cert-manager-issued certificates. |
| certManager.clientCaFromServerTlsSecret | bool | `true` | Mount gateway client CA from the server TLS secret's ca.crt (populated by cert-manager for certs issued by a CA Issuer). Avoids a separate openshell-server-client-ca Secret. |
| certManager.enabled | bool | `false` | Create cert-manager Issuer and Certificate resources instead of using the PKI bootstrap Job. |
| certManager.enabled | bool | `false` | Create cert-manager Issuer and Certificate resources. When enabled, cert-manager owns TLS and the chart runs a JWT-only certgen hook to create the sandbox JWT signing Secret that cert-manager does not manage. |
| certManager.serverDnsNames | list | `["openshell","openshell.openshell.svc","openshell.openshell.svc.cluster.local","localhost","openshell.localhost","*.openshell.localhost","host.docker.internal"]` | DNS SANs on the cert-manager-issued server certificate. |
| certManager.serverIpAddresses | list | `["127.0.0.1"]` | IP SANs on the cert-manager-issued server certificate. |
| fullnameOverride | string | `""` | Override the full generated resource name. |
@@ -155,7 +152,7 @@ cert-manager alternative.
| nameOverride | string | `"openshell"` | Override the chart name used in generated resource names. |
| networkPolicy.enabled | bool | `true` | Create a NetworkPolicy restricting SSH ingress on sandbox pods to the gateway. |
| nodeSelector | object | `{}` | Node selector for the gateway pod. |
| pkiInitJob.enabled | bool | `true` | Run a pre-install/pre-upgrade Job that creates gateway and client mTLS Secrets. |
| pkiInitJob.enabled | bool | `true` | Run a pre-install/pre-upgrade Job that creates the gateway server and client mTLS Secrets and the sandbox JWT signing Secret. When certManager.enabled=true, cert-manager owns TLS and this same hook runs in JWT-only mode even if pkiInitJob.enabled remains true. |
| pkiInitJob.serverDnsNames | list | `[]` | Extra DNS SANs to append to the server certificate. |
| pkiInitJob.serverIpAddresses | list | `[]` | Extra IP SANs to append to the server certificate. |
| podAnnotations | object | `{}` | Extra annotations to add to the gateway pod. |
25 changes: 11 additions & 14 deletions deploy/helm/openshell/README.md.gotmpl
@@ -109,22 +109,19 @@ Append these flags to any of the PostgreSQL commands above for OpenShift:
--set securityContext.runAsUser=null
```

## PKI bootstrap
## Secret bootstrap

By default, a pre-install/pre-upgrade hook Job runs `openshell-gateway generate-certs`
to create the gateway's server and client mTLS Secrets. The Job uses the gateway image
itself, so air-gapped environments only need to mirror that one image (no separate
openssl/alpine sidecar).

The Job is idempotent:

- Both target Secrets exist: log and exit 0.
- Exactly one exists: fail with `kubectl delete secret -n <ns> <server> <client>` recovery hint.
- Neither exists: generate a CA, server cert, and client cert; POST both `kubernetes.io/tls` Secrets (`tls.crt`, `tls.key`, `ca.crt`).

Disable with `--set pkiInitJob.enabled=false` when bringing your own PKI (cert-manager,
external CA, or pre-created Secrets). See `certManager.*` in `values.yaml` for the
cert-manager alternative.
to create the gateway's server/client mTLS Secrets and sandbox JWT signing Secret.
The Job uses the gateway image itself, so air-gapped environments only need to
mirror that one image (no separate openssl/alpine sidecar).

When `certManager.enabled=true`, cert-manager owns the TLS Secrets and the chart
runs the same hook in JWT-only mode because cert-manager does not create the
sandbox JWT signing Secret. This precedence applies even if
`pkiInitJob.enabled` remains true. Set `pkiInitJob.enabled=false` only when an
external non-cert-manager TLS source manages TLS and you pre-create the sandbox
JWT signing Secret.

{{ template "chart.valuesSection" . }}
{{ template "helm-docs.versionFooter" . }}
3 changes: 0 additions & 3 deletions deploy/helm/openshell/ci/values-cert-manager.yaml
@@ -7,8 +7,5 @@
server:
disableTls: false

pkiInitJob:
enabled: false

certManager:
enabled: true