-
Notifications
You must be signed in to change notification settings - Fork 146
docs(mender-gateway): clarification regarding intermediate ca #2744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
oldgiova
wants to merge
2
commits into
mendersoftware:master
Choose a base branch
from
oldgiova:mender-hub-question
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
178 changes: 178 additions & 0 deletions
178
.../04.Mender-Gateway/10.Mutual-TLS-authentication/04.Certificate-rotation/docs.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,178 @@ | ||
| --- | ||
| title: Certificate rotation | ||
| taxonomy: | ||
| category: docs | ||
| --- | ||
|
|
||
| When using a flat PKI (`Root CA → Device`), rotating client certificates requires updating the gateway and re-provisioning all devices at once. By using a two-level PKI where each device presents the full certificate chain (device cert + intermediate CA cert), you can rotate the intermediate CA without touching the gateway at all — old and new devices coexist until the rollout is complete. | ||
|
|
||
| ## How it works | ||
|
|
||
| The gateway is configured to trust only the Root CA. Devices send both their device certificate and the intermediate CA that signed it during the TLS handshake. Since the Root CA stays constant, the gateway configuration never changes during a rotation. | ||
|
|
||
| ## Setting up the PKI | ||
|
|
||
| ### Generate the Root CA | ||
|
|
||
| The Root CA is long-lived and its certificate is the only thing the gateway needs. | ||
|
|
||
| ```bash | ||
| openssl ecparam -genkey -name P-256 -noout -out root-ca.key | ||
|
|
||
| cat > root-ca.conf <<EOF | ||
| [req] | ||
| distinguished_name = req_distinguished_name | ||
| prompt = no | ||
| x509_extensions = v3_ca | ||
|
|
||
| [req_distinguished_name] | ||
| commonName=Root CA | ||
| organizationName=My Organization | ||
|
|
||
| [v3_ca] | ||
| subjectKeyIdentifier = hash | ||
| authorityKeyIdentifier = keyid:always,issuer | ||
| basicConstraints = critical, CA:true, pathlen:1 | ||
| keyUsage = critical, keyCertSign, cRLSign | ||
| EOF | ||
|
|
||
| openssl req -new -x509 -key root-ca.key -out root-ca.crt -config root-ca.conf -days $((365*10)) | ||
| ``` | ||
|
|
||
| ### Generate the Intermediate CA | ||
|
|
||
| The intermediate CA signs device certificates and is the unit of rotation. | ||
|
|
||
| ```bash | ||
| openssl ecparam -genkey -name P-256 -noout -out intermediate-ca.key | ||
|
|
||
| cat > intermediate-ca.conf <<EOF | ||
| [req] | ||
| distinguished_name = req_distinguished_name | ||
| prompt = no | ||
|
|
||
| [req_distinguished_name] | ||
| commonName=Intermediate CA v1 | ||
| organizationName=My Organization | ||
| EOF | ||
|
|
||
| cat > intermediate-ca-ext.conf <<EOF | ||
| subjectKeyIdentifier = hash | ||
| authorityKeyIdentifier = keyid:always,issuer | ||
| basicConstraints = critical, CA:true, pathlen:0 | ||
| keyUsage = critical, keyCertSign, cRLSign | ||
| EOF | ||
|
|
||
| openssl req -new -key intermediate-ca.key -out intermediate-ca.req -config intermediate-ca.conf | ||
| openssl x509 -req -CA root-ca.crt -CAkey root-ca.key -CAcreateserial \ | ||
| -in intermediate-ca.req -out intermediate-ca.crt \ | ||
| -extfile intermediate-ca-ext.conf -days $((365*5)) | ||
| ``` | ||
|
|
||
| ### Generate a device certificate and bundle the chain | ||
|
|
||
| Each device gets a certificate signed by the intermediate CA. The device presents both certificates (its own plus the intermediate CA) as a chain during the TLS handshake. | ||
|
|
||
| ```bash | ||
| openssl ecparam -genkey -name P-256 -noout -out device.key | ||
|
|
||
| cat > device.conf <<EOF | ||
| [req] | ||
| distinguished_name = req_distinguished_name | ||
| prompt = no | ||
|
|
||
| [req_distinguished_name] | ||
| commonName=my-device-hostname.com | ||
| organizationName=My Organization | ||
| EOF | ||
|
|
||
| openssl req -new -key device.key -out device.req -config device.conf | ||
| openssl x509 -req -CA intermediate-ca.crt -CAkey intermediate-ca.key -CAcreateserial \ | ||
| -in device.req -out device.crt -days $((365*2)) | ||
|
|
||
| # Bundle: device cert first, then the intermediate CA | ||
| cat device.crt intermediate-ca.crt > device-chain.pem | ||
| ``` | ||
|
|
||
| Provision `device-chain.pem` and `device.key` onto each device. | ||
|
|
||
| ## Configuring the gateway | ||
|
|
||
| Point the gateway at the Root CA only — `MTLS_CA_CERTIFICATE` is the trust anchor for client certificates and `root-ca.crt` is the only thing it needs to verify any device chain. No intermediate CA certificate is mounted on the gateway side, which is what makes rotation possible without reconfiguration. | ||
|
|
||
| This example assumes you have already populated the [environment variables](../01.Keys-and-certificates/docs.md#environment-variables) and generated `server.crt`/`server.key` for the gateway's server-side TLS (see the [Keys and certificates](../01.Keys-and-certificates/docs.md) page; sign them with `root-ca.key` instead of `ca-private.key`). | ||
|
|
||
| ```bash | ||
| sudo chown 65534 $(pwd)/server.crt $(pwd)/server.key $(pwd)/root-ca.crt | ||
| sudo chmod 0600 $(pwd)/server.key | ||
|
|
||
| docker run \ | ||
| -p 8443:8443 \ | ||
| -p 8080:8080 \ | ||
| --name mender-gateway \ | ||
| -e HTTPS_ENABLED="true" \ | ||
| -e HTTPS_LISTEN=":8443" \ | ||
| -e HTTP_ENABLED="true" \ | ||
| -e HTTP_LISTEN=":8080" \ | ||
| -e HTTPS_SERVER_CERTIFICATE="/etc/mender/certs/server.crt" \ | ||
| -e HTTPS_SERVER_KEY="/etc/mender/certs/server.key" \ | ||
| -e MTLS_CA_CERTIFICATE="/etc/ssl/certs/root-ca.crt" \ | ||
| -e MTLS_ENABLED="true" \ | ||
| -e MTLS_MENDER_PASSWORD="$MENDER_PASSWORD" \ | ||
| -e MTLS_MENDER_USERNAME="$MENDER_USERNAME" \ | ||
| -e UPSTREAM_SERVER_URL="$UPSTREAM_SERVER_URL" \ | ||
| -v $(pwd)/server.crt:/etc/mender/certs/server.crt \ | ||
| -v $(pwd)/server.key:/etc/mender/certs/server.key \ | ||
| -v $(pwd)/root-ca.crt:/etc/ssl/certs/root-ca.crt \ | ||
| $MENDER_GATEWAY_IMAGE --log-level debug | ||
| ``` | ||
|
|
||
| ## Rotating the intermediate CA | ||
|
|
||
| To rotate, generate a new intermediate CA signed by the same Root CA. New devices get certificates signed by the new intermediate CA and present the new chain. Devices already in the field continue working — no gateway change required. | ||
| You would rotate the intermediate CA if the old one is about to expire or if you have suspicions it was compromised. For the expiration case the old certificate will just stop being valid at some point in time, but for the compromised case you need to take explicit action; see [Operating mTLS securely](../05.Operating-mTLS-securely/docs.md). | ||
|
|
||
| ### Generate the new Intermediate CA | ||
|
|
||
| ```bash | ||
| openssl ecparam -genkey -name P-256 -noout -out intermediate-ca-v2.key | ||
|
|
||
| cat > intermediate-ca-v2.conf <<EOF | ||
| [req] | ||
| distinguished_name = req_distinguished_name | ||
| prompt = no | ||
|
|
||
| [req_distinguished_name] | ||
| commonName=Intermediate CA v2 | ||
| organizationName=My Organization | ||
| EOF | ||
|
|
||
| openssl req -new -key intermediate-ca-v2.key -out intermediate-ca-v2.req -config intermediate-ca-v2.conf | ||
| openssl x509 -req -CA root-ca.crt -CAkey root-ca.key -CAcreateserial \ | ||
| -in intermediate-ca-v2.req -out intermediate-ca-v2.crt \ | ||
| -extfile intermediate-ca-ext.conf -days $((365*5)) | ||
| ``` | ||
|
|
||
| ### Provision new devices with the new chain | ||
|
|
||
| ```bash | ||
| openssl ecparam -genkey -name P-256 -noout -out device-v2.key | ||
| openssl req -new -key device-v2.key -out device-v2.req -config device.conf | ||
| openssl x509 -req -CA intermediate-ca-v2.crt -CAkey intermediate-ca-v2.key -CAcreateserial \ | ||
| -in device-v2.req -out device-v2.crt -days $((365*2)) | ||
|
|
||
| cat device-v2.crt intermediate-ca-v2.crt > device-v2-chain.pem | ||
| ``` | ||
|
|
||
| The gateway accepts both `device-chain.pem` (v1) and `device-v2-chain.pem` (v2) because both intermediate CAs chain up to the trusted Root CA. Update the v1 certificates on the devices at your own pace. | ||
|
|
||
| ## Verifying the chains | ||
|
|
||
| ```bash | ||
| # Verify v1 chain | ||
| openssl verify -CAfile root-ca.crt -untrusted intermediate-ca.crt device.crt | ||
|
|
||
| # Verify v2 chain | ||
| openssl verify -CAfile root-ca.crt -untrusted intermediate-ca-v2.crt device-v2.crt | ||
| ``` | ||
|
|
||
106 changes: 106 additions & 0 deletions
106
....Mender-Gateway/10.Mutual-TLS-authentication/05.Operating-mTLS-securely/docs.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| --- | ||
| title: Operating mTLS securely | ||
| taxonomy: | ||
| category: docs | ||
| --- | ||
|
|
||
| This chapter covers a hardening pattern that shortens recovery from a Root CA compromise, and the incident-response procedure for when a private key in the PKI has actually been compromised. For scheduled, zero-downtime turnover see [Certificate rotation](../04.Certificate-rotation/docs.md). That procedure does not solve a compromise. | ||
|
|
||
| ## Pre-staging a backup mTLS PKI | ||
|
|
||
| Recovery from a compromised Root CA is operationally expensive: every device in the field needs new crypto material, generated and delivered after the incident. You can shorten that recovery dramatically by pre-staging a backup PKI at manufacturing time and keeping its Root CA private key offline. | ||
|
|
||
| The pattern: | ||
|
|
||
| 1. Generate two independent Root CAs at provisioning time, a *primary* and a *backup*. Each has its own intermediate. | ||
| 2. Provision both chains and both private keys on every device: `device-chain-primary.pem` and `device-primary.key` are the active pair, `device-chain-backup.pem` and `device-backup.key` sit on the data partition unused. | ||
| 3. Keep the backup Root CA's private key offline and isolated from the primary, in stronger storage and under separate access controls. Treat it as break-glass material. | ||
| 4. Configure the gateway to trust only the primary Root CA via `MTLS_CA_CERTIFICATE`. | ||
|
|
||
| In the event of a primary Root CA compromise, recovery becomes a configuration switch instead of a re-provisioning project: see [If you pre-staged a backup PKI](#if-you-pre-staged-a-backup-pki) below. | ||
|
|
||
| Operational notes: | ||
|
|
||
| - The two Root CAs must come from genuinely independent ceremonies. If both are generated on the same host, a host compromise takes both. | ||
| - After switching from primary to backup, you have used your second life. Generate and roll out a new backup chain so the fleet is two-deep again. | ||
|
|
||
| ## Compromised intermediate CA | ||
|
|
||
| The first question to answer is: does the attacker have the intermediate's private key, or only specific issued device certificates? | ||
|
|
||
| If the attacker holds the private key, blacklisting does not contain the threat. They can issue new device certs with serial numbers you have never seen, which will never be in your blacklist. Treat this as a compromised Root CA; see below. | ||
|
|
||
| If only specific device certs are compromised, blacklist those serials with `MTLS_BLACKLIST_PATH` (see [Configuration file](../../99.Configuration-file/docs.md)). The file is a newline-separated list of hex serial numbers. This works only if you have kept a record of issued serials so you know which to add. | ||
|
|
||
| ## Compromised Root CA | ||
|
|
||
| The Root CA is the trust anchor, so once it is suspect the only remedy is to switch the fleet to a new Root CA. There is no zero-downtime path. | ||
|
|
||
| Decide first: is the attacker holding the Root CA's private key, or do you suspect compromise of trust on other grounds (mis-issuance you cannot reproduce, suspicious activity in the signing infrastructure, lost confidence in the offline storage)? The two cases want different trade-offs. | ||
|
|
||
| ### Attacker holds the private key | ||
|
|
||
| Take the old gateway offline as soon as you detect the compromise. While it is up, anything signed by the old root is trusted, including a server certificate the attacker can issue for the gateway's own hostname; a device that still trusts the old root will accept the impersonation without warning. Devices in the field cannot connect until they receive the new crypto material. That is the cost of containment, and it is the correct trade-off in a real compromise. | ||
|
|
||
| The [migration via parallel gateway](#migration-via-parallel-gateway) below is not directly applicable in this scenario: it assumes the rollout flows through a still-running gateway, which contradicts containment. Use the [pre-staged backup PKI](#if-you-pre-staged-a-backup-pki) recovery if you have one; otherwise use the [bypass strategy](#bypass-via-direct-mender-server-connection). | ||
|
|
||
| ### Pre-emptive rotation without key exposure | ||
|
|
||
| If you do not believe the private key has been exfiltrated (for example, you are rotating because of unrelated suspicious activity), you can run two gateways side by side and stage the cutover. Devices keep authenticating against the old gateway until they receive the new crypto material. | ||
|
|
||
| ### If you pre-staged a backup PKI | ||
|
|
||
| If you followed the [pre-staging hardening pattern](#pre-staging-a-backup-mtls-pki), recovery is mostly a configuration switch. Devices already have the backup chain on disk, so the deployment only needs to tell them which file to use. | ||
|
|
||
| 1. **Update `MTLS_CA_CERTIFICATE` on the gateway** to trust both the primary and the backup Root CA (concatenate the two root certificates in the file). For pre-emptive rotation this is fine; for a real compromise the operator must accept a window where the gateway still trusts the compromised root. If that window is unacceptable, see the hard-cutover note below. | ||
| 2. **Push a Mender Deployment** (typically an [`apply-device-config`](../../../../11.Add-ons/10.Configure/01.Device-integration/docs.md) script) that updates `mender.conf` on each device to use `device-chain-backup.pem` and `device-backup.key` for the active client chain. | ||
| 3. **Remove the primary Root CA** from `MTLS_CA_CERTIFICATE` on the gateway, once all devices have switched. | ||
| 4. **Generate a new backup PKI** (Root CA + intermediate + per-device chains) and roll it out via a normal Mender Deployment so the fleet is two-deep again. | ||
|
|
||
| The per-device generation of new client material is skipped: the bottleneck is the deployment rollout itself. | ||
|
|
||
| For a hard cutover, take the gateway offline immediately and deliver step 2 via the [bypass strategy](#bypass-via-direct-mender-server-connection). The pre-staged backup still saves the per-device generation step. | ||
|
|
||
| ### Migration via parallel gateway | ||
|
|
||
| This procedure assumes the old gateway stays reachable while the rollout proceeds, so it fits pre-emptive rotation. For a real key-exposure compromise, use the [pre-staged backup](#if-you-pre-staged-a-backup-pki) or [bypass](#bypass-via-direct-mender-server-connection) recovery paths instead. | ||
|
|
||
| 1. **Stand up the new gateway** on a separate hostname or IP, with `MTLS_CA_CERTIFICATE` pointing at the new Root CA and a fresh `HTTPS_SERVER_CERTIFICATE` chain. Verify the listener with the procedure in [Verifying the chains](../04.Certificate-rotation/docs.md#verifying-the-chains). | ||
|
|
||
| 2. **Generate new client crypto material** for each device, signed by the new Root CA: device cert, intermediate cert, device key. Bundle the chain as `device-chain.pem`. | ||
|
|
||
| 3. **Build a Mender Artifact** (or an `apply-device-config` script using the [Configure add-on](../../../../11.Add-ons/10.Configure/01.Device-integration/docs.md)) that on each device: | ||
| - installs the new `device-chain.pem` and `device.key`, | ||
| - updates `mender.conf` to set `Servers` to the new gateway URL, | ||
| - sets `ServerCertificate` to the new gateway's CA chain. Do **not** include the old root in this file if the private key is suspected exfiltrated; keeping it would let the attacker impersonate the gateway. If the rotation is pre-emptive and the old key is not exposed, you can concatenate old + new so devices verify both gateways during the rollout. | ||
|
|
||
| See [client configuration options](../../../../03.Client-installation/07.Configuration/50.Configuration-options/docs.md#servers) for the relevant fields. | ||
|
|
||
| 4. **Test on a canary device first.** Confirm the canary appears on the new gateway and that the deployment reports success before rolling out further. A bad `mender.conf` (invalid JSON) stops `mender-updated` from starting and the device only rolls back on the next reboot. | ||
|
|
||
| 5. **Roll out to the fleet in waves.** Watch the new gateway logs and authentication counts as devices switch over. | ||
|
|
||
| 6. **Decommission the old gateway** once the new gateway shows all expected devices and the old one is idle. | ||
|
|
||
| ### Bypass via direct Mender Server connection | ||
|
|
||
| Where the Mender Server is reachable from the devices' network, you can avoid the parallel-gateway approach entirely: | ||
|
|
||
| 1. Push a Mender Artifact that updates `mender.conf` to set `ServerURL` to the Mender Server directly and removes the mTLS settings. The device falls back to the standard Mender authentication flow, using its auto-generated `mender-agent.pem` keypair, which is independent of the gateway PKI. | ||
| 2. Take the compromised gateway offline. | ||
| 3. Stand up a new Mender Gateway with a new Root CA at your own pace. | ||
| 4. Push a second deployment to re-introduce the new gateway URL and new client crypto material. That step is now a normal scheduled rotation, not incident response. | ||
|
|
||
| The deployment in step 1 still passes through the compromised gateway, so the attacker can read its contents (TLS confidentiality is lost) and can block it from reaching devices (denial of service). They cannot tamper with the payload: Mender Artifacts are signed with a key separate from the gateway PKI (see `ArtifactVerifyKeys` in the [client configuration options](../../../../03.Client-installation/07.Configuration/50.Configuration-options/docs.md)). Watch deployment success rates closely. | ||
|
|
||
| For a real key-exposure compromise where containment is the priority, the operator can swap steps 1 and 2: take the gateway offline first to stop the exposure, then deliver step 1 to whatever devices are still online. Devices that were offline at that moment will need manual re-provisioning when they reappear. | ||
|
|
||
| This strategy is not available if the Mender Server is behind the gateway on a private network and the devices cannot reach it directly. It also requires that you can approve devices on the Mender Server when they appear there for the first time without gateway pre-auth: set auto-accept policies first if your fleet is large. | ||
|
|
||
| ### Cleaning up the compromised key | ||
|
|
||
| The compromised material is the CA private key, not the per-device keys. Identify and destroy every copy. | ||
|
|
||
| ### Things that go wrong | ||
|
|
||
| Devices that are offline during the rollout stay on the old gateway until they come back. If the rotation is pre-emptive you can wait for them. For a real compromise where the old gateway must come down immediately, offline devices will require manual re-provisioning when they reappear. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good call for a dedicated section like: "how to deal with a compromised certiifcate"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the follow up for this: 5045e35