Release-signing toolchain: HSM-backed OpenPGP workflow #240

Open
quarckster wants to merge 17 commits into openssl:master from quarckster:signing-script

@quarckster
Member

Summary

Wire the OpenSSL release-signing toolchain to a PKCS#11 HSM through sq-pkcs11, with a tmux-based ceremony runner for K-of-N OCS-quorum operations and pre-flight checks before every HSM interaction.

Content

openssl-pgp — release-signing policy wrapper over sq-pkcs11

The policy: RSA-4096 primary (Certify, OCS-protected, 5y) + RSA-4096 signing subkey (Sign, module-protected, 1y), logkeyusage=yes for Security World audit-log coverage, creation times derived from nShield gentime.

openssl-pgp-ceremony-run — tmux ceremony orchestration

A shared-socket tmux session runner so multiple operators can attach to the live cert-init ceremony for the OCS card prompts. Drives openssl-pgp cert-init --generate-keys from the Jenkins pipeline.

openssl-pgp-revocation-recipients + checked-in recipient keys

Bundles the trusted set of revocation-encryption recipients into one armored OpenPGP keyring (plus a manifest and a SHA-256 digest of the bundle) so cert-init can encrypt the offline primary-key revocation to a known set of public keys. The recipient PGP keys are committed under release-tools/openpgp/revocation-recipients/.

sq-pkcs11-git-shim + stage-release.sh --gpg-program

  • The shim translates git's gpg.program invocation shape (gpg --status-fd=N -bsau <keyid>) into a sq-pkcs11 sign call against an HSM-resident key identified by CKA_LABEL. It emits SIG_CREATED for older git versions that parse GnuPG status output. Anything other than a sign operation is forwarded to a real gpg via OPENSSL_PGP_FALLBACK_GPG, so git tag -v and similar keep working locally.
  • stage-release.sh --gpg-program=<path> redirects only git-tag signing through the supplied program (via git -c gpg.program=...). The direct gpg invocations elsewhere in stage-release.sh (tarball detached signatures, announcement clearsign) are deliberately untouched — combine with --unsigned to skip those and sign them out-of-band.

quarckster and others added 12 commits May 7, 2026 08:59
Operator-facing helper for the release-artifact OpenPGP signing policy
on top of sq-pkcs11 (which speaks PKCS#11 to the nShield HSM).

Capabilities:
  - generate the policy primary key (RSA-4096, OCS-protected) and
    current signing subkey (RSA-4096, module-protected) via nShield
    generatekey
  - issue / rotate the published OpenPGP certificate (5-year primary,
    1-year subkey, with --merge-cert for subkey rotation that
    preserves predecessor subkeys)
  - sign release artifacts with the current signing subkey
  - issue primary-key and subkey-revocation certificates

Each command validates only the env vars it actually consumes (e.g.
subkey-rotate doesn't demand OPENSSL_PGP_CURRENT_SUBKEY_LABEL), and
every label / cardset value is checked against the Security World via
cklist / nfkminfo before any HSM action runs, so a typo cannot leave a
half-finished state on the HSM.
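
The per-command validation described above could look roughly like the following. This is a minimal sketch, not the wrapper's actual code: `require_env` and `required_for` are hypothetical names, and the variable list for `subkey-rotate` is an assumption (the text only states that it does not demand OPENSSL_PGP_CURRENT_SUBKEY_LABEL).

```sh
# Hypothetical helper: fail if any named environment variable is unset or empty.
require_env() {
    missing=""
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name (POSIX sh).
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    if [ -n "$missing" ]; then
        echo "missing required env vars:$missing" >&2
        return 1
    fi
    return 0
}

# Hypothetical table: each command lists only the variables it consumes.
# The subkey-rotate entry is an assumption made for illustration.
required_for() {
    case "$1" in
        sign)          echo "OPENSSL_PGP_CERT OPENSSL_PGP_CURRENT_SUBKEY_LABEL" ;;
        subkey-rotate) echo "OPENSSL_PGP_CERT" ;;
        *)             return 1 ;;
    esac
}
```

A caller would run `require_env $(required_for "$cmd")` before touching the HSM, so a missing variable is reported before any key operation starts.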

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A gpg-CLI-compatible signing shim that translates git's gpg.program
invocation shape (gpg --status-fd=N -bsau <keyid> < tagbody) into a
sq-pkcs11 sign call against an HSM-resident key identified by CKA_LABEL.

The shim drains stdin to a temp file (sq-pkcs11 takes a path, not
stdin), runs sq-pkcs11 sign --output - so the armored signature streams
straight back on stdout where git expects it, and emits a SIG_CREATED
status line for older git versions that parse GnuPG's status protocol.

Anything that isn't a sign operation (verify, decrypt, list-keys, ...)
is forwarded to a real gpg via OPENSSL_PGP_FALLBACK_GPG (default "gpg"),
so `git tag -v` and similar continue to work on the operator's machine.

Used by stage-release.sh via --gpg-program; can also be configured
directly with `git config gpg.program <path>` for non-release tag
signing scenarios.
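
As an illustration of the dispatch described above, the sketch below classifies a gpg-style argument list as either git's tag-signing shape or something to hand to the fallback gpg. `classify_gpg_args` is a hypothetical name, and the real shim may parse its arguments differently.

```sh
# Hypothetical classifier: prints "sign <keyid>" when the arguments contain
# git's signing shape (-bsau <keyid>), and "fallback" for everything else.
classify_gpg_args() {
    keyid=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -bsau)
                # git passes the key identifier as the next argument;
                # without it there is nothing to sign with.
                if [ $# -lt 2 ]; then
                    echo fallback
                    return 0
                fi
                keyid="$2"
                shift
                ;;
        esac
        shift
    done
    if [ -n "$keyid" ]; then
        echo "sign $keyid"
    else
        echo fallback
    fi
}
```

In the "sign" case the key identifier would be used as the CKA_LABEL; in the "fallback" case the original arguments would be passed unchanged to the program named by OPENSSL_PGP_FALLBACK_GPG.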

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add --gpg-program=<path> to redirect git tag signing through a custom
program by invoking `git -c gpg.program=<path> tag -s ...`.  Direct gpg
invocations elsewhere in this script (tarball detached signature,
announcement clearsign) are deliberately untouched; combine with
--unsigned if those should be skipped and signed out-of-band.

Primary motivation is to route tag signing through
release-tools/sq-pkcs11-git-shim so the release tag can be signed by
an HSM-resident private key without involving the local gpg keyring.
The option is general — it's just a `gpg.program` override — so any
gpg-CLI-compatible signer can be plugged in.

When --gpg-program is used together with --local-user, the keyid form
expected by --local-user follows whatever the configured program
expects (key id / fingerprint for gpg, CKA_LABEL for the sq-pkcs11
shim).  This is documented in the help text and the manual.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sq-pkcs11's subkey-revoke API changed: instead of identifying the
subkey by CKA_LABEL (which required HSM access to the subkey's
private key — broken for the compromise scenario), the new shape
takes --input-cert + --subkey-fingerprint and only exercises the
primary's private key.  The wrapper follows.

Operational change: callers of `openssl-pgp subkey-revoke` now
pass the subkey fingerprint they want revoked, looked up from the
published cert via `sq inspect release.asc` or
`gpg --list-keys --with-subkey-fingerprint`.  The lost / compromised
subkey path no longer requires the HSM to still hold its private key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this change, openssl-pgp sign would happily use whatever HSM
key OPENSSL_PGP_CURRENT_SUBKEY_LABEL pointed at, as long as it was
RSA-4096 and present on the HSM.  A typo'd or stale label could
therefore produce a "valid"-looking release signature that fails
verification against the published cert — exactly the kind of error
a release-engineering tool should refuse to make.

Add a pre-flight call to `sq-pkcs11 verify-signing-key`, before any
real signing operation, that confirms the configured HSM key is a
current valid signer of $OPENSSL_PGP_CERT (alive, not revoked,
signing-flagged binding under Sequoia's StandardPolicy).  On
mismatch, openssl-pgp refuses with a message naming the offending
env var; no HSM signing operation is consumed.
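
The ordering matters more than the commands themselves: verification runs first, and the signer is never invoked on failure. A minimal sketch of that gate follows, with the verifier and signer passed in as command names so it runs without an HSM. `sign_with_preflight` is a hypothetical name, and the real wrapper calls `sq-pkcs11 verify-signing-key` with arguments not shown here.

```sh
# Hypothetical gate: run the verifier, and invoke the signer only on success.
#   $1 = command that checks the configured key against the published cert
#   $2 = command that performs the signing; remaining args are passed to it
sign_with_preflight() {
    verify_cmd="$1"
    sign_cmd="$2"
    shift 2
    if ! "$verify_cmd"; then
        # Name the offending variable so the operator knows what to fix.
        echo "refusing to sign: OPENSSL_PGP_CURRENT_SUBKEY_LABEL is not a current valid signer of OPENSSL_PGP_CERT" >&2
        return 1
    fi
    "$sign_cmd" "$@"
}
```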

require_cert_exists $OPENSSL_PGP_CERT is also added — sign now needs
a published cert on disk to validate against (it was previously
unused on the sign path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A small wrapper invoked as `openssl-pgp-ceremony-run <state-dir> <cmd>
[args...]` that runs an OpenPGP ceremony command, tees its combined
stdout/stderr to <state-dir>/ceremony.log, and persists the final exit
status to <state-dir>/rc.

The state directory acts as a handshake point between the orchestrator
that starts the ceremony and the operator who attaches to type
passphrases: the orchestrator polls for `rc` to know when the command
has finished and what status it returned, while the log file gives
post-hoc audit material.
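
The runner shape described here can be sketched in POSIX shell as follows. This is an illustration under assumptions rather than the committed script: `run_ceremony` is a hypothetical name, and writing `rc` via a temporary file plus `mv` is a choice made for this sketch so a poller never sees a partially written status.

```sh
# Hypothetical runner: execute a command, tee combined stdout/stderr to
# <state-dir>/ceremony.log, and persist the exit status to <state-dir>/rc.
run_ceremony() {
    state_dir="$1"
    shift
    mkdir -p "$state_dir" || return 1
    # The left side of the pipe runs in a subshell, so the command's status
    # is written to a file instead of being lost behind tee's status.
    {
        rc=0
        "$@" 2>&1 || rc=$?
        echo "$rc" > "$state_dir/rc.tmp"
    } | tee "$state_dir/ceremony.log"
    # Publish the status only once it is complete.
    mv "$state_dir/rc.tmp" "$state_dir/rc" || return 1
    return "$(cat "$state_dir/rc")"
}
```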

This commit introduces the bare runner shape only; the tmux-based
session sharing for attended OCS passphrase entry comes in the next
commit so the skeleton is reviewable on its own.
…stration

Restructure the runner into two subcommands:

  start-and-wait: invoked by an automation server (Jenkins agent).
  Creates a detached tmux session on the HSM client via a named socket,
  runs the requested OpenPGP command inside that session, prints attach
  instructions, and waits for the command's exit status (polls the
  state directory's rc file with a configurable --timeout).

  exec: the tmux-side runner that actually executes the command,
  records output to ceremony.log, and writes the exit status to rc.
  Equivalent to the original single-mode behaviour and used as the
  tmux session command by start-and-wait.
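
The polling half of start-and-wait might look like the sketch below. `wait_for_rc` is a hypothetical name, the one-second poll interval is an assumption, and the 124 return code on timeout merely mirrors the convention of timeout(1); the actual script may report timeouts differently.

```sh
# Hypothetical poller: wait until <state-dir>/rc appears, then return the
# status recorded in it. Gives up after the given number of seconds.
wait_for_rc() {
    state_dir="$1"
    timeout="$2"    # seconds
    elapsed=0
    while [ ! -f "$state_dir/rc" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s waiting for $state_dir/rc" >&2
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return "$(cat "$state_dir/rc")"
}
```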

This split keeps the automation server as the orchestrator and audit
collector only — it sees the command, logs, outputs, and final exit
status, but OCS passphrases are typed into the shared tmux terminal by
custodians (over SSH + nShield Remote Administration / TVD), never
passed through CI parameters, credentials, or a web UI.

The --allow-user flag uses `tmux server-access -a/-w` to grant named
users read/write access to the session via the shared socket.  Socket
permissions are set to 0660 with the group inherited from the socket
directory so attaching from a fixed Unix group is sufficient — no
chmod 0666 / world-readable sockets.
Switch generatekey_pkcs11 to invoke `generatekey --generate --batch
pkcs11 ...` so the operator is not prompted to confirm each parameter
on stdin before the OCS prompts appear.  Two scenarios where this
matters:

  - Automated cert-init runs from Jenkins, where there is no
    interactive operator at the generatekey step; the wrapper has
    already validated every parameter (labels, cardset existence,
    key-size, logkeyusage policy) before calling generatekey.

  - Tmux-shared ceremonies via openssl-pgp-ceremony-run, where the
    expected operator interaction is OCS card presentation and
    passphrase entry — not parameter confirmation.  Without --batch
    the custodians would have to press Enter through generatekey's
    parameter-review prompt before the card prompts appeared,
    cluttering the shared terminal with non-decision input.

The arguments fed into generatekey come exclusively from the wrapper
(constructed from validated env vars and CLI flags), so the
interactive confirmation step --batch suppresses is redundant in
this context.  The note() string is updated to reflect the new flag
for parity with what the operator will see in the ceremony log.
A helper invoked as `openssl-pgp-revocation-recipients bundle --source
<dir> --bundle <out> --manifest <out> --bundle-sha256 <out>` that
collects the trusted set of public OpenPGP certificates authorised to
decrypt the offline primary-key revocation certificate produced by
cert-init.

The source directory contains:
  - recipients.txt — authoritative manifest, one
    "<40-hex fingerprint> <email>" line per recipient
  - <fingerprint>.pgp — one file per manifest entry, the recipient's
    public certificate

For each manifest entry the helper:
  - confirms the cert file exists and is non-empty
  - runs `sq inspect` and asserts both the declared fingerprint and
    User ID email are present in the cert (catches stale fingerprints
    or accidentally-swapped files)
  - test-encrypts a probe message via `sq encrypt --for-file` to
    confirm the cert is currently usable for encryption under sq's
    policy (catches expired keys, policy-rejected ciphersuites, etc.)
  - appends the cert to the output bundle and the (fingerprint, email)
    pair to the output manifest, finally writing the bundle's SHA-256
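
The structural part of these checks (manifest line shape and a non-empty cert file) can be sketched without sq. The `sq inspect` and test-encryption steps are deliberately left out, and `check_recipients` is a hypothetical name, not the helper's real function.

```sh
# Hypothetical structural check of <source-dir>/recipients.txt: every line
# must be "<40-hex fingerprint> <email>" with a non-empty <fingerprint>.pgp.
check_recipients() {
    source_dir="$1"
    lineno=0
    failures=0
    # "|| [ -n "$fpr" ]" also handles a last line without a trailing newline.
    while read -r fpr email extra || [ -n "$fpr" ]; do
        lineno=$((lineno + 1))
        problem=""
        case "$fpr" in
            *[!0-9A-Fa-f]*) problem="fingerprint is not hexadecimal" ;;
        esac
        if [ -z "$problem" ] && [ "${#fpr}" -ne 40 ]; then
            problem="fingerprint is not 40 characters long"
        fi
        if [ -z "$problem" ]; then
            case "$email" in
                ?*@?*) ;;
                *) problem="missing or malformed email" ;;
            esac
        fi
        if [ -z "$problem" ] && [ -n "$extra" ]; then
            problem="unexpected trailing field"
        fi
        if [ -z "$problem" ] && [ ! -s "$source_dir/$fpr.pgp" ]; then
            problem="$fpr.pgp is missing or empty"
        fi
        if [ -n "$problem" ]; then
            echo "recipients.txt line $lineno: $problem" >&2
            failures=$((failures + 1))
        fi
    done < "$source_dir/recipients.txt"
    [ "$failures" -eq 0 ]
}
```

Collecting all problems before failing, rather than stopping at the first, is a choice made for this sketch so one run reports every bad entry.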

The cert-init Jenkins pipeline runs this helper before the ceremony
so the encrypted offline-revocation artifact can only be produced if
the recipient set is fully valid — no half-encrypted artifacts
slipping through if one recipient's key has expired or been
withdrawn since the last run.

Three initial recipients are checked in under
release-tools/openpgp/revocation-recipients/ with a README documenting
the manifest format and the validation contract above.
…rant

`tmux server-access -a <user>` fails when invoked for the user who
owns the running tmux server: the server starts with that user already
in the allow-list, and tmux refuses to add a duplicate entry.  In the
ceremony runner, this manifested as a hard failure of the access-grant
loop the moment --allow-user named the CI agent's own account.

Detect the case (compare each --allow-user value to `id -un`) and
skip the redundant grant.  The loop then continues to grant access to
the remaining custodians, leaving the session alive.

Without this fix, an operator list that included the server-owning
account would kill the session and bail on what is structurally a
no-op.
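
The skip can be expressed as a small filter over the --allow-user values. The sketch below is illustrative only: `users_to_grant` is a hypothetical name and `$socket` in the usage comment is an assumed variable.

```sh
# Hypothetical filter: print every requested user except the server owner,
# who is already in tmux's allow-list.
#   $1 = account that owns the tmux server; remaining args = requested users
users_to_grant() {
    owner="$1"
    shift
    for user in "$@"; do
        if [ "$user" != "$owner" ]; then
            echo "$user"
        fi
    done
}

# Usage sketch (not executed here; $socket is a hypothetical variable):
#   users_to_grant "$(id -un)" "$@" | while read -r user; do
#       tmux -S "$socket" server-access -a "$user"
#   done
```
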
quarckster requested review from levitte and t8m May 11, 2026 15:59
quarckster and others added 5 commits May 13, 2026 14:52
The "Signing the release files" step shelled out to gpg, which fails on
Jenkins agents that do not have the release private key in a local gpg
keyring -- the key lives on the HSM.  Replace the two gpg invocations
(tarball detached signature, announcement clearsign) with calls to the
openssl-pgp wrapper, which signs via sq-pkcs11 against the HSM key.

Also default --gpg-program to release-tools/sq-pkcs11-git-shim when not
explicitly set, so git tag signing goes through the HSM by the same
mechanism; the flag is still overridable (e.g. --gpg-program=gpg) for
operators using a local gpg keyring.  --local-user now also exports
OPENSSL_PGP_CURRENT_SUBKEY_LABEL so the openssl-pgp subprocesses pick up
the same key without a second env-var hop.

openssl-pgp / sq-pkcs11 produce detached, ASCII-armored signatures only,
so the announcement is no longer cleartext-signed.  Both the plain .txt
and the detached .txt.asc are uploaded; the FILES section of the manual
is updated to reflect this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The script grew options over the years that the Jenkins release job
does not exercise.  Remove them and the code paths they gated:

  --clean-worktree    The script now always operates on the current
                      worktree (refusing to run if dirty).  Drops the
                      sibling-clone setup and the "cd $release_clone"
                      dance, plus the three `git push parent HEAD`
                      back-pushes to the original repo.
  --branch-fmt        Branch and tag names follow the standard scheme
  --tag-fmt           (%b / %t).  The two formats were always identical
                      under --clean-worktree, so format_string and
                      release-aux/string-fn.sh are no longer needed.
  --staging-address   No more uploads from this script -- artifacts
  --no-upload         land in the parent directory and the caller is
                      responsible for shipping them.  Drops the
                      staging_address parser and the upload-fn.sh
                      backends entirely.
  --no-update         `make update` and `make update-fips-checksums`
                      always run.
  --force             The branch-name check is now strict.

The --debug flag no longer implies --no-upload (no upload step
exists).  The metadata file's `upload_files` key becomes
`release_files`; the now-impossible staging_update_branch /
staging_release_branch keys are removed.  POD manual is trimmed to
match.

The cleanup leaves the script ~400 lines shorter and the supported
invocation matches what the Jenkins job actually uses:

    stage-release.sh \
        $ALPHA_FLAG $BETA_FLAG $FINAL_FLAG \
        --branch \
        --gpg-program="$GPG_PROGRAM" \
        --local-user="$SIGNING_KEY_LABEL" \
        --reviewer="$REVIEWER_1" --reviewer="$REVIEWER_2"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stage-release.sh: stop producing openssl-<version>.txt and its
detached signature.  The announcement-template machinery (template
selection, sed expansion, fix-title.pl, the second openssl-pgp sign
call) is gone, along with the `length` checksum that only the
announcement consumed.  The release_files manifest, FILES section of
the POD, and the leftover comments in --gpg-program / tagkey header
follow suit.

do-copyright-year: the progress spinner used \r to overwrite a single
status line, which works on a TTY but not in Jenkins -- every tick
of the spinner showed up as its own log line, polluting the build
output.  Skip the spinner entirely when stdout is not a TTY, keeping
just the "Updating copyright" / "Files considered: N" / "Files
changed: N" lines.
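
A TTY guard of this kind is typically a one-line test on file descriptor 1. The following is a minimal sketch, assuming a hypothetical `progress_tick` function; the actual spinner code in do-copyright-year is not shown in this message.

```sh
# Hypothetical progress line: overwrite a single status line with \r, but
# only when stdout is a terminal. In a CI log it prints nothing.
#   $1 = running count of files considered
progress_tick() {
    if [ -t 1 ]; then
        printf '\rFiles considered: %s' "$1"
    fi
}
```

Because the check is on stdout, redirecting or piping the script's output disables the spinner as well, which matches the Jenkins case described above.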

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tarball signing was already inlined as $RELEASE_TOOLS/openssl-pgp, but
tag signing was routed through a --gpg-program flag whose only sensible
value was the in-repo sq-pkcs11-git-shim.  Make the pair symmetric:
inline sq-pkcs11-git-shim at the `git tag -s` call and remove
--gpg-program entirely (from getopt, case dispatch, --help text, and
the POD manual).  --unsigned still covers the skip-signing escape
hatch; --local-user remains the single knob for the signing key.

The Jenkins invocation can drop its --gpg-program="$GPG_PROGRAM" arg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eded

The Jenkins job passed --branch unconditionally; the existing
patch-number / same-branch checks already collapsed it to a no-op when
not applicable.  Make do_branch=true the default and remove the flag.

  - on master at PATCH == 0:  release branch openssl-X.Y is created
                              (as when --branch was honored before)
  - on a release branch
    or PATCH != 0:            release commit lands on the current branch
                              (as when --branch was ignored before, just
                              without the "--branch ignored" warning)

This removes --branch from getopt/case dispatch/--help/POD, drops the
warn_branch bookkeeping that only fed that warning, removes the
"--final implies --branch" plumbing and the "--branch is invalid
unless current branch is master" guard (the do_branch=false branch
now handles non-master worktrees identically).
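
The resulting decision reduces to two inputs, the current branch and the patch number. The sketch below restates the two cases listed above; `release_target` and its parameters are hypothetical and do not reflect how stage-release.sh actually derives these values.

```sh
# Hypothetical decision: print the branch the release commit lands on.
#   $1 = current branch, $2 = X.Y version series, $3 = patch number
release_target() {
    current_branch="$1"
    series="$2"
    patch="$3"
    if [ "$current_branch" = master ] && [ "$patch" -eq 0 ]; then
        # New release series: a release branch is created from master.
        echo "openssl-$series"
    else
        # Release branch, or a patch release: stay on the current branch.
        echo "$current_branch"
    fi
}
```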

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 participants