
sensitive config with secrets #441

Open: henderiw wants to merge 20 commits into main from sensitive

@henderiw commented May 6, 2026

Summary

This PR introduces secure handling of secret references in Config resources. Config blobs can
now embed secret::<secretName>::<keyName> placeholders that are resolved against Kubernetes
Secrets at apply time. Resolved values are AES-256-GCM encrypted and never stored in plaintext
or cached in the informer cache.

This PR adds two new CRDs (SensitiveConfig, TargetSnapshot) and one new controller
(Resolver), and refactors the existing TargetConfig and TargetRecovery controllers to use
the encrypted resolved state as their source of truth.


Motivation

Previously, Config resources had no mechanism for referencing sensitive values such as passwords
or tokens. Any credentials embedded in a Config blob were stored in plaintext in the API server
and visible to any controller or process with Config read access. This PR closes that gap.


Secret Reference Format

Placeholders are embedded as string values anywhere inside the Config blob JSON:

secret::<secretName>::<keyName>

Example Config:

spec:
  config:
  - path: /system/aaa
    value:
      password: "secret::device-credentials::password"

The original Config.Spec.Config always retains the secret:: placeholder. The resolved value
exists only in the encrypted SensitiveConfig.Spec.Payload.Data and is never written back to
the Config resource.
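A minimal Go sketch of the substitution step follows. It assumes a placeholder must be the entire string value (as in the example above) and uses an in-memory map in place of Secrets read through mgr.GetAPIReader(). SecretRef, parseSecretRef, resolve and resolveBlob are hypothetical names, not the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const secretPrefix = "secret::"

// SecretRef identifies one key inside one Kubernetes Secret.
type SecretRef struct {
	Secret string
	Key    string
}

// parseSecretRef reports whether s is exactly "secret::<secretName>::<keyName>".
func parseSecretRef(s string) (SecretRef, bool) {
	if !strings.HasPrefix(s, secretPrefix) {
		return SecretRef{}, false
	}
	parts := strings.Split(strings.TrimPrefix(s, secretPrefix), "::")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return SecretRef{}, false
	}
	return SecretRef{Secret: parts[0], Key: parts[1]}, true
}

// resolve walks a decoded JSON value and replaces every placeholder string
// via lookup. The input is not modified; a new tree is returned.
func resolve(v any, lookup func(SecretRef) (string, error)) (any, error) {
	switch t := v.(type) {
	case string:
		if ref, ok := parseSecretRef(t); ok {
			return lookup(ref)
		}
		return t, nil
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, child := range t {
			r, err := resolve(child, lookup)
			if err != nil {
				return nil, err
			}
			out[k] = r
		}
		return out, nil
	case []any:
		out := make([]any, len(t))
		for i, child := range t {
			r, err := resolve(child, lookup)
			if err != nil {
				return nil, err
			}
			out[i] = r
		}
		return out, nil
	default:
		return t, nil // numbers, booleans and null pass through unchanged
	}
}

// resolveBlob decodes a Config blob, substitutes placeholders and re-encodes it.
func resolveBlob(blob []byte, secrets map[string]map[string]string) ([]byte, error) {
	var v any
	if err := json.Unmarshal(blob, &v); err != nil {
		return nil, err
	}
	resolved, err := resolve(v, func(ref SecretRef) (string, error) {
		val, ok := secrets[ref.Secret][ref.Key]
		if !ok {
			return "", fmt.Errorf("secret %q has no key %q", ref.Secret, ref.Key)
		}
		return val, nil
	})
	if err != nil {
		return nil, err
	}
	return json.Marshal(resolved)
}
```

Returning a new tree rather than mutating the decoded input keeps the unresolved blob intact, which matches the rule that Config.Spec.Config always retains the placeholder.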


Architecture

Controller Responsibilities

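| Controller     | Watches                                    | Writes                        | Knows About               |
|----------------|--------------------------------------------|-------------------------------|---------------------------|
| Resolver       | Config, Secrets (metadata), KeyRing Secret | SensitiveConfig               | Secret values, encryption |
| TargetConfig   | Target, SensitiveConfig, Config            | TargetSnapshot, Config status | Datastore transactions    |
| TargetRecovery | Target                                     |                               | Recovery replay           |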

New CRDs

SensitiveConfig — one per Config, same name and namespace. Carries the encrypted resolved
blobs and the metadata needed for change detection:

spec:
  configHash: sha256(cfg.Spec.Config)      # hash of unresolved blobs
  secretKeyHashes:
    "mySecret/password": sha256(value)     # per-key hash for efficient change detection
  payload:
    keyID: "key-1"                         # identifies which AES key was used
    plainHash: sha256(resolvedJSON)        # hash of resolved JSON before encryption
    data: <AES-GCM ciphertext>
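The hash fields could be computed as below. This is a sketch that assumes hex-encoded SHA-256 digests and a caller that passes only the secret keys the Config actually references; sha256Hex, secretKeyHashes and changedKeys are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
)

// sha256Hex returns the lowercase hex SHA-256 digest of b.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// secretKeyHashes builds the spec.secretKeyHashes map, keyed "<secretName>/<keyName>".
func secretKeyHashes(secrets map[string]map[string][]byte) map[string]string {
	out := make(map[string]string)
	for name, data := range secrets {
		for key, val := range data {
			out[name+"/"+key] = sha256Hex(val)
		}
	}
	return out
}

// changedKeys lists entries whose hash differs from, or is missing in, stored.
// Keys present only in stored are not reported: dropping a reference from the
// Config already changes configHash.
func changedKeys(stored, current map[string]string) []string {
	var changed []string
	for k, h := range current {
		if stored[k] != h {
			changed = append(changed, k)
		}
	}
	sort.Strings(changed)
	return changed
}
```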

TargetSnapshot — one per Target, same name and namespace. Records the exact resolved and
encrypted blobs that were last confirmed to the datastore. Used as the sole source of truth for
crash recovery — Config.Status.AppliedConfig has been removed.

spec:
  configs:
    myConfig:              # one entry per applied Config
      payload: ...         # encrypted blobs confirmed to device
      priority: 10

Resolver Controller

The Resolver watches Config resources and produces a SensitiveConfig for each one.

Secret access: Uses mgr.GetAPIReader() exclusively — a direct API call that bypasses the
informer cache. Raw secret values are never held in memory beyond the scope of a single reconcile.

Always encrypts: Configs with no secret references are still encrypted. This keeps the
TargetSnapshot self-contained — recovery never needs to fall back to Config.Spec.Config, which
may have changed since the last confirmed transaction.
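The encryption step can be sketched with Go's standard AES-GCM support. The sketch assumes that data holds nonce || ciphertext || tag and that the keyID is bound as additional authenticated data; both are illustrative choices, not necessarily what the PR stores. Payload, encrypt and decrypt are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"io"
)

// Payload mirrors the spec.payload fields of a SensitiveConfig.
type Payload struct {
	KeyID     string
	PlainHash string
	Data      []byte // nonce || ciphertext || GCM tag
}

func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals plaintext under a fresh random nonce. Passing keyID as
// additional data means a payload relabelled with another key ID fails to open.
func encrypt(keyID string, key, plaintext []byte) (Payload, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return Payload{}, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return Payload{}, err
	}
	sum := sha256.Sum256(plaintext) // plainHash: hash of the resolved JSON before encryption
	return Payload{
		KeyID:     keyID,
		PlainHash: hex.EncodeToString(sum[:]),
		Data:      gcm.Seal(nonce, nonce, plaintext, []byte(keyID)),
	}, nil
}

// decrypt splits off the nonce and authenticates the ciphertext.
func decrypt(p Payload, key []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(p.Data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := p.Data[:gcm.NonceSize()], p.Data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, []byte(p.KeyID))
}
```

Because a fresh nonce is drawn on every call, re-encrypting identical content yields different data, which is why change detection compares plainHash rather than the ciphertext.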

Change detection evaluates three criteria on every reconcile, without early exit, so that
per-dimension annotations accurately reflect what changed:

  1. keyringChanged: whether the payload was encrypted with a non-primary key (free, no API call)
  2. configChanged: sha256(cfg.Spec.Config) compared with the stored configHash
  3. secretChanged: per-key sha256(secret.Data[key]) compared with the stored secretKeyHashes

Based on the result:

  • No change → skip
  • Keyring changed only → re-encrypt in-place without fetching secrets
  • Config or secret changed → fetch secrets, substitute references, encrypt with primary key
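The three-way evaluation and the resulting decision might be expressed as follows. State, Changes, Action, detect and decide are hypothetical; the point being illustrated is that every flag is computed before the decision is taken.

```go
package main

// Action is the Resolver's outcome for one reconcile.
type Action int

const (
	Skip      Action = iota // nothing changed
	Reencrypt               // keyring changed only: re-encrypt the stored payload
	Resolve                 // config or secret changed: fetch, substitute, encrypt
)

// State holds the values compared on each reconcile (hypothetical shape).
type State struct {
	KeyID      string            // stored payload keyID, or the current primary key ID
	ConfigHash string            // sha256 of the unresolved cfg.Spec.Config
	KeyHashes  map[string]string // "<secret>/<key>" -> sha256 of the value
}

// Changes keeps one flag per dimension so each can be reported separately.
type Changes struct {
	KeyringChanged, ConfigChanged, SecretChanged bool
}

// detect evaluates all three criteria without returning early.
func detect(stored, current State) Changes {
	c := Changes{
		KeyringChanged: stored.KeyID != current.KeyID,
		ConfigChanged:  stored.ConfigHash != current.ConfigHash,
	}
	for k, h := range current.KeyHashes {
		if stored.KeyHashes[k] != h {
			c.SecretChanged = true
		}
	}
	return c
}

// decide maps the flags to the three outcomes listed above.
func decide(c Changes) Action {
	switch {
	case c.ConfigChanged || c.SecretChanged:
		return Resolve
	case c.KeyringChanged:
		return Reencrypt
	default:
		return Skip
	}
}
```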

On resolution failure (secret missing, key not found), the Resolver sets ConfigResolverFailed
on the Config and preserves the last known good SensitiveConfig so the TargetConfig controller
continues operating. It requeues after 30 seconds.

Deletion: The Resolver only removes its own finalizer; it does not delete the
SensitiveConfig (SC). The SC must remain alive as the record of what was applied and as the
failure surface if the datastore delete fails. The TargetConfig controller deletes the SC only
after the datastore deletion is confirmed.


TargetConfig Controller

Change detection compares the current SensitiveConfig against the TargetSnapshot entry:

changed = not in snapshot
       || PlainHash differs        (content changed)
       || Priority differs
       || Revertive differs
       || Lifecycle differs
       || NeedsReencryption        (key rotation — re-transact so snapshot gets new key)
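A hedged Go rendering of this predicate is shown below. SnapshotEntry is a hypothetical type carrying only the compared fields (the real entry stores the encrypted payload), the Lifecycle type is assumed to be a string, and NeedsReencryption is modelled as a key ID mismatch between the snapshot and the current SensitiveConfig.

```go
package main

// SnapshotEntry holds the fields compared per Config (hypothetical shape).
type SnapshotEntry struct {
	KeyID     string
	PlainHash string
	Priority  int64
	Revertive bool
	Lifecycle string
}

// needsTransaction mirrors the predicate above. cur is assembled from the live
// SensitiveConfig (KeyID, PlainHash) and Config (Priority, Revertive, Lifecycle).
func needsTransaction(snapshot map[string]SnapshotEntry, name string, cur SnapshotEntry) bool {
	prev, ok := snapshot[name]
	needsReencryption := ok && prev.KeyID != cur.KeyID
	return !ok ||
		prev.PlainHash != cur.PlainHash ||
		prev.Priority != cur.Priority ||
		prev.Revertive != cur.Revertive ||
		prev.Lifecycle != cur.Lifecycle ||
		needsReencryption
}
```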

Transaction flow is split into pure gRPC steps and pure Kubernetes steps with an explicit
cancel on failure:

BuildGRPCIntents   ← pure transformation
Execute(txID)      ← gRPC only
AnalyzeResponse    ← pure
    │
    ├─ error → Cancel(txID) → ProcessErrors (K8s status)
    └─ ok    → Confirm(txID)
               ProcessSuccess (K8s status + SC deletion for deletes)
               saveSnapshot

Deletion flow: When a Config gets a deletionTimestamp, the TargetConfig controller is
triggered via its Config watch (since the Resolver keeps the SC alive, no SC deletion event
fires). The delete intent is built from the Config alone — the SC is not needed for the gRPC
call. After TransactionConfirm, ProcessSuccess explicitly deletes the SC.

"path not found" on delete: When the datastore returns "path not found in tree" for a
delete intent, it is treated as idempotent success: the path was already absent, so the desired
state is achieved. The Config finalizer, deviation, and SC are cleaned up normally.


TargetRecovery Controller

After a restart, the TargetRecovery controller replays the last confirmed state from the
TargetSnapshot. For each snapshot entry with a matching live Config, it decrypts the payload and
sends it to the datastore with PreviouslyApplied: true. Configs not present in the snapshot
were never successfully applied and are skipped — the normal reconcile handles them after
recovery completes.

Config.Status.AppliedConfig has been removed. The TargetSnapshot is the only recovery source.


KeyRing

Encryption keys are stored in a Kubernetes Secret (identified by the label
config.sdcio.dev/keyring: "true") as a JSON blob:

{
  "primary": "key-1",
  "keys": {
    "key-1": "<base64 of 32 random bytes>"
  }
}

The Resolver loads the KeyRing at startup via mgr.GetAPIReader() and reloads it in-place
whenever the Secret changes. All encrypt/decrypt operations are safe for concurrent use.
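One way to make in-place reloads safe for concurrent reconciles is a read/write lock around the decoded key set, sketched below with hypothetical names (KeySet, KeyRingStore). Reload swaps the whole set, so callers should not mutate a map after handing it over.

```go
package main

import (
	"fmt"
	"sync"
)

// KeySet is a decoded keyring: primary key ID plus key bytes by ID.
type KeySet struct {
	Primary string
	Keys    map[string][]byte
}

// KeyRingStore lets reconciles read keys while a Secret watch swaps in a new set.
type KeyRingStore struct {
	mu   sync.RWMutex
	keys KeySet
}

// Reload replaces the key set atomically with respect to readers.
func (s *KeyRingStore) Reload(k KeySet) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.keys = k
}

// Primary returns the key ID and bytes to use for new encryptions.
func (s *KeyRingStore) Primary() (string, []byte) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.keys.Primary, s.keys.Keys[s.keys.Primary]
}

// Lookup returns the key for decrypting a payload tagged with id.
func (s *KeyRingStore) Lookup(id string) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	key, ok := s.keys.Keys[id]
	if !ok {
		return nil, fmt.Errorf("unknown key ID %q", id)
	}
	return key, nil
}

// NeedsReencryption reports whether a payload's key ID is no longer primary.
func (s *KeyRingStore) NeedsReencryption(payloadKeyID string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return payloadKeyID != s.keys.Primary
}
```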

Key rotation: Add a new key ID and its base64-encoded 32-byte key to keys, then set primary to
the new ID. The Resolver detects NeedsReencryption on each SensitiveConfig and re-encrypts it
without refetching secrets.
Once all SensitiveConfigs carry the new keyID, the old key entry can be removed from the
Secret.
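Before deleting an old entry, an operator or controller could compute which key IDs are no longer referenced. This hypothetical helper also counts key IDs found in TargetSnapshot payloads, on the assumption that recovery must still be able to decrypt them until the TargetConfig controller has re-transacted with the new key.

```go
package main

import "sort"

// retiredKeys returns key IDs that are neither primary nor used by any payload
// (SensitiveConfig or TargetSnapshot entries), i.e. safe to drop from the Secret.
func retiredKeys(primary string, keys map[string][]byte, payloadKeyIDs []string) []string {
	inUse := map[string]bool{primary: true}
	for _, id := range payloadKeyIDs {
		inUse[id] = true
	}
	var out []string
	for id := range keys {
		if !inUse[id] {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}
```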

Generating key material:

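# Generate a random 32-byte key and build keyring.json, base64-encoded for the Secret's data field.
# tr -d '\n' strips the line wrapping that GNU base64 inserts every 76 characters.
KEY=$(head -c 32 /dev/urandom | base64 | tr -d '\n')
printf '{"primary":"key-1","keys":{"key-1":"%s"}}' "$KEY" | base64 | tr -d '\n'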

The base64-encoded result goes into data["keyring.json"] in the Secret.

@henderiw requested a review from a team as a code owner May 6, 2026 20:13
@henderiw linked an issue May 7, 2026 that may be closed by this pull request
@henderiw linked an issue May 9, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues:

  • Sensitive data
  • Sensitive Config
