The Secure AI Appliance uses rpm-ostree (immutable OS) with Greenboot health checks and automatic rollback. This guide covers recovery scenarios from simple auto-rollback to full factory reset.
Greenboot runs health checks after every boot. If the post-update boot fails the health checks, Greenboot automatically rolls back to the previous deployment.
- An update is applied via `rpm-ostree upgrade` (staged or immediate).
- The system reboots into the new deployment.
- Greenboot runs health checks within `health_check_timeout` seconds (default: 300 seconds / 5 minutes).
- Health checks verify (a sample check script follows this list):
  - Core services are running (registry, tool firewall, UI).
  - The vault can be mounted.
  - The inference engine responds on its health endpoint.
  - Network firewall rules are intact.
- If any check fails, Greenboot initiates a rollback.
- The system reboots into the previous (known-good) deployment.
- At most `max_rollback_attempts` (default: 2) rollback cycles are attempted before the system halts for manual intervention.
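Greenboot's convention is that any executable under `/etc/greenboot/check/required.d/` that exits non-zero marks the boot as failed. The following is a minimal sketch of a custom required check in that style; the service name and the `/api/health` endpoint are assumptions for illustration, not names confirmed by this guide:

```bash
#!/bin/bash
# /etc/greenboot/check/required.d/50-secure-ai.sh
# Sketch of a custom Greenboot check: any non-zero exit fails the boot.
set -euo pipefail

# Fail the boot if a core service is not active.
# NOTE: secure-ai-registry.service is an assumed unit name.
systemctl is-active --quiet secure-ai-registry.service

# Fail the boot if the vault mount point is not a mounted filesystem.
mountpoint -q /var/lib/secure-ai

# Fail the boot if the API does not answer within 10 seconds.
# NOTE: /api/health is an assumed endpoint name.
curl -fsS --max-time 10 http://127.0.0.1:8480/api/health >/dev/null

echo "secure-ai health checks passed"
```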
After a rollback, check the update status:
```bash
curl http://127.0.0.1:8480/api/update/status
```

Or via the Web UI under the Updates tab.
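If you want to watch the status from a shell, you can poll the same endpoint. This convenience sketch only pretty-prints whatever JSON the endpoint returns, so it makes no assumptions about field names:

```bash
# Re-fetch and pretty-print the update status every 5 seconds.
watch -n 5 'curl -s http://127.0.0.1:8480/api/update/status | jq .'
```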
Check Greenboot logs:
```bash
journalctl -u greenboot-healthcheck.service
```

Check the health check result file:

```bash
cat /var/lib/secure-ai/logs/health-check.json
```

Check the deployment state:

```bash
rpm-ostree status
```

This shows the current and previous deployments. The rollback deployment will be marked as active, and the failed deployment will be listed but not booted.
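For orientation, the output looks roughly like the following; the ref, versions, and commits here are made up, and the ● marker denotes the booted deployment:

```
State: idle
Deployments:
● secai:secai/x86_64/stable
                  Version: 1.4.2 (2025-01-10T09:12:00Z)
                   Commit: 3a7f19c4...

  secai:secai/x86_64/stable
                  Version: 1.5.0 (2025-02-01T08:00:00Z)
                   Commit: 9c1d52ab...
```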
If the system is running but unstable after an update, you can manually roll back without waiting for Greenboot.
- Go to the Updates tab.
- Click Rollback.
- Confirm the action.
- The system will schedule a rollback and reboot.
Via the API:

```bash
curl -X POST http://127.0.0.1:8480/api/update/rollback \
  -H "Authorization: Bearer <session-token>" \
  -H "X-CSRF-Token: <csrf-token>"
```

Or via the CLI:

```bash
sudo rpm-ostree rollback
sudo systemctl reboot
```

After reboot, verify the rollback:

```bash
rpm-ostree status
```

The previous deployment should now be active.
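If you want to verify this in a script, rpm-ostree can emit machine-readable status; the `booted`, `version`, and `checksum` fields below come from rpm-ostree's JSON output:

```bash
# Print the version and commit of the deployment you actually booted into.
rpm-ostree status --json \
  | jq -r '.deployments[] | select(.booted) | "\(.version) \(.checksum)"'
```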
If you executed a Level 2 panic (key wipe) and need to recover, the cryptographic keys (LUKS header, cosign keys, TPM2 keys, MOK key) have been shredded. The vault data is still on disk but cannot be decrypted without the LUKS header backup.
- Restore the LUKS header:

  ```bash
  sudo cryptsetup luksHeaderRestore /dev/<vault-partition> \
    --header-backup-file /path/to/luks-header-backup
  ```

- Open the vault with your passphrase:

  ```bash
  sudo cryptsetup open /dev/<vault-partition> secure-ai-vault
  sudo mount /dev/mapper/secure-ai-vault /var/lib/secure-ai
  ```

- Re-run the first boot script to regenerate keys:

  ```bash
  sudo /usr/libexec/secure-ai/firstboot.sh
  ```

  This will:

  - Generate new cosign signing keys.
  - Generate a new MOK (Machine Owner Key) if Secure Boot is enabled.
  - Generate a new service-to-service auth token.
  - Re-seal the vault key to TPM2 PCR values.
  - Set up canary files.
- Restart all services:

  ```bash
  sudo systemctl restart secure-ai-*.service
  ```

If you do not have a LUKS header backup, the vault data is unrecoverable. You must perform a full reinstall:
- Re-image the system with the SecAI OS ISO.
- The installer will create a new LUKS partition.
- First boot will set up all keys and services.
- Import models fresh.
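Whichever path you took, a quick post-recovery check is worthwhile. A sketch, reusing the mapper name and mount point from the steps above:

```bash
# Vault is unlocked and mounted?
sudo cryptsetup status secure-ai-vault
findmnt /var/lib/secure-ai

# All appliance services came back up?
systemctl list-units 'secure-ai-*.service' --no-pager
```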
A Level 3 panic performs a full wipe:
- The vault is re-encrypted with a random key (data unrecoverable).
- Memory is cleared.
- Logs, registry, and auth data are deleted.
- TPM2 keys and MOK key are shredded.
After a Level 3 panic, the system is in a blank state. To recover:
- Reboot the system.
- The first-boot script will run automatically and:
- Create a new LUKS partition (or re-initialize the existing one).
- Generate new cryptographic keys.
- Set up the passphrase.
- Configure TPM2 sealing.
- Access the Web UI at `http://127.0.0.1:8480`.
- Set up a new passphrase on the login page (first-boot setup flow).
- Import models through the catalog or manual import.
All previous data (models, outputs, chat history, logs) is permanently lost. This is by design -- Level 3 is the "everything must go" option.
If the system is stuck in a boot loop (Greenboot keeps failing and rollback is exhausted):
- Boot from the SecAI OS USB installer.
- Mount the system partitions.
- Check the Greenboot failure logs:
  ```bash
  mount /dev/<root-partition> /mnt
  # The journal is binary; read it with journalctl rather than cat:
  journalctl --root=/mnt -u greenboot-healthcheck.service
  ```

- If the issue is a bad update, manually pin the previous deployment (see the chroot sketch after this list):

  ```bash
  chroot /mnt rpm-ostree rollback
  ```

- Unmount and reboot.
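`rpm-ostree` expects a running system, so the chroot step above may need the kernel filesystems bind-mounted first; and if it still refuses to run without its daemon, removing the bad deployment with plain `ostree` is a fallback. A sketch, with partition names as placeholders:

```bash
# Make /proc, /sys, and /dev visible inside the chroot.
for d in proc sys dev; do
  mount --rbind "/$d" "/mnt/$d"
done
mount /dev/<boot-partition> /mnt/boot   # placeholder partition name

# Try the rollback; if rpm-ostree cannot reach its daemon inside the
# chroot, drop the bad deployment (index 0) directly with ostree.
chroot /mnt rpm-ostree rollback \
  || ostree admin --sysroot=/mnt undeploy 0
```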
If the root filesystem is corrupted beyond repair:
- Re-image from USB.
- If the vault partition is intact, mount it after reinstall to recover models and outputs.
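Unlocking the surviving vault after a re-image uses the same commands as the Level 2 recovery above; a sketch with the partition path as a placeholder:

```bash
sudo cryptsetup open /dev/<vault-partition> secure-ai-vault
sudo mount /dev/mapper/secure-ai-vault /var/lib/secure-ai
ls /var/lib/secure-ai   # models and outputs should be visible here
```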
In `appliance.yaml`:

```yaml
updates:
  staged_updates: true
```

With staged updates, the system downloads and prepares the update but does not apply it until you explicitly confirm via the UI or API. This gives you a chance to review before rebooting.
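You can also see a staged-but-unapplied update from the shell: rpm-ostree's JSON status flags staged deployments, so a sketch like this works independently of the appliance API:

```bash
# List any deployment that is staged and waiting for confirmation/reboot.
rpm-ostree status --json \
  | jq -r '.deployments[] | select(.staged) | "staged: \(.version)"'
```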
```yaml
updates:
  cosign_verify: true
```

This verifies the update's cryptographic signature before applying, preventing tampered or unsigned updates from being installed.
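For intuition, this is standard cosign blob verification. Verifying a downloaded artifact against a public key looks roughly like the following; the file names and key path are placeholders, the appliance performs the equivalent check automatically, and depending on your cosign version additional transparency-log flags may be required:

```bash
# Verify a detached cosign signature over an update artifact.
cosign verify-blob \
  --key /etc/secure-ai/cosign.pub \
  --signature update-bundle.sig \
  update-bundle.tar
```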
After initial setup, back up your LUKS header to a secure offline location:
```bash
sudo cryptsetup luksHeaderBackup /dev/<vault-partition> \
  --header-backup-file /media/usb/luks-header-backup
```

Store this backup securely. Anyone with the backup and your passphrase can decrypt the vault.
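The backup file is a raw copy of the LUKS header, so cryptsetup can inspect it directly; a quick sanity check that the backup is readable:

```bash
# Dump the header metadata from the backup file itself.
sudo cryptsetup luksDump /media/usb/luks-header-backup
```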