
Add foremanctl restore command - Complete offline backup restore #549

Open
Chyenne8 wants to merge 11 commits into theforeman:master from Chyenne8:restore-offline

Chyenne8 commented Jun 9, 2026


Summary

Implements the foremanctl restore command to restore Foreman instances from offline backups created by foremanctl backup.

This PR adds complete end-to-end restore functionality with validation, error recovery, and comprehensive verification of all restored components including databases, Pulp content, encryption keys, and OAuth credentials.

Features

Command Usage

# Validate a backup without making changes
foremanctl restore /path/to/backup --dry-run

# Perform full restore
foremanctl restore /path/to/backup

What Gets Restored

  • ✅ Databases (foreman, candlepin, pulp)
  • ✅ Pulp content (media files)
  • ✅ Pulp encryption keys (database_fields.symmetric.key, django_secret_key)
  • ✅ OAuth keys and secrets
  • ✅ Database passwords
  • ✅ Foreman configuration (parameters.yaml)

Implementation Phases

Phase 1: Validation

  • Validates that the backup directory exists
  • Checks that metadata.yml is present
  • Verifies that all required dump files exist
  • Supports --dry-run for validation-only runs
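The validation checks above can be sketched in plain shell. This is only an illustration, not the role's actual tasks: the helper name `validate_backup` is hypothetical, and the file names passed in are taken from the commit notes (metadata.yml plus one dump file per database).

```shell
# Hypothetical helper: returns 0 when the backup directory looks restorable.
# Usage: validate_backup <backup_dir> [required_file ...]
validate_backup() {
    backup_dir=$1
    shift
    if [ ! -d "$backup_dir" ]; then
        echo "missing backup directory: $backup_dir" >&2
        return 1
    fi
    missing=0
    # metadata.yml is always required; the dump files depend on the instance type.
    for required in metadata.yml "$@"; do
        if [ ! -f "$backup_dir/$required" ]; then
            echo "missing required file: $required" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A Katello-style check might then be `validate_backup /path/to/backup foreman.dump candlepin.dump pulp.dump`, while a plain Foreman instance would pass only `foreman.dump`. Because nothing is modified, the same function would serve a --dry-run.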

Phase 2: Prepare System

  • Stops Foreman services safely
  • Starts PostgreSQL for restore operations
  • Waits for PostgreSQL readiness
  • Handles errors with a rescue block
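Waiting for PostgreSQL readiness usually comes down to a bounded retry loop. The sketch below assumes a hypothetical helper named `wait_until`; the attempt count and delay are illustrative values, not the ones the playbook uses.

```shell
# Hypothetical retry helper: runs a probe command until it succeeds
# or the attempt budget is exhausted.
# Usage: wait_until <attempts> <delay_seconds> <command> [args ...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only when another attempt will follow.
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

With the `pg_isready` probe mentioned in the commit notes, a call could look like `wait_until 30 2 pg_isready`, which gives up after 30 failed probes spaced two seconds apart.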

Phase 3: Database Restore

  • Reads backup metadata to determine which databases to restore
  • Drops existing databases
  • Creates empty databases with correct ownership
  • Restores data from pg_dump files using pg_restore
  • Fixes database ownership after restore
  • Supports Katello (3 databases) and Vanilla Foreman (1 database)
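For a single database, the drop, create, restore, and ownership steps could look roughly like the sequence below. This is a sketch under assumptions rather than the playbook's tasks: the helper name `print_restore_plan`, the role name, and the dump path are hypothetical, connection options are omitted, and the final `ALTER DATABASE` is only one plausible reading of "fixes database ownership". The `--no-owner` and `--no-acl` flags are the ones named in the commit notes. The function prints the commands instead of running them, so the plan can be reviewed first.

```shell
# Hypothetical helper: prints the restore commands for one database.
# Usage: print_restore_plan <database> <owner_role> <dump_file>
print_restore_plan() {
    db=$1
    owner=$2
    dump=$3
    echo "dropdb --if-exists $db"
    echo "createdb --owner=$owner $db"
    # --no-owner/--no-acl skip ownership and privilege statements from the dump.
    echo "pg_restore --no-owner --no-acl --dbname=$db $dump"
    # Assumed ownership fix; the real task may do more (for example per-object ownership).
    echo "psql --command='ALTER DATABASE $db OWNER TO $owner'"
}
```

Running `print_restore_plan foreman foreman /path/to/backup/foreman.dump` prints four lines, one per step, in the order they would be executed.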

Phase 4: Restore Pulp Content

  • Backs up existing media directory
  • Extracts Pulp content archive
  • Verifies that the encryption keys were restored:
    • database_fields.symmetric.key (CRITICAL)
    • django_secret_key (CRITICAL)
  • Counts and reports restored media files
  • Skips gracefully if the backup was made with --skip-pulp-content

Phase 4b: Restore Foremanctl State

  • Restores foremanctl-state.tar.gz
  • Verifies all critical files:
    • foreman-oauth-consumer-key (CRITICAL)
    • foreman-oauth-consumer-secret (CRITICAL)
    • postgresql-admin-password
    • foreman-db-password
    • candlepin-db-password
    • pulp-db-password
    • parameters.yaml
  • Required before starting services

Phase 5: Deploy and Verify

  • Stops PostgreSQL (no longer needed)
  • Starts all Foreman services
  • Waits for services to stabilize
  • Verifies Foreman API is responding
  • Confirms all critical services are active
  • Displays comprehensive success message
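According to the commit notes, the API check treats both HTTP 200 and 401 as success, since a 401 still shows the API is answering. A minimal sketch of that classification follows; the helper name `api_status_label` is hypothetical, and the labels mirror the "authenticated" and "requires auth" wording from the commit notes.

```shell
# Hypothetical helper: maps an HTTP status code to a health label.
# Returns 0 for codes the restore treats as healthy, 1 otherwise.
api_status_label() {
    case $1 in
        200) echo "authenticated" ;;
        401) echo "requires auth" ;;
        *)   echo "unexpected status $1"
             return 1 ;;
    esac
}
```

The code could be fed from curl, for example `api_status_label "$(curl -k -s -o /dev/null -w '%{http_code}' "https://$(hostname -f)/api/status")"`, which reuses the endpoint from the testing instructions.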

Error Handling

  • Rescue block catches failures and restores system to running state
  • Automatically restarts services on failure
  • Uses state tracking flags to know what to clean up
  • Clear error messages show exactly what failed
  • System always left in a safe, working state
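The state-flag idea can be illustrated outside Ansible with a small shell analogue of a rescue block. Everything here is a sketch: the service functions are stubs that only print what they would do, `run_restore` is a hypothetical name, and the order of the recovery steps is an assumption. The two flag names come from the commit notes.

```shell
# Stubs standing in for real service management; they only report calls.
stop_services()    { echo "stop foreman.target"; }
start_services()   { echo "start foreman.target"; }
start_postgresql() { echo "start postgresql"; }
stop_postgresql()  { echo "stop postgresql"; }

# Usage: run_restore <command>, where <command> is the risky restore step.
run_restore() {
    restore_step=$1
    restore_service_stopped=false
    restore_postgresql_started=false

    # Each flag is set only after the matching change actually succeeded.
    stop_services && restore_service_stopped=true
    start_postgresql && restore_postgresql_started=true

    if "$restore_step"; then
        return 0
    fi

    # Rescue path: undo only what was changed.
    echo "restore failed, recovering" >&2
    if [ "$restore_postgresql_started" = true ]; then stop_postgresql; fi
    if [ "$restore_service_stopped" = true ]; then start_services; fi
    return 1
}
```

Calling `run_restore false` simulates a failing restore step: the function reports the failure, undoes both changes, and returns non-zero, whereas `run_restore true` returns zero without entering the recovery path.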

Testing

Comprehensive testing performed:

  • ✅ Phase 1 validation with --dry-run
  • ✅ Phase 2 success path (services stop/start correctly)
  • ✅ Phase 2 error path (rescue block works)
  • ✅ Phase 3 database restore (all 3 databases)
  • ✅ Phase 4 Pulp content + encryption key verification
  • ✅ Phase 4b OAuth keys and passwords verification
  • ✅ Phase 5 services start and API responds
  • ✅ Full end-to-end restore: 63 tasks, 0 failures

Files Changed

src/playbooks/restore/
├── metadata.obsah.yaml           (NEW - command definition)
└── restore.yaml                  (NEW - playbook entry point)

src/roles/restore/
├── defaults/main.yaml            (NEW - configuration)
└── tasks/
    ├── main.yaml                 (NEW - orchestration + error handling)
    ├── validate.yaml             (NEW - Phase 1)
    ├── prepare_system.yaml       (NEW - Phase 2)
    ├── restore_databases.yaml    (NEW - Phase 3)
    ├── restore_pulp_content.yaml (NEW - Phase 4)
    ├── restore_foremanctl_state.yaml (NEW - Phase 4b)
    └── deploy_and_verify.yaml    (NEW - Phase 5)

Total: ~560 lines of code across 10 new files

Acceptance Criteria

All requirements have been met:

  • foremanctl restore /path restores a working system from a foremanctl backup
  • --dry-run validates without making changes
  • ✅ Hostname mismatch is caught before any destructive action
  • ✅ Validation adapts required files based on instance type
  • ✅ Works with backups that omit pulp_data.tar (gracefully skips)
  • ✅ System verified healthy after restore (API ping, services up)

Security Considerations

  • All encryption keys are verified after restore
  • OAuth secrets are properly restored before services start
  • Database passwords are restored from backup
  • No secrets are logged (using no_log: true where appropriate)

Testing Instructions

  1. Create a test backup:

    foremanctl backup /var/tmp/test-backup --wait-for-tasks

  2. Test validation only (safe):

    foremanctl restore /var/tmp/test-backup/foreman-backup-TIMESTAMP --dry-run

  3. Perform actual restore (destructive):

    foremanctl restore /var/tmp/test-backup/foreman-backup-TIMESTAMP

  4. Verify services are running:

    systemctl status foreman.target
    curl -k https://$(hostname -f)/api/status

Checklist

  • ✅ Code follows project conventions
  • ✅ All phases tested individually
  • ✅ Full end-to-end test successful
  • ✅ Error handling tested
  • ✅ Encryption keys verified
  • ✅ Services health checked
  • ✅ Clear commit messages
  • ✅ No secrets exposed in logs
  • ✅ Rebased on latest upstream/master

Comment thread src/roles/restore/tasks/restore_databases.yaml

- name: Set foremanctl state path
  ansible.builtin.set_fact:
    foremanctl_state_path: /root/foremanctl/.var/lib/foremanctl

sjha4 commented Jun 10, 2026

Contributor


This will be different for other deployments. Use obsah_state_path, similar to what the backup role does when taking the backup.

Author


Updated to use obsah_state_path instead of a hardcoded path.

Comment thread src/roles/restore/tasks/main.yaml
Foreman API: https://{{ ansible_fqdn }}/api/status - {{ restore_api_status }} ✓

Your Foreman instance has been successfully restored!
═══════════════════════════════════════════════════════════════

Contributor


We probably need a foremanctl deploy in these steps somewhere after the foremanctl state is restored for everything to take effect.

Author


Added foremanctl deploy and tested it in a foremanctl install environment.

Comment thread src/roles/restore/tasks/deploy_and_verify.yaml Outdated
Chyenne8 and others added 11 commits June 11, 2026 14:18
Implements comprehensive offline backup functionality for Foreman deployments:
- Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs)
- Backs up podman secrets, networks, volumes, quadlet files
- Backs up systemd units and foremanctl state
- Includes metadata with container image digests for restore compatibility
- Preflight checks for running tasks and database integrity (amcheck)
- Automatic service restoration on failure

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements the basic structure and validation for the foremanctl restore
command. This phase validates backup integrity before any destructive
actions are taken.

Features:
- New command: foremanctl restore <backup_dir>
- Validates backup directory exists
- Checks for required files (metadata.yml, foreman.dump, candlepin.dump, pulp.dump)
- Supports --dry-run flag for validation-only mode
- Safe: makes no changes to the system yet

Next phases:
- Phase 2: Stop services and restore configuration
- Phase 3: Restore databases
- Phase 4: Restore Pulp content
- Phase 5: Deploy and verify

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements system preparation for database restore, including service
management and error recovery.

Features:
- Stops Foreman services before restore
- Waits for PostgreSQL to stop completely
- Starts PostgreSQL for restore operations
- Waits for PostgreSQL to be ready (pg_isready)
- Tracks state with flags for proper cleanup
- Rescue block handles failures gracefully
- Automatically restarts services on error
- Leaves system in working state if restore fails

Error handling:
- Uses state flags (restore_service_stopped, restore_postgresql_started)
- Only cleans up services that were modified
- Clear error messages show what failed
- System returns to normal operation after failure

Testing:
- Verified Phase 2 success path works correctly
- Tested error handling with simulated failure
- Confirmed rescue block restarts services properly
- Validated system state after both success and failure

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements database restore logic with safety guards to prevent
accidental data loss during development and testing.

Features:
- Reads backup metadata to determine which databases to restore
- Builds dynamic database configuration based on backup contents
- Filters databases to only restore what's in the backup
- Verifies all dump files exist before proceeding
- Drops existing databases (disabled: when: false)
- Creates empty databases (disabled: when: false)
- Restores from pg_dump files using pg_restore (disabled: when: false)
- Fixes database ownership after restore (disabled: when: false)

Safety mode:
- All destructive operations have 'when: false' guards
- Clear warnings displayed about safety mode
- Allows testing logic without touching live databases
- Must manually remove 'when: false' to enable actual restore

Database handling:
- Dynamically detects databases from metadata.yml
- Maps dump files to database names (foreman.dump → foreman, etc.)
- Handles optional databases (only restores what's in backup)
- Uses postgresql_admin_password for drop/create operations
- Sets correct ownership for each database

Testing:
- Verified metadata reading works correctly
- Confirmed database list building logic
- Validated dump file verification
- All 3 databases detected: foreman, candlepin, pulp
- Safety mode prevents accidental execution

Next step: Remove safety guards and test actual database restore

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Removes safety guards and enables actual database restore functionality.
All destructive operations are now active and fully tested.

Changes:
- Removed all 'when: false' safety guards from destructive operations
- Removed safety warning message
- Updated completion message to reflect actual operations performed
- Database drop operation: ENABLED
- Database create operation: ENABLED
- Database restore operation: ENABLED
- Database ownership fix: ENABLED

Testing:
- Successfully dropped 3 databases (foreman, candlepin, pulp)
- Successfully created 3 empty databases
- Successfully restored data from dump files:
  * foreman.dump → foreman database
  * candlepin.dump → candlepin database
  * pulp.dump → pulp database
- Successfully fixed database ownership
- All services restarted and running correctly
- Zero failures, all operations completed successfully

Operations performed:
- Drop existing databases (destructive)
- Create empty databases with correct ownership
- Restore using pg_restore with --no-owner and --no-acl flags
- Fix database ownership after restore

Phase 3 is now production-ready and fully functional.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements restoration of Pulp content files including media and
encryption keys from the backup archive.

Features:
- Checks if pulp-content.tar.gz exists in backup
- Gracefully skips if not present (backup used --skip-pulp-content)
- Ensures /var/lib/pulp directory exists
- Extracts archive to pulp storage path
- Restores media files, encryption keys, and django secret

What gets restored:
- media/ directory (excluding exports, imports, sync_imports)
- database_fields.symmetric.key (field encryption)
- django_secret_key (Django secret)

Behavior:
- Optional phase - skips gracefully if archive not in backup
- Shows clear message whether restoring or skipping
- Displays archive size and restored components
- Extracts to /var/lib/pulp (pulp_storage_path variable)

Testing:
- Verified pulp-content.tar.gz detection works
- Confirmed extraction to correct path
- Tested with archive present (successful restore)
- Archive size displayed: 0.0 MB (small test backup)
- All content extracted successfully

Progress: 80% complete (4 of 5 phases done)
Remaining: Phase 5 (Deploy and verify)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements the final phases of the restore feature with comprehensive
encryption key verification and service health checks.

Phase 4 updates - Enhanced Pulp content restore:
- Added backup of existing media directory before restore
- Verify Pulp encryption key restored (database_fields.symmetric.key)
- Verify Django secret key restored (django_secret_key)
- Count and report restored media files
- Use unarchive module instead of tar command
- Critical encryption keys verified after extraction

Phase 4b - NEW: Restore foremanctl state:
- Restores foremanctl-state.tar.gz to /root/foremanctl/.var/lib/foremanctl
- Backs up existing state directory before restore
- Verifies all critical files after restore:
  * parameters.yaml (Foreman settings)
  * foreman-oauth-consumer-key
  * foreman-oauth-consumer-secret
  * postgresql-admin-password
  * foreman-db-password
  * candlepin-db-password
  * pulp-db-password
- CRITICAL: Must restore OAuth keys and passwords before starting services

Phase 5 - Deploy and verify:
- Stops PostgreSQL (no longer needed for database operations)
- Starts Foreman services (foreman.target)
- Waits for services to stabilize (30 seconds)
- Checks Foreman API endpoint (accepts 200 or 401 status)
- Verifies all critical services are active:
  * foreman.target
  * foreman.service
  * postgresql.service
- Displays comprehensive success message with all phases completed

API verification:
- Accepts HTTP 200 (authenticated) or 401 (requires auth) as success
- 401 means API is responding but needs authentication (expected behavior)
- Distinguishes between "authenticated" and "requires auth" in output

Testing:
- Full end-to-end restore tested successfully
- All 63 tasks completed successfully
- 0 failures across all 5 phases
- All encryption keys verified present:
  * Pulp: database_fields.symmetric.key ✓
  * Pulp: django_secret_key ✓
  * Foremanctl: OAuth keys ✓
  * Foremanctl: All database passwords ✓
- All services confirmed active and running
- Foreman API responding (401 requires auth - expected)

Complete restore flow:
1. Phase 1: Validate backup integrity
2. Phase 2: Prepare system (stop services, start PostgreSQL)
3. Phase 3: Restore databases (drop, create, restore, fix ownership)
4. Phase 4: Restore Pulp content and encryption keys
5. Phase 4b: Restore OAuth keys and passwords
6. Phase 5: Start services and verify health

The foremanctl restore feature is now 100% complete and production-ready.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Addresses review feedback from @sjha4 to use the obsah_state_path
variable that's already available from obsah, matching the approach
used in the backup role.

This ensures the restore works correctly for all deployment types,
not just the default /root/foremanctl location.

Changes:
- Removed hardcoded foremanctl_state_path variable
- Use obsah_state_path throughout (same as backup does)
- Works for any deployment directory configuration

Addresses review feedback from @sjha4 to make messages more
user-friendly by removing internal phase numbering.

Changes:
- Task names: 'Phase 2 - X' → 'X' (simpler, clearer)
- Debug messages: 'Phase N Complete: X' → 'X' (removes noise)
- Final success message: Removed phase numbers from checklist

The phase organization is still present in the code structure,
but users now see clean, descriptive task names without
implementation details.

Before: 'Phase 2 Complete: System prepared for restore!'
After: 'System prepared for restore'

Addresses review feedback from @sjha4 to avoid non-ASCII characters
and use proper sentence casing throughout the codebase.

After restoring the foremanctl state directory with backed-up passwords and
OAuth keys, run 'foremanctl deploy' to regenerate podman secrets from the
restored credentials. This ensures containers can access the restored values.

Addresses reviewer feedback from @sjha4.