Skip to content

Add offline backup for foremanctl#507

Open
sjha4 wants to merge 1 commit into
theforeman:masterfrom
sjha4:backup-offline-implementation
Open

Add offline backup for foremanctl#507
sjha4 wants to merge 1 commit into
theforeman:masterfrom
sjha4:backup-offline-implementation

Conversation

@sjha4

@sjha4 sjha4 commented May 13, 2026

Copy link
Copy Markdown
Contributor

Implements comprehensive offline backup functionality for Foreman deployments:

  • Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs)
  • Backs up foremanctl state
  • Includes metadata with container image digests for restore compatibility
  • Preflight checks for running tasks and database integrity (amcheck)
  • Automatic service restoration on failure

Why are you introducing these changes? (Problem description, related links)

What are the changes introduced in this pull request?

  • Offline backup

How to test this pull request

I got a foremanctl box with normal deploy. On this box, clone foremanctl repo and checkout this branch.
cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl

Then try ./foremanctl --help

Also,

(.venv) [root@ip-10-0-167-40 foremanctl]# ./foremanctl backup  --help
usage: foremanctl backup [-h] [-v] [--incremental] [--online] [--skip-pulp-content] [--tar-volume-size TAR_VOLUME_SIZE] [--wait-for-tasks]
                         backup_dir

Create offline backup of Foreman databases and configuration

positional arguments:
  backup_dir            Directory where backup files will be stored

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output
  --incremental         Perform incremental backup (not yet implemented)
  --online              Perform online backup without stopping services (not yet implemented)
  --skip-pulp-content   Skip Pulp content directory backup (not yet implemented)
  --tar-volume-size TAR_VOLUME_SIZE
                        Split tar archives at specified size in MB (not yet implemented, for Pulp content only)
  --wait-for-tasks      Wait for running tasks to complete instead of failing immediately

The help section has placeholders for incremental, online and --tar-volume-size which will be implemented in follow up cards/PRs.

You can run :
Command:

cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl
./foremanctl backup /var/tmp/foreman-backup-test --wait-for-tasks

I have a dummy restore script which can be used for testing and also getting steps to run manually when testing.:

Download the script

wget https://gist.githubusercontent.com/sjha4/35d98b318f15753a678a406fb0fb14ad/raw/test-restore-final.sh

Make it executable

chmod +x test-restore-final.sh

Run restore

./test-restore-final.sh /path/to/backup/foreman-backup-TIMESTAMP

Steps to reproduce:

  • On a foremanctl box, run foremanctl backup BACKUP_DIR

Checklist

  • Tests added/updated (if applicable)
  • Documentation updated (if applicable)

@sjha4 sjha4 marked this pull request as ready for review May 13, 2026 19:28
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/backup.yaml Outdated
@sjha4 sjha4 force-pushed the backup-offline-implementation branch 2 times, most recently from 15ba9dd to c6de1eb Compare May 14, 2026 14:14

@ianballou ianballou left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an automated test run, take these with a grain of salt, but I wanted to post early in case they're real for extra time to test and fix.

Firstly, the DBs did get backed up, so awesome, but there were some hiccups that stopped the full process from running:

Bug #1: podman_network.yaml fails when no custom networks exist

File: src/playbooks/backup/tasks/podman_network.yaml
Severity: Blocker — prevents backup from completing
Description: The shell command podman network ls --format '{{.Name}}' | grep -v '^podman$' | while read net; do ... returns exit code 1 when there are no custom networks, because grep -v finds no matching lines.
Fix: Add || true after the grep, or use a different approach:

# Option A: tolerate empty result
  failed_when: networks_json.rc not in [0, 1]

# Option B: check first, skip if no custom networks

Bug #2: Wrong Foreman tasks API endpoint

File: src/playbooks/backup/tasks/preflight.yaml
Severity: High — preflight silently skips running task detection
Description: Uses https://{{ fqdn }}/api/v2/tasks?state=running which returns 404. The correct endpoint is https://{{ fqdn }}/foreman_tasks/api/tasks?state=running&search=state%3Drunning. Because failed_when: false is set, the error is silently ignored.
Impact: Backups will proceed even with running Foreman tasks, risking data inconsistency.

Bug #3: pg_isready and pg_dump not available on host

... I cut the output here, I'm not sure why these commands weren't on my box. It's not related to this PR I don't think.

Bug #4: Hardcoded parameters.yaml path in metadata task

File: src/playbooks/backup/tasks/metadata.yaml
Severity: Low — affects metadata accuracy only
Description: ansible.builtin.slurp reads from /var/lib/foremanctl/parameters.yaml but foremanctl's state directory is configurable via OBSAH_STATE. In dev/vagrant setups, the actual path is different (e.g., /vagrant/.var/lib/foremanctl/parameters.yaml).
Impact: enabled_features: [] in metadata despite features being configured. Doesn't affect DB dump correctness.
Fix: Use the state_dir variable instead of hardcoding the path.

@sjha4

sjha4 commented May 14, 2026

Copy link
Copy Markdown
Contributor Author

Will update the tasks endpoint and parameters.yaml path..

About the podman networks, those are created for IOP.. Like https://github.com/theforeman/foremanctl/blob/master/src/roles/iop_network/tasks/main.yaml so I'd assume we have that present in production deployments. We can add some handling for when it's not.

@aidenfine

Copy link
Copy Markdown
Contributor

Maybe nitpick but does it make sense to include the not yet implemented flags when doing backup --help? Would you be opposed to just leaving them out in this PR?

@sjha4

sjha4 commented May 14, 2026

Copy link
Copy Markdown
Contributor Author

Maybe nitpick but does it make sense to include the not yet implemented flags when doing backup --help? Would you be opposed to just leaving them out in this PR?

I am fine either way but it's helpful guidance for future PRs and documentation to look at.

Comment thread src/playbooks/backup/tasks/podman_network.yaml Outdated
Comment thread src/playbooks/backup/tasks/podman_volumes.yaml Outdated
Comment thread src/roles/backup/tasks/preflight.yaml
Comment thread src/playbooks/backup/tasks/quadlet_files.yaml Outdated
Comment thread src/playbooks/backup/tasks/foremanctl_state.yaml Outdated
Comment thread src/playbooks/backup/tasks/quadlet_files.yaml Outdated
@sjha4 sjha4 force-pushed the backup-offline-implementation branch 4 times, most recently from 4da7174 to 10c3e00 Compare May 15, 2026 17:44
Comment thread src/playbooks/backup/tasks/database_detection.yaml Outdated
Comment thread src/playbooks/backup/tasks/database_dumps.yaml Outdated
Comment thread src/playbooks/backup/tasks/foremanctl_state.yaml Outdated
Comment thread src/playbooks/backup/tasks/foremanctl_state.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/metadata.yaml Outdated
Comment thread src/playbooks/backup/tasks/podman_secrets.yaml Outdated
Comment thread src/playbooks/backup/tasks/podman_volumes.yaml Outdated
@evgeni

evgeni commented May 19, 2026

Copy link
Copy Markdown
Member

not a full review (I stopped somewhere around the secrets backup), but overall this feels a lot like "let's write a huge bash script and then wrap it in YAML" and not like Ansible :(

Comment thread docs/user/backup.md Outdated

### Database Integrity

For internal databases (`--database-mode internal`), the backup runs `amcheck` on all database indexes to verify B-tree consistency before proceeding.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this relevant to the user or a detail?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's relevant cause it is one of the steps users see on their playbook execution ouput. I will make it more user facing and less detail-y..

Comment thread docs/user/backup.md Outdated
Comment thread docs/user/backup.md Outdated
Comment thread docs/user/backup.md Outdated
Comment thread docs/user/backup.md Outdated
Comment thread docs/user/backup.md

## Backup Verification

After backup completes, verify the backup:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be nice for the backup command to do if we want users to see this information or perform this procedure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow..Like a backup verify command? The playbook does track all backups and provides a summary at the end of execution.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a future improvement, non-blocking.

Comment thread src/roles/backup/defaults/main.yaml Outdated
Comment thread src/roles/backup/tasks/database_dumps.yaml
changed_when: true
no_log: true

- name: Calculate total backup size

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this task actually doing the calculation or more of a data collection?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is gathering file info..Let me name it better.. 👍🏼

Comment thread src/roles/backup/tasks/database_dumps.yaml Outdated
state: directory
mode: '0755'

- name: Test write permissions

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this test fails?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backup would fail at this step and throw a permissions error..I can catch it and show a better message instead of the default one.

Comment thread src/roles/backup/tasks/main.yaml Outdated
state: directory
mode: '0755'

- name: Run preflight checks

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be done before any other actions take place? For example, if preflight checks fail, the tasks above would keep creating timestamped directories that are empty.

Comment thread src/roles/backup/tasks/main.yaml Outdated
until: backup_postgres_status.status.ActiveState == 'inactive'
retries: "{{ backup_postgresql_stop_retries }}"
delay: "{{ backup_postgresql_stop_delay }}"
when: database_mode == 'internal'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bleeding a non-role variable into the role.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database_mode is more a deployment config i'd expect to be available for use across roles?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed this as a var in the playbook more explicitly for the role. 👍🏼

- name: Dump databases
ansible.builtin.command:
cmd: >
pg_dump

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it discussed already why we can not rely on community.postgresql.postgresql_db for these actions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support for incremental dumps is not available which we need eventually, hence we went with pg_dump here.

Comment thread src/roles/backup/tasks/main.yaml Outdated
until: backup_pg_ready.rc == 0
changed_when: false

- name: Initialize databases_config with Foreman database

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the database config is bleeding non-role variables into the role, and we are getting into duplicated logic territory. For example, we solve this same problem over in https://github.com/theforeman/foremanctl/blob/master/src/playbooks/deploy/deploy.yaml#L16

There should be something we can do here to rely more on the database config that is already centralized.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this to build this list from database and database_iop in the playbook pre-task similar to the deploy playbook..

# - Check for running Pulp tasks
# - Run amcheck on databases (if available and local)

- name: Check for running Foreman tasks

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's similar but that one is looking for failed tasks and this one's looking for running tasks and polling them till they finish to stop the services for backup..

- not (wait_for_tasks | default(false))
- backup_foreman_running_tasks | int > 0

- name: Check for running Pulp tasks

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be relevant to other commands such as update ?

@sjha4 sjha4 Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say the entire file could be relevant. Waiting for foreman tasks, pulp tasks and maybe running integrity checks on DB..All should be meaningful before an update..

Comment thread src/roles/check_database_index/tasks/main.yml
@sjha4 sjha4 force-pushed the backup-offline-implementation branch from 8db6812 to c7afdef Compare June 9, 2026 14:57
Comment thread docs/user/backup.md Outdated
Comment thread docs/user/backup.md Outdated

### Database Integrity

For internal databases (`--database-mode internal`), the backup verifies that all database indexes are healthy before proceeding. This ensures the backup will be consistent and restorable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if amcheck is available. We should call that out here if the availability of this tool is not guaranteed within the users setup or our baseline requirements.

@sjha4 sjha4 force-pushed the backup-offline-implementation branch 3 times, most recently from b066ac9 to 16293d2 Compare June 10, 2026 21:05

- name: Wait for PostgreSQL readiness
ansible.builtin.command:
cmd: pg_isready -h {{ database_host }} -p {{ database_port }}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need updating to run inside container and not on the host. Will update to use :

containers.podman.podman_container_exec:                                                                                               
        name: "{{ postgresql_container_name }}"                                                                                              
        command: pg_isready -h {{ database_host }} -p {{ database_port }}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep this here and assume postgresql is installed on host and pg_isready and pg_dump are available..I will add a pre_flight check to ensure posgreql package is available on the host.

@sjha4 sjha4 force-pushed the backup-offline-implementation branch 3 times, most recently from 4ecd05e to a67e4d0 Compare June 11, 2026 20:24
Implements comprehensive offline backup functionality for Foreman deployments:
- Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs)
- Backs up podman secrets, networks, volumes, quadlet files
- Backs up systemd units and foremanctl state
- Includes metadata with container image digests for restore compatibility
- Preflight checks for running tasks and database integrity (amcheck)
- Automatic service restoration on failure

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sjha4 sjha4 force-pushed the backup-offline-implementation branch from a67e4d0 to 273fa50 Compare June 11, 2026 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants