Add offline backup for foremanctl#507
Conversation
15ba9dd to
c6de1eb
Compare
ianballou
left a comment
There was a problem hiding this comment.
I did an automated test run, take these with a grain of salt, but I wanted to post early in case they're real for extra time to test and fix.
Firstly, the DBs did get backed up, so awesome, but there were some hiccups that stopped the full process from running:
Bug #1: podman_network.yaml fails when no custom networks exist
File: src/playbooks/backup/tasks/podman_network.yaml
Severity: Blocker — prevents backup from completing
Description: The shell command podman network ls --format '{{.Name}}' | grep -v '^podman$' | while read net; do ... returns exit code 1 when there are no custom networks, because grep -v finds no matching lines.
Fix: Add || true after the grep, or use a different approach:
# Option A: tolerate empty result
failed_when: networks_json.rc not in [0, 1]
# Option B: check first, skip if no custom networksBug #2: Wrong Foreman tasks API endpoint
File: src/playbooks/backup/tasks/preflight.yaml
Severity: High — preflight silently skips running task detection
Description: Uses https://{{ fqdn }}/api/v2/tasks?state=running which returns 404. The correct endpoint is https://{{ fqdn }}/foreman_tasks/api/tasks?state=running&search=state%3Drunning. Because failed_when: false is set, the error is silently ignored.
Impact: Backups will proceed even with running Foreman tasks, risking data inconsistency.
Bug #3: pg_isready and pg_dump not available on host
... I cut the output here, I'm not sure why these commands weren't on my box. It's not related to this PR I don't think.
Bug #4: Hardcoded parameters.yaml path in metadata task
File: src/playbooks/backup/tasks/metadata.yaml
Severity: Low — affects metadata accuracy only
Description: ansible.builtin.slurp reads from /var/lib/foremanctl/parameters.yaml but foremanctl's state directory is configurable via OBSAH_STATE. In dev/vagrant setups, the actual path is different (e.g., /vagrant/.var/lib/foremanctl/parameters.yaml).
Impact: enabled_features: [] in metadata despite features being configured. Doesn't affect DB dump correctness.
Fix: Use the state_dir variable instead of hardcoding the path.
|
Will update the tasks endpoint and parameters.yaml path.. About the podman networks, those are created for IOP.. Like https://github.com/theforeman/foremanctl/blob/master/src/roles/iop_network/tasks/main.yaml so I'd assume we have that present in production deployments. We can add some handling for when it's not. |
|
Maybe nitpick but does it make sense to include the not yet implemented flags when doing |
I am fine either way but it's helpful guidance for future PRs and documentation to look at. |
4da7174 to
10c3e00
Compare
|
not a full review (I stopped somewhere around the secrets backup), but overall this feels a lot like "let's write a huge bash script and then wrap it in YAML" and not like Ansible :( |
|
|
||
| ### Database Integrity | ||
|
|
||
| For internal databases (`--database-mode internal`), the backup runs `amcheck` on all database indexes to verify B-tree consistency before proceeding. |
There was a problem hiding this comment.
Is this relevant to the user or a detail?
There was a problem hiding this comment.
I think it's relevant cause it is one of the steps users see on their playbook execution ouput. I will make it more user facing and less detail-y..
|
|
||
| ## Backup Verification | ||
|
|
||
| After backup completes, verify the backup: |
There was a problem hiding this comment.
This would be nice for the backup command to do if we want users to see this information or perform this procedure.
There was a problem hiding this comment.
Not sure I follow..Like a backup verify command? The playbook does track all backups and provides a summary at the end of execution.
There was a problem hiding this comment.
This could be a future improvement, non-blocking.
| changed_when: true | ||
| no_log: true | ||
|
|
||
| - name: Calculate total backup size |
There was a problem hiding this comment.
Is this task actually doing the calculation or more of a data collection?
There was a problem hiding this comment.
This is gathering file info..Let me name it better.. 👍🏼
| state: directory | ||
| mode: '0755' | ||
|
|
||
| - name: Test write permissions |
There was a problem hiding this comment.
The backup would fail at this step and throw a permissions error..I can catch it and show a better message instead of the default one.
| state: directory | ||
| mode: '0755' | ||
|
|
||
| - name: Run preflight checks |
There was a problem hiding this comment.
Should this be done before any other actions take place? For example, if preflight checks fail, the tasks above would keep creating timestamped directories that are empty.
| until: backup_postgres_status.status.ActiveState == 'inactive' | ||
| retries: "{{ backup_postgresql_stop_retries }}" | ||
| delay: "{{ backup_postgresql_stop_delay }}" | ||
| when: database_mode == 'internal' |
There was a problem hiding this comment.
This is bleeding a non-role variable into the role.
There was a problem hiding this comment.
database_mode is more a deployment config i'd expect to be available for use across roles?
There was a problem hiding this comment.
Passed this as a var in the playbook more explicitly for the role. 👍🏼
| - name: Dump databases | ||
| ansible.builtin.command: | ||
| cmd: > | ||
| pg_dump |
There was a problem hiding this comment.
Was it discussed already why we can not rely on community.postgresql.postgresql_db for these actions?
There was a problem hiding this comment.
Support for incremental dumps is not available which we need eventually, hence we went with pg_dump here.
| until: backup_pg_ready.rc == 0 | ||
| changed_when: false | ||
|
|
||
| - name: Initialize databases_config with Foreman database |
There was a problem hiding this comment.
All of the database config is bleeding non-role variables into the role, and we are getting into duplicated logic territory. For example, we solve this same problem over in https://github.com/theforeman/foremanctl/blob/master/src/playbooks/deploy/deploy.yaml#L16
There should be something we can do here to rely more on the database config that is already centralized.
There was a problem hiding this comment.
Refactored this to build this list from database and database_iop in the playbook pre-task similar to the deploy playbook..
| # - Check for running Pulp tasks | ||
| # - Run amcheck on databases (if available and local) | ||
|
|
||
| - name: Check for running Foreman tasks |
There was a problem hiding this comment.
This should be covered by https://github.com/theforeman/foremanctl/pull/521/changes#diff-51346a793c52cfeb20781f4661d8b25d80d642aeb7d5aa678c40b7b81722696cR22 you may just need to coordinate merge order.
There was a problem hiding this comment.
It's similar but that one is looking for failed tasks and this one's looking for running tasks and polling them till they finish to stop the services for backup..
| - not (wait_for_tasks | default(false)) | ||
| - backup_foreman_running_tasks | int > 0 | ||
|
|
||
| - name: Check for running Pulp tasks |
There was a problem hiding this comment.
Will this be relevant to other commands such as update ?
There was a problem hiding this comment.
I'd say the entire file could be relevant. Waiting for foreman tasks, pulp tasks and maybe running integrity checks on DB..All should be meaningful before an update..
8db6812 to
c7afdef
Compare
|
|
||
| ### Database Integrity | ||
|
|
||
| For internal databases (`--database-mode internal`), the backup verifies that all database indexes are healthy before proceeding. This ensures the backup will be consistent and restorable. |
There was a problem hiding this comment.
if amcheck is available. We should call that out here if the availability of this tool is not guaranteed within the users setup or our baseline requirements.
b066ac9 to
16293d2
Compare
|
|
||
| - name: Wait for PostgreSQL readiness | ||
| ansible.builtin.command: | ||
| cmd: pg_isready -h {{ database_host }} -p {{ database_port }} |
There was a problem hiding this comment.
This will need updating to run inside container and not on the host. Will update to use :
containers.podman.podman_container_exec:
name: "{{ postgresql_container_name }}"
command: pg_isready -h {{ database_host }} -p {{ database_port }}
There was a problem hiding this comment.
We can keep this here and assume postgresql is installed on host and pg_isready and pg_dump are available..I will add a pre_flight check to ensure posgreql package is available on the host.
4ecd05e to
a67e4d0
Compare
Implements comprehensive offline backup functionality for Foreman deployments: - Backs up all databases (foreman, candlepin, pulp, 5 IOP DBs) - Backs up podman secrets, networks, volumes, quadlet files - Backs up systemd units and foremanctl state - Includes metadata with container image digests for restore compatibility - Preflight checks for running tasks and database integrity (amcheck) - Automatic service restoration on failure Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
a67e4d0 to
273fa50
Compare
Implements comprehensive offline backup functionality for Foreman deployments:
Why are you introducing these changes? (Problem description, related links)
What are the changes introduced in this pull request?
How to test this pull request
I got a foremanctl box with normal deploy. On this box, clone foremanctl repo and checkout this branch.
cd /root/foremanctl
source .venv/bin/activate
export OBSAH_STATE=/var/lib/foremanctl
Then try ./foremanctl --help
Also,
The help section has placeholders for incremental, online and --tar-volume-size which will be implemented in follow up cards/PRs.
You can run :
Command:
I have a dummy restore script which can be used for testing and also getting steps to run manually when testing.:
Download the script
wget https://gist.githubusercontent.com/sjha4/35d98b318f15753a678a406fb0fb14ad/raw/test-restore-final.sh
Make it executable
chmod +x test-restore-final.sh
Run restore
./test-restore-final.sh /path/to/backup/foreman-backup-TIMESTAMP
Steps to reproduce:
Checklist