feat: Differential backup implementation #243

Merged: adskyiproger merged 6 commits into develop from ocrvs-11767 on Feb 26, 2026

@adskyiproger adskyiproger commented Feb 13, 2026

Description

Terminology: see the backup definitions at https://pgbackrest.org/user-guide.html#concept/backup

  • Full Backup: pgBackRest copies the entire contents of the database cluster to the backup. The first backup of the database cluster is always a Full Backup. pgBackRest is always able to restore a full backup directly. The full backup does not depend on any files outside of the full backup for consistency.
  • Differential Backup: pgBackRest copies only those database cluster files that have changed since the last full backup. pgBackRest restores a differential backup by copying all of the files in the chosen differential backup and the appropriate unchanged files from the previous full backup. The advantage of a differential backup is that it requires less disk space than a full backup, however, the differential backup and the full backup must both be valid to restore the differential backup.
  • Incremental Backup: pgBackRest copies only those database cluster files that have changed since the last backup (which can be another incremental backup, a differential backup, or a full backup). As an incremental backup only includes those files changed since the prior backup, they are generally much smaller than full or differential backups. As with the differential backup, the incremental backup depends on other backups to be valid to restore the incremental backup. Since the incremental backup includes only those files since the last backup, all prior incremental backups back to the prior differential, the prior differential backup, and the prior full backup must all be valid to perform a restore of the incremental backup. If no differential backup exists then all prior incremental backups back to the prior full backup, which must exist, and the full backup itself must be valid to restore the incremental backup.
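
The dependency between backup types is visible in pgBackRest backup labels: a full backup label ends in F, while differential (D) and incremental (I) labels are prefixed with the label of the full backup they depend on. A minimal sketch of reading that information from a label follows; the function names `backup_type` and `base_full` are hypothetical helpers, not part of pgBackRest or of this PR.

```shell
#!/bin/sh
# Classify a pgBackRest backup label by its trailing letter:
# F = full, D = differential, I = incremental.
backup_type() {
  case "$1" in
    *F) echo full ;;
    *D) echo diff ;;
    *I) echo incr ;;
    *)  echo unknown ;;
  esac
}

# Print the label of the full backup that a label depends on.
# Differential and incremental labels start with the full label and "_";
# a full label contains no "_" and is returned unchanged.
base_full() {
  echo "${1%%_*}"
}
```

For example, `base_full 20260217-160745F_20260217-161041D` prints `20260217-160745F`, the full backup that must still be valid for that differential backup to be restorable.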

Do not confuse a DUMP with a FULL backup: https://www.postgresql.org/docs/current/backup-dump.html.
A dump is a backup of one particular database, not of the whole instance (cluster).

Considerations for partial backup/restore:

  1. Same username and password for production and staging databases: A partial (differential or incremental) backup is taken at the postgres instance (cluster) level, which means we can't back up individual databases, roles or users. The same is true of restore: after a restore, all database users and roles from production are restored on the staging database. Staging OpenCRVS users (events and analytics) can be recovered afterwards by re-running the data-migration and data-migration-analytics jobs, which add the necessary permissions. Recovering the staging admin user password requires knowing the production password, so keeping the two passwords the same simplifies database management.
  2. Using the right backup type: The default backup type remains the well-tested dump, so a full dump of the events database is taken by default. The operator can change the backup type from dump to differential by running the environment:init script or by changing the value of the GitHub environment variable. An incremental backup strategy is not as straightforward as a differential one and leaves more room for human error, e.g. incremental backups must be specified in the right order during restore. Incremental backups use somewhat less disk space, since each one contains only the changes since the previous backup, but because CRVS records grow linearly this optimisation is not needed. We prefer differential over incremental.
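
To make the disk-space trade-off in point 2 concrete, here is a rough back-of-the-envelope sketch. It rests on a simplified model that is an assumption, not a measurement: one full backup on day 0, one partial backup per day afterwards, and a fixed amount of previously unchanged files touched each day. The function names and the 10 MB figure are illustrative only.

```shell
#!/bin/sh
# Simplified model (assumption): a full backup on day 0, then one partial
# backup per day, with daily_mb of previously unchanged files touched each day.

# Total size of N daily differential backups: day k copies k * daily_mb,
# because everything changed since the full backup is copied again.
diff_total_mb() {
  days=$1; daily_mb=$2
  echo $(( daily_mb * days * (days + 1) / 2 ))
}

# Total size of N daily incremental backups: every day copies daily_mb.
incr_total_mb() {
  days=$1; daily_mb=$2
  echo $(( daily_mb * days ))
}

echo "diff: $(diff_total_mb 6 10) MB, incr: $(incr_total_mb 6 10) MB"
```

Under these assumptions, six days of partial backups cost 210 MB as differentials versus 60 MB as incrementals. In exchange, restoring the latest differential needs only two valid backups (the full and that differential), while restoring the latest incremental needs the full backup and all six incrementals.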

Testing

environment:init script

Configuration question
[screenshot]

Summary screen
[screenshot]

Workflow execution

Get the postgres admin password from the production environment:
[screenshot]

The secret is stored in $env_file and later transferred to the k8s secret postgres-admin-user:
[screenshot]

Helm deploy to production

Before running the deployment, modify values.yaml:

helm get values -n opencrvs-deps-production opencrvs-deps  > opencrvs-deps.yaml

Add the following lines:

postgres:
  backup:
    type: pgbackrest

Deploy modified configuration:

helm upgrade opencrvs-deps -f opencrvs-deps.yaml ../infrastructure/charts/dependencies/ 

Verify the backup cronjobs were created:

k get cronjob
NAME                   SCHEDULE    TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
...
postgres-backup-diff   0 1 * * *   <none>     False     0        <none>          39m
postgres-backup-full   0 1 * * *   <none>     False     0        <none>          39m
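
The manual check above could be scripted. The following is a minimal sketch, assuming the listing is piped in from `kubectl get cronjob` (the transcript uses the alias `k`); `has_cronjobs` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `kubectl get cronjob` output on stdin and succeed only if every
# cronjob name given as an argument appears in the NAME column.
has_cronjobs() {
  listing=$(cat)
  for name in "$@"; do
    printf '%s\n' "$listing" |
      awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' || return 1
  done
}
```

Usage would look like `kubectl get cronjob | has_cronjobs postgres-backup-diff postgres-backup-full && echo "cronjobs present"`.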

Backup testing

Create a job to test that the full backup works:

k create job --from cronjob/postgres-backup-full postgres-backup-full1

Create a job to test that the diff backup works:

k create job --from cronjob/postgres-backup-diff postgres-backup-diff1

Verify jobs were created:

k get job
NAME                       STATUS     COMPLETIONS   DURATION   AGE
...
postgres-backup-diff1      Complete   1/1           8s         11m
postgres-backup-full1      Complete   1/1           35s        14m
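
Instead of polling `k get job` by hand, one option is to block until the job reports completion with `kubectl wait`. This is a sketch, not part of the PR: `wait_for_job` is a hypothetical wrapper, the 300s default timeout is an arbitrary assumption, and the `KUBECTL` variable exists only so the binary can be swapped out.

```shell
#!/bin/sh
# Block until a Job reports the Complete condition, or fail after a timeout.
# KUBECTL can be overridden; it defaults to the kubectl binary.
wait_for_job() {
  job=$1
  timeout=${2:-300s}
  "${KUBECTL:-kubectl}" wait --for=condition=complete \
    --timeout="$timeout" "job/$job"
}
```

For example, `wait_for_job postgres-backup-full1 && wait_for_job postgres-backup-diff1` returns non-zero if either job does not complete within the timeout.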

Verify backups were created:

  1. Connect to postgres pod: k exec -it postgres-0 -- bash
  2. Check backup repository status: pgbackrest --stanza=main info
NB01NSTL012:opencrvs-infrastructure vmudryi$ k exec -it postgres-0 -- bash
root@postgres-0:/# pgbackrest --stanza=main info
stanza: main
    status: ok
    cipher: aes-256-cbc

    db (current)
        wal archive min/max (17): 000000010000000000000012/00000001000000000000001B

        full backup: 20260217-160745F
            timestamp start/stop: 2026-02-17 16:07:45+00 / 2026-02-17 16:08:13+00
            wal start/stop: 000000010000000000000012 / 000000010000000000000012
            database size: 37.5MB, database backup size: 37.5MB
            repo1: backup set size: 4.9MB, backup size: 4.9MB

        diff backup: 20260217-160745F_20260217-161041D
            timestamp start/stop: 2026-02-17 16:10:41+00 / 2026-02-17 16:10:43+00
            wal start/stop: 000000010000000000000014 / 000000010000000000000014
            database size: 37.5MB, database backup size: 8.3KB
            repo1: backup set size: 4.9MB, backup size: 464B
            backup reference total: 1 full
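
The info output above can be condensed into one line per backup. The sketch below parses the human-readable text format shown in this transcript, which is an assumption about its stability; `list_backups` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `pgbackrest info` text output on stdin and print "<type> <label>"
# for every backup listed (lines such as "full backup: <label>").
list_backups() {
  awk '$2 == "backup:" && ($1 == "full" || $1 == "diff" || $1 == "incr") {
    print $1, $3
  }'
}
```

Inside the postgres pod this could be run as `pgbackrest --stanza=main info | list_backups`, which for the repository above would list one full and one diff backup.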

Deploy to staging

Before running the deployment, modify values.yaml:

helm get values -n opencrvs-deps-stg opencrvs-deps  > opencrvs-deps-stg.yaml

Add the following lines:

postgres:
  restore:
    type: pgbackrest

Deploy modified configuration:

helm upgrade opencrvs-deps -f opencrvs-deps-stg.yaml ../infrastructure/charts/dependencies/ 

Verify the restore cronjob was created:

k get cronjob
NAME               SCHEDULE    TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
...
postgres-restore   0 0 * * *   <none>     False     0        9h              5d14h

NOTE: The cronjob name is the same for both restore types (dump, pgbackrest).

Restore testing

Wait up to 24h for the scheduled run, or create a restore job manually:

k create job --from cronjob/postgres-restore postgres-restore1

Output:

job.batch/postgres-restore1 created

Check logs:

k logs job/postgres-restore1

Output:

pod "postgres-restore-runner" deleted from opencrvs-deps-stg namespace
pod/postgres-restore-runner created
statefulset.apps/postgres scaled
pod/postgres-restore-runner condition met
Waiting for container initialization
Waiting for container initialization
pgBackRest 2.58.0
2026-02-18 09:08:09.095 P00   INFO: restore command begin 2.58.0: --config=/etc/pgbackrest/pgbackrest.conf --delta --exec-id=1916-9717daa8 --force --log-level-console=info --pg1-path=/var/lib/postgresql/data --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-host=10.2.0.3 --repo1-host-user=backup --repo1-path=/home/backup/production/postgres --repo1-type=posix --stanza=main
2026-02-18 09:08:10.153 P00   INFO: repo1: restore backup set 20260218-010007F_20260218-010028D, recovery will start at 2026-02-18 01:00:28
2026-02-18 09:08:10.159 P00   INFO: remove invalid files/links/paths from '/var/lib/postgresql/data'
2026-02-18 09:08:12.581 P00   INFO: write updated /var/lib/postgresql/data/postgresql.auto.conf
2026-02-18 09:08:12.598 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2026-02-18 09:08:12.602 P00   INFO: restore size = 37.7MB, file total = 1612
2026-02-18 09:08:12.603 P00   INFO: restore command end: completed successfully (3511ms)
statefulset.apps/postgres scaled
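
A scripted version of this log check might look for the final pgBackRest message. This is a minimal sketch that assumes the log wording shown above; `restore_succeeded` is a hypothetical helper name.

```shell
#!/bin/sh
# Read restore job logs on stdin; succeed only if pgBackRest logged a
# successful end of the restore command.
restore_succeeded() {
  grep -q 'restore command end: completed successfully'
}
```

For example, `kubectl logs job/postgres-restore1 | restore_succeeded && echo "restore OK"` prints the message only when the success line is present.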

Check data consistency:
[screenshot]

OpenCRVS on restore cronjob

Manual verification steps

Create values file:

helm get values opencrvs -n opencrvs-stg > opencrvs.yaml

Update values file:

on_restore_cronjob:
  enabled: true

Upgrade the helm release:

helm upgrade -f opencrvs.yaml opencrvs ../infrastructure/charts/opencrvs-services/

Create a job from the cronjob:

k create job --from cronjob/on-db-restore-cronjob on-db-restore-cronjob1

Check logs:

k logs on-db-restore-cronjob1-9mt4d --all-containers

Logs:

k logs on-db-restore-cronjob1-9mt4d --all-containers
Waiting for PostgreSQL to be ready at postgres-0.postgres.opencrvs-deps-stg.svc.cluster.local:5432...
Checking if database 'events' exists...
[1/3] Cluster-wide setup...
✅ Database 'events' already exists.
Creating or updating role 'events_migrator' with access to database 'events'...
DO
Creating or updating role 'events_app' with access to database 'events'...
DO
Creating or updating role 'analytics' with access to database 'events'...
DO
Checking if schema app in DB 'events' exists...
[2/3] Database-specific setup...
✅ Schema 'app' already exists in database 'events'. Skipping DB-specific setup.
[3/3] Schema-specific setup...
GRANT
GRANT
ALTER DEFAULT PRIVILEGES
✅ PostgreSQL setup completed successfully.
( 1/12) Installing brotli-libs (1.2.0-r0)
( 2/12) Installing c-ares (1.34.6-r0)
( 3/12) Installing libunistring (1.4.1-r0)
( 4/12) Installing libidn2 (2.3.8-r0)
( 5/12) Installing nghttp2-libs (1.68.0-r0)
( 6/12) Installing nghttp3 (1.13.1-r0)
( 7/12) Installing libpsl (0.21.5-r3)
( 8/12) Installing zstd-libs (1.5.7-r2)
( 9/12) Installing libcurl (8.17.0-r1)
(10/12) Installing curl (8.17.0-r1)
(11/12) Installing oniguruma (6.9.10-r0)
(12/12) Installing jq (1.8.1-r0)
Executing busybox-1.37.0-r30.trigger
OK: 14.1 MiB in 28 packages
Reindexing search...

...done reindexing

@adskyiproger adskyiproger force-pushed the ocrvs-11767 branch 7 times, most recently from eac23c0 to be3d34b Compare February 18, 2026 09:59
@adskyiproger adskyiproger changed the title feat: Differential backup implementation feat: Differential backup implementation for postgres Feb 18, 2026
@adskyiproger adskyiproger force-pushed the ocrvs-11767 branch 2 times, most recently from 4f7b47d to 57cf7aa Compare February 18, 2026 12:08
@adskyiproger adskyiproger changed the title feat: Differential backup implementation for postgres feat: Differential backup implementation Feb 18, 2026
@adskyiproger

Please check this one as well @oni-on1003

@adskyiproger adskyiproger force-pushed the ocrvs-11767 branch 2 times, most recently from 227d7cf to c3ccf31 Compare February 26, 2026 09:07
@adskyiproger adskyiproger merged commit b81581b into develop Feb 26, 2026
@adskyiproger adskyiproger deleted the ocrvs-11767 branch February 26, 2026 14:35