Summary
When pgbackrest fails to initialize the S3 backup stanza (e.g. due to a transient S3 auth issue, network blip, or credential rotation), the charm correctly sets its status to blocked. However, once the underlying issue resolves and the stanza is healthy again, the charm never clears the blocked status on its own. The unit stays blocked indefinitely, even though pgbackrest info returns status: ok and backups are completing successfully.
This has happened to us repeatedly across both STG and PROD environments. Each time it requires direct operator intervention.
Observed behaviour
postgresql/37* blocked idle ... 10.152.88.119 5432/tcp failed to initialize stanza, check your S3 settings
Running the pgbackrest check directly on the unit shows the stanza is fine:
sudo -u _daemon_ env LD_LIBRARY_PATH=... /snap/charmed-postgresql/current/usr/bin/pgbackrest \
--config=/var/snap/charmed-postgresql/common/etc/pgbackrest/pgbackrest.conf \
--stanza=prod-landscape-saas-ps7.postgresql \
info
# → status: ok
The stanza-create log is empty (no failed run after recovery). The charm simply never re-checks.
Expected behaviour
The update-status hook (or a dedicated periodic check) should re-run the stanza health check and clear the blocked status once the stanza is ok. The charm should be able to self-heal without operator intervention.
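A minimal sketch of what such a health check could look like, assuming the charm captures the output of `pgbackrest info --output=json`. To my understanding that output is a JSON array with one object per stanza, each carrying a `status` object whose `code` is `0` when the stanza is ok. The function name `stanza_is_ok` is hypothetical and not taken from the charm's code.

```python
import json


def stanza_is_ok(info_json: str, stanza: str) -> bool:
    """Return True if the named stanza reports status code 0 (ok).

    Assumes `info_json` is the stdout of `pgbackrest info --output=json`:
    a JSON array of stanza objects with "name" and "status" keys.
    Anything unparsable or unexpected is treated as "not ok".
    """
    try:
        stanzas = json.loads(info_json)
    except json.JSONDecodeError:
        return False
    if not isinstance(stanzas, list):
        return False
    for entry in stanzas:
        if isinstance(entry, dict) and entry.get("name") == stanza:
            status = entry.get("status")
            # Status code 0 is assumed to mean "ok".
            return isinstance(status, dict) and status.get("code") == 0
    # Stanza not present in the output at all.
    return False
```

Treating every parse failure as "not ok" keeps the check conservative: the blocked status would only be cleared on a positive, well-formed result.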
Steps to reproduce
- Deploy charmed-postgresql with a pgbackrest S3 backend
- Temporarily break S3 access (rotate creds, revoke IAM permissions, simulate a network partition, etc.)
- Observe charm transitions to blocked: failed to initialize stanza
- Restore S3 access; stanza becomes healthy (pgbackrest info returns status: ok, backups run successfully)
- Observe charm status remains blocked indefinitely
Impact
- Silent loss of backup monitoring: operators assume backups are broken when they are not
- Requires manual intervention every time a transient S3 issue occurs (not acceptable for production)
- We have hit this in both STG and PROD on back-to-back days; it appears to be the normal failure mode for any S3 disruption
Environment
- Charm: charmed-postgresql (machine operator, not k8s)
- Juju controller: JAAS / Prodstack7
- Backend: Ceph RadosGW (S3-compatible)
- PostgreSQL: Patroni-managed cluster, 3-unit STG / 2-unit PROD
Suggested fix
In the update-status hook: if the unit is currently blocked with a stanza-related message, re-run the stanza check. If pgbackrest info returns status: ok, clear the blocked status and set active. This is a low-risk read-only check that should run on every update-status interval.
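The decision logic described above could be sketched roughly as follows. This is an illustration only, not the charm's actual code: `resolve_status` and `STANZA_BLOCKED_MESSAGE` are hypothetical names, the message string is copied from the status line observed in this report, and the stanza health result is assumed to come from a separate read-only `pgbackrest info` check.

```python
from typing import Optional, Tuple

# Status message copied from the observed `juju status` output in this report.
STANZA_BLOCKED_MESSAGE = "failed to initialize stanza, check your S3 settings"


def resolve_status(
    current_name: str, current_message: str, stanza_ok: bool
) -> Optional[Tuple[str, str]]:
    """Decide whether update-status should replace the unit status.

    Returns ("active", "") when the unit is blocked on the stanza message
    and the stanza is healthy again; returns None to leave the current
    status untouched in every other case.
    """
    if current_name != "blocked":
        return None  # Only a blocked unit can be recovered here.
    if current_message != STANZA_BLOCKED_MESSAGE:
        return None  # Blocked for an unrelated reason; do not clear it.
    if not stanza_ok:
        return None  # Stanza is still unhealthy; stay blocked.
    return ("active", "")
```

Matching on the exact message keeps the recovery path from clearing a blocked status that was set for some other reason.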