(WIP) [DPE-9685] Release storage on teardown #1827

Draft
marceloneppel wants to merge 5 commits into 16/edge from fix/1550-release-storage-on-teardown

@marceloneppel marceloneppel commented Jul 2, 2026

Member

WIP: we still need to consider force re-attaching/detaching of storage.

Issue

Solution

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

Fixes #1550.

The charm implemented no removal hooks, so on unit teardown the
charmed-postgresql snap services kept the Juju storage mounts busy.
Juju's unmount then failed with "target is busy", leaving storage
stuck detaching and blocking machine and model removal (only
destroy-model --force could clear it).

Stop the workload in the storage-detaching hook, which Juju runs
before stop, so the mounts are free by the time Juju unmounts them.
This stops every charmed-postgresql snap service plus the charm's
topology-observer and log-rotation processes, and is idempotent
across the per-storage detaching events.
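
A minimal sketch of the idempotency described above, not the charm's actual code. `WorkloadStopper`, the injected `stop_service` callable, and the service names used with it are hypothetical stand-ins for the real snap and process management. The sketch only shows how repeated storage-detaching events can collapse into a single stop per service.

```python
from typing import Callable, Iterable


class WorkloadStopper:
    """Stops a fixed set of services at most once each (hypothetical helper)."""

    def __init__(self, services: Iterable[str], stop_service: Callable[[str], None]):
        self._pending = list(services)  # services that have not been stopped yet
        self._stop_service = stop_service

    def stop_all(self) -> list[str]:
        """Stop every pending service and return the names stopped by this call."""
        stopped = []
        while self._pending:
            name = self._pending[0]
            self._stop_service(name)  # if this raises, the service stays pending
            self._pending.pop(0)
            stopped.append(name)
        return stopped
```

Calling `stop_all()` once per detaching storage is then safe: the first call stops everything and later calls return an empty list without touching the services again.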

Fixes #1550.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…torage-on-teardown

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel marceloneppel added the bug Something isn't working as expected label Jul 2, 2026
@github-actions github-actions Bot added the Libraries: Out of sync The charm libs used are out-of-sync label Jul 2, 2026
@marceloneppel marceloneppel changed the title Fix/1550 release storage on teardown (WIP) Fix/1550 release storage on teardown Jul 2, 2026
@marceloneppel marceloneppel changed the title (WIP) Fix/1550 release storage on teardown (WIP) [DPE-9685] Release storage on teardown Jul 2, 2026
The storage-detaching regression only manifests on Juju 4.0. Juju 3.6
masks it with a cleanup_storage shortcut that removes still-Dying
storage, so running the teardown check on 3.6 proves nothing. On 4.0,
only rootfs (machine-scoped) storage reproduces the stuck unmount.
Move the check out of test_storage.py into its own spread task pinned
to a juju40 variant with rootfs storage (force-deployed because the
charm still declares assumes: juju < 4).

Also make the list_storage adapter tolerate Juju 4.0's empty list-storage
output, which it prints instead of "{}" for a model with no storage.
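
A hedged sketch of the tolerance described above, assuming the adapter parses JSON output from list-storage. The function name `parse_list_storage` is hypothetical and the real adapter may be structured differently.

```python
import json


def parse_list_storage(raw: str) -> dict:
    """Parse list-storage JSON output, treating empty output as 'no storage'."""
    text = raw.strip()
    if not text:
        # Juju 4.0 prints nothing for a model with no storage,
        # where earlier versions print "{}".
        return {}
    return json.loads(text)
```

With this shape, callers see the same empty mapping on both Juju versions and need no version check.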

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…torage-on-teardown

# Conflicts:
#	src/cluster.py
#	tests/unit/test_cluster.py

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The storage-detaching handler also fired on scale-down (remove-unit),
where the surviving leader still needs the departing unit's Patroni
reachable to remove it from the raft cluster. Stopping the workload
early broke that reconfiguration, leaving the cluster unable to elect a
primary ("Primary unit not found") and failing the HA/scaling
integration tests.

The storage unmount only actually hangs on full teardown
(remove-application/destroy-model), so guard the handler on
planned_units() == 0. On scale-down it now does nothing, restoring the
pre-fix cluster behaviour, while destroy-model/remove-application still
release the storage.
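
The guard described above can be sketched as follows. This is an illustration under assumptions, not the charm's handler: `on_storage_detaching` takes the planned unit count as a plain integer (in the charm it would come from planned_units()), and `stop_workload` is a hypothetical callable standing in for the real stop logic.

```python
from typing import Callable


def on_storage_detaching(planned_units: int, stop_workload: Callable[[], None]) -> bool:
    """Stop the workload only on full teardown; return True if it was stopped."""
    if planned_units != 0:
        # Scale-down: other units remain, so keep Patroni reachable and do nothing.
        return False
    # Full teardown (remove-application/destroy-model): free the storage mounts.
    stop_workload()
    return True
```

Passing the count in as an argument keeps the decision easy to unit-test without a Juju model.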

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>