(WIP) [DPE-9685] Release storage on teardown#1827
Draft
marceloneppel wants to merge 5 commits into
The charm implemented no removal hooks, so on unit teardown the charmed-postgresql snap services kept the Juju storage mounts busy. Juju's unmount then failed with "target is busy", leaving storage stuck detaching and blocking machine and model removal (only destroy-model --force could clear it).
Stop the workload in the storage-detaching hook, which Juju runs before stop, so the mounts are free by the time Juju unmounts them. This stops every charmed-postgresql snap service plus the charm's topology-observer and log-rotation processes, and is idempotent across the per-storage detaching events.
Fixes #1550.
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
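A minimal sketch of the idempotent stop described above, not the charm's actual code. The service names, the `WorkloadStopper` class, and the injected `is_active`/`stop` callables are all hypothetical stand-ins; in the charm these would wrap the snap service controls.

```python
from typing import Callable, Iterable, List

# Hypothetical service names, for illustration only.
SERVICES = ("database", "exporter")


class WorkloadStopper:
    """Stops workload services; safe to call once per detaching storage."""

    def __init__(self, is_active: Callable[[str], bool], stop: Callable[[str], None]):
        # Both callables are assumptions standing in for real service control.
        self._is_active = is_active
        self._stop = stop

    def on_storage_detaching(self, services: Iterable[str]) -> List[str]:
        """Stop every service that is still running and return what was stopped.

        storage-detaching fires once per storage instance, so later calls
        find nothing active and do nothing.
        """
        stopped = []
        for name in services:
            if self._is_active(name):
                self._stop(name)
                stopped.append(name)
        return stopped
```

Checking the live state on every call, rather than remembering a "done" flag, keeps the handler correct even if a service is restarted between two detaching events.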
…torage-on-teardown
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The storage-detaching regression only manifests on Juju 4.0: 3.6 masks it
with a cleanup_storage shortcut that removes still-Dying storage, and on
4.0 only rootfs (machine-scoped) storage reproduces the stuck unmount, so
running the teardown check on 3.6 proves nothing. Move it out of
test_storage.py into its own spread task pinned to a juju40 variant with
rootfs storage (force-deployed because the charm still declares
assumes: juju < 4).
Also make the list_storage adapter tolerate Juju 4.0's empty list-storage
output, which it prints instead of "{}" for a model with no storage.
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
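The empty-output tolerance could look roughly like the sketch below. `parse_list_storage` is a hypothetical helper name, and the sample payload in the usage note is made up; only the behaviour (empty output treated like "{}") comes from the commit message.

```python
import json
from typing import Any, Dict


def parse_list_storage(raw: str) -> Dict[str, Any]:
    """Parse JSON list-storage output, treating empty output as no storage.

    Per the commit message, Juju 4.0 prints nothing for a model with no
    storage, where "{}" was expected, so blank input maps to an empty dict.
    """
    text = raw.strip()
    if not text:
        return {}
    return json.loads(text)
```

With this helper, `parse_list_storage("")` and `parse_list_storage("{}")` both return an empty dict, while non-empty JSON is parsed as before.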
…torage-on-teardown
# Conflicts:
#	src/cluster.py
#	tests/unit/test_cluster.py
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The storage-detaching handler also fired on scale-down (remove-unit),
where the surviving leader still needs the departing unit's Patroni
reachable to remove it from the raft cluster. Stopping the workload
that early broke the reconfiguration, leaving the cluster unable to
elect a primary ("Primary unit not found") and failing the HA/scaling
integration tests.
The storage unmount only actually hangs on full teardown
(remove-application/destroy-model), so guard the handler on
planned_units() == 0. On scale-down it now does nothing, restoring the
pre-fix cluster behaviour, while destroy-model/remove-application still
release the storage.
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
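The guard can be illustrated with the sketch below. `FakeApp` is a stand-in for the application object (in the ops library, `planned_units()` is a method on `Application`), and `TeardownHandler` with its `released` list is hypothetical, used only to make the behaviour observable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeApp:
    """Stand-in for the application object; only planned_units() is modelled."""

    planned: int

    def planned_units(self) -> int:
        return self.planned


@dataclass
class TeardownHandler:
    """Hypothetical handler: releases storage only on full teardown."""

    app: FakeApp
    released: List[str] = field(default_factory=list)

    def on_storage_detaching(self, storage_name: str) -> bool:
        # Scale-down: other units are still planned, so leave the workload
        # running and let the leader remove this member from the cluster.
        if self.app.planned_units() != 0:
            return False
        # Full teardown: no units are planned, so stop the workload here
        # (recorded in a list in this sketch) to free the mounts.
        self.released.append(storage_name)
        return True
```

On scale-down (`planned == 2`, say) the handler returns `False` and does nothing; with `planned == 0` it proceeds to release the storage.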
WIP: we still need to consider force re-attaching/detaching of storage.
Issue
Solution
Checklist
Fixes #1550.