Fix: dataset deletion cleanup gaps#18
Closed
AlexianMasson wants to merge 15 commits into
Closed
Conversation
76fbbbc to
c8f3ef1
Compare
Dataset deletion removed the feature type but left the layer's .r/.w ACL rules in GeoServer. Since the final table name becomes reusable once the row is deleted, a stale rule could silently apply to a future dataset published under the same name. Add GeoServerService.delete_layer_acl (best-effort, 404-tolerant) and call it from DatasetDeletionService next to delete_layer.
The generator gained a normalize_nan import that the stub never provided, so the test module failed at collection.
In refresh mode (scheduled re-ingestion), clean_staging_table_task ran only on transform success, so every failed scheduled run leaked an untracked temp_<hex> table in the staging schema that dataset deletion could never clean up. Give the task group a clean_on_failure flag that switches the cleanup task to TriggerRule.ALL_DONE, enabled for transform_refresh only. Direct mode keeps ALL_SUCCESS so the user's tracked staging table survives a failed transform (it is dropped on dataset deletion). In direct-mode runs the refresh cleanup task now executes but exits early via its existing XCom-None guard, and a succeeding ALL_DONE leaf does not flip a failed DAG run to success, so failure callbacks still fire.
DAG deletion was guarded by the schedule still being set. When the schedule had been cleared before the dataset was deleted, the stale ingestion_<id> DAG (metadata + run history) stayed in Airflow forever. Call delete_dag unconditionally — it already treats 404 as success.
Clearing a recurrence schedule (DELETE/PATCH /schedule, or re-processing without recurrence) only cancelled runs / updated the DB. The DAG generator then stopped emitting ingestion_<id>, leaving stale DAG metadata and run history in Airflow. Add remove_ingestion_dag (cancel runs + delete DAG, best-effort) and call it from every code path that clears an existing schedule.
The temp upload file (FILE imports) was only removed in the staging success callback. On staging failure the IntegrityLink row was deleted but the file stayed in TMP_UPLOAD_PATH with nothing referencing it anymore.
A staging_dag or process_dag run still in flight when the dataset was deleted would recreate the staging/final table after cleanup (its callbacks 404, leaving an orphaned table with no GS/GN artifacts). Force-fail all runs of the dataset first (best-effort): the scheduled ingestion DAG, process_dag runs (prefix '<id>_') and staging_dag runs (prefix '<id>' — the first staging run id is exactly the uuid).
staging_dag/process_dag run records (with task instances and XComs
holding staging/final table names) outlived the dataset forever.
Add purge_dataset_dag_runs, matching run ids by SQL LIKE prefix
('<id>%' for staging_dag, '<id>_%' for process_dag), called best-effort
before the IntegrityLink row is deleted.
Known limitation: failed-run log files under /opt/airflow/logs live on
the Airflow volume and cannot be reached from the backend; success-run
logs are already removed by the DAG success callback.
The per-org GeoServer workspace ({org}) and datastore ({org}_ds) and
the org PostgreSQL schema survived forever, even after the org's last
dataset was deleted.
When no IntegrityLink remains for the organization, delete the
datastore and workspace via raw non-recursive REST calls (GeoServer
refuses to delete non-empty resources, so anything still in use
survives; geoservercloud's own delete helpers hardcode recurse=true and
are deliberately avoided) and DROP SCHEMA ... RESTRICT the org schema.
The shared 'staging' and default 'data' schemas are never touched.
Run-id formats were built and matched with hand-written literals at seven sites across staging, process, log routes and the airflow client, with nothing keeping them in sync. One module now owns the make/match helpers; the elt generator (which cannot import backend code) gets a cross-reference comment. No behavior change.
_force_fail_dag_runs fetched a single unfiltered page (server default 100) of the shared process_dag/staging_dag history and filtered client-side: once a shared DAG accumulates more than a page of runs, in-flight runs of the target dataset can sit past page 1 and silently survive cancellation — defeating the cancel-before-delete protection. Both helpers now share one paginated iterator (_for_each_dag_run); force-fail passes state=[running,queued] and the run-id LIKE pattern so the server returns only relevant runs, and re-queries until none remain. The literal-prefix re-check guards against LIKE '_' wildcard overmatch.
Clearing a recurrence cancelled every in-flight run of the dataset — including manual process runs and staging runs (which are always user-initiated) — when the user only meant to stop scheduled ingestions. Split the cancel: - cancel_dataset_runs: everything (ingestion + process + staging), used by dataset deletion where no run may keep writing to the tables - cancel_scheduled_runs: ingestion runs + non-'_manual' process runs only, used by remove_ingestion_dag on schedule-clear
Clearing a recurrence was implemented three times (DELETE /schedule, update_schedule with a had_schedule flag, process_staging_data) with divergent guards, each having to remember the remove_ingestion_dag call. One helper now pairs the field reset with the DAG removal; a future caller cannot forget it.
The FILE-only cleanup block was byte-for-byte duplicated between the success and failure callbacks; the eligibility predicate already drifted once, so keep it in one place.
The never-drop guard hardcoded the 'staging' literal inside the deletion service; when get_staging_schema becomes configurable the guard would silently stop covering it. is_shared_schema now derives from get_staging_schema/DEFAULT_DATA_SCHEMA in core.config.
c8f3ef1 to
0e7ce13
Compare
Contributor
|
Not tested everything, but seems to globally work fine. Code review in splitted PRs. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR is too big. Splitted in 4 :