Deadline never fires after non-deadline DAG edit due to orphaned deadline_alert #68732

@katherine-wong7

Under which category would you file this issue?

Airflow Core

Apache Airflow version

3.2.2

What happened and how to reproduce it?

Issue Description

When a DAG with a DeadlineAlert is re-serialized due to a non-deadline change, the new serialized_dag row is inserted with a deadline UUID in its data["dag"]["deadline"] JSON, but no corresponding deadline_alert row gets created. The existing deadline_alert row still points at the OLD serialized_dag.id.

When a dagrun is created against the new version, the materialization query joins on deadline_alert.serialized_dag_id = <new sd id> and returns zero rows. No deadline row is materialized. The deadline never fires, the callback never runs, and the dagrun completes normally with no log or error indicating the miss.
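The empty join can be reproduced in isolation. The following is a minimal sketch using an in-memory SQLite database; the two-column schemas, the row ids (`sd-old`, `sd-new`, `alert-1`), and the helper `alerts_for` are simplified assumptions for illustration, not Airflow's real tables or query.

```python
import sqlite3

# Toy schema: only the columns needed to show the lookup. The real tables
# have many more columns; these definitions are simplified assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE serialized_dag (id TEXT PRIMARY KEY, dag_id TEXT);
    CREATE TABLE deadline_alert (id TEXT PRIMARY KEY, serialized_dag_id TEXT);
    """
)

# Version 1 of the DAG and the deadline_alert row linked to it.
conn.execute("INSERT INTO serialized_dag VALUES ('sd-old', 'my_dag')")
conn.execute("INSERT INTO deadline_alert VALUES ('alert-1', 'sd-old')")

# Non-deadline edit: a new serialized_dag row is inserted, but the
# deadline_alert row is neither re-created nor re-pointed.
conn.execute("INSERT INTO serialized_dag VALUES ('sd-new', 'my_dag')")


def alerts_for(serialized_dag_id: str) -> list[str]:
    """Return the alert ids linked to the given serialized_dag row."""
    rows = conn.execute(
        "SELECT id FROM deadline_alert WHERE serialized_dag_id = ? ORDER BY id",
        (serialized_dag_id,),
    ).fetchall()
    return [row[0] for row in rows]


old_alerts = alerts_for("sd-old")  # the alert is still attached here
new_alerts = alerts_for("sd-new")  # nothing to materialize for the new version
print(old_alerts, new_alerts)
```

A dagrun created against `sd-new` finds nothing to materialize, which matches the silent miss described above.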

The bug fires whenever _try_reuse_deadline_uuids returns a non-None mapping (i.e. the deadline definition itself didn't change) AND write_dag proceeds past the hash-equal short-circuit (i.e. some non-deadline part of the DAG did change). In practice this affects most deadline-bearing DAGs after their first non-deadline edit.

How to reproduce

A failing test on apache/airflow:main HEAD 9c4908019a:

https://github.com/katherine-wong7/airflow/blob/kwong/deadline-orphan-test/airflow-core/tests/unit/models/test_serialized_dag.py#L1022-L1078

Fails with:

AssertionError: serialized_dag has deadline UUID [] in JSON
but no deadline_alert row links to it

The test creates a deadline-bearing DAG, runs it once (so the dag_version has
task instances, forcing the next write into the INSERT branch), adds a second
task without touching the deadline definition, re-writes, and asserts the new
serialized_dag has a linked deadline_alert. The assertion fails on current
main.

Code path

The buggy assignment is at airflow-core/src/airflow/models/serialized_dag.py:663:

if reuse_result is not None:
    deadline_uuid_mapping, name_updates = reuse_result
    ...
    dag.data["dag"]["deadline"] = existing_deadline_uuids
    deadline_uuid_mapping = {}                          # <-- empty mapping

When the DAG hash also differs (i.e. a non-deadline change occurred), write_dag proceeds to either the dynamic-DAG UPDATE branch (line 754) or the new-row INSERT branch (line 777). Both call _create_deadline_alert_records(serialized_dag, {}). That function returns early on an empty mapping, so no deadline_alert row is created for, or linked to, the new serialized_dag. PR #61702 introduced this optimization.
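The interaction between the reuse path and the early return can be modelled in a few lines. This is a toy sketch, not the Airflow implementation: `write_dag_sketch`, `create_deadline_alert_records`, the `alert_table` dict, and all ids are hypothetical stand-ins that only mirror the control flow described above.

```python
def create_deadline_alert_records(serialized_dag_id, uuid_mapping, alert_table):
    """Toy stand-in for _create_deadline_alert_records (not the real code)."""
    if not uuid_mapping:
        return  # early return on an empty mapping: nothing is linked
    for alert_uuid in uuid_mapping:
        alert_table[alert_uuid] = serialized_dag_id


def write_dag_sketch(new_sd_id, deadline_uuids, reused, hash_changed, alert_table):
    """Hypothetical, heavily simplified model of the write path."""
    data = {"dag": {"deadline": list(deadline_uuids)}}
    # Reuse path: the UUIDs stay in the JSON, but the mapping is emptied.
    uuid_mapping = {} if reused else {u: None for u in deadline_uuids}
    if not hash_changed:
        return None  # hash-equal short-circuit: no new row is written
    create_deadline_alert_records(new_sd_id, uuid_mapping, alert_table)
    return data


# alert_table maps a deadline_alert UUID to the serialized_dag id it points at.
alert_table = {"uuid-1": "sd-old"}
new_data = write_dag_sketch("sd-new", ["uuid-1"], True, True, alert_table)
print(new_data["dag"]["deadline"], alert_table)
```

With `reused=True` and `hash_changed=True`, the new row's JSON still carries `uuid-1` while `alert_table` keeps pointing at `sd-old`, which is the orphaned state this issue reports.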

What you think should happen instead?

Every serialized_dag row whose data["dag"]["deadline"] JSON contains a deadline UUID should have a corresponding deadline_alert row whose serialized_dag_id points at that serialized_dag.id. Otherwise the materialization query at dagrun creation fails to find the alert and the deadline silently never fires.
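The expected invariant can be stated as a small check. This is a sketch under assumptions: that data["dag"]["deadline"] holds a list of UUID strings, that each deadline_alert row can be reduced to an (alert UUID, serialized_dag_id) pair, and that `find_orphaned_deadlines` is a hypothetical helper rather than an existing Airflow function.

```python
def find_orphaned_deadlines(serialized_dags, deadline_alerts):
    """Return (serialized_dag_id, deadline_uuid) pairs with no linked alert.

    serialized_dags: dict mapping serialized_dag id -> its data dict
    deadline_alerts: iterable of (alert_uuid, serialized_dag_id) pairs
    """
    linked = set(deadline_alerts)
    orphans = []
    for sd_id, data in serialized_dags.items():
        # Assumption: the JSON stores a list of deadline UUIDs (or nothing).
        for alert_uuid in data.get("dag", {}).get("deadline") or []:
            if (alert_uuid, sd_id) not in linked:
                orphans.append((sd_id, alert_uuid))
    return orphans


# State after a non-deadline edit, as described in this issue.
serialized_dags = {
    "sd-old": {"dag": {"deadline": ["uuid-1"]}},
    "sd-new": {"dag": {"deadline": ["uuid-1"]}},
}
deadline_alerts = [("uuid-1", "sd-old")]

orphans = find_orphaned_deadlines(serialized_dags, deadline_alerts)
print(orphans)
```

An empty result would mean the invariant holds; here the new row is reported because its UUID has no alert pointing at it.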

Operating System

macOS 15.7.7

Deployment

Other Docker-based deployment

Apache Airflow Provider(s)

No response

Versions of Apache Airflow Providers

No response

Official Helm Chart version

Not Applicable

Kubernetes Version

No response

Helm Chart configuration

No response

Docker Image customizations

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

area:core, kind:bug, needs-triage
