Under which category would you file this issue?
Task SDK
Apache Airflow version
3.2.0. & 3.2.1
What happened and how to reproduce it?
Issue Description
The observed behavior starts after upgrading to Airflow 3.2.0 (and persists in 3.2.1) and disappears when downgrading to 3.1.8.
The failure manifests on Celery workers during the post-execution "finish" step (Task SDK / execution API), where the worker attempts to PATCH the TaskInstance state and receives:
- HTTP 409
reason=invalid_state
message=TI was not in the running state so it cannot be updated
previous_state=success
This results in a worker-side exception (ServerResponseError) and causes the Celery workload to be treated as failed even when the underlying task logic has completed.
- Airflow version(s) affected: 3.2.0, 3.2.1
- Airflow version known-good: 3.1.8
- Executor: CeleryExecutor
- Deployment: Kubernetes
- Python: 3.12
- Database: Postgres
Typical worker log sequence:
- Worker starts and loads secrets backend
- Worker begins Task SDK API calls (often with retries)
- API returns 409 invalid_state with
previous_state=success
- Supervisor raises
ServerResponseError and the Celery task fails
Example:
Secrets backends loaded for worker ...
Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
Starting call to ... this is the 2nd time calling it.
API server error ... status_code=409 ... previous_state='success'
airflow.sdk.api.client.ServerResponseError: Server returned error
Steps to reproduce
The issue appears intermittently; higher concurrency tends to increase the frequency.
Prerequisites
- Airflow 3.2.0+
- CeleryExecutor
- Execution API enabled (Task SDK in use)
- Deploy Airflow 3.2.0 or 3.2.1.
- Trigger the DAG multiple times (or run multiple manual runs concurrently).
- Observe worker logs for
ServerResponseError with HTTP 409 invalid_state and previous_state=success.
What you think should happen instead?
Airflow workers should be able to finalise a task idempotently; if the API server has already moved the TaskInstance to a terminal state (e.g. success), the worker’s PATCH /execution/task-instances//state should not return a fatal 409 or fail the task.
Operating System
No response
Deployment
Official Apache Airflow Helm Chart
Apache Airflow Provider(s)
No response
Versions of Apache Airflow Providers
No response
Official Helm Chart version
Not Applicable
Kubernetes Version
No response
Helm Chart configuration
No response
Docker Image customizations
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct
Under which category would you file this issue?
Task SDK
Apache Airflow version
3.2.0. & 3.2.1
What happened and how to reproduce it?
Issue Description
The observed behavior starts after upgrading to Airflow 3.2.0 (and persists in 3.2.1) and disappears when downgrading to 3.1.8.
The failure manifests on Celery workers during the post-execution "finish" step (Task SDK / execution API), where the worker attempts to PATCH the TaskInstance state and receives:
reason=invalid_statemessage=TI was not in the running state so it cannot be updatedprevious_state=successThis results in a worker-side exception (
ServerResponseError) and causes the Celery workload to be treated as failed even when the underlying task logic has completed.Typical worker log sequence:
previous_state=successServerResponseErrorand the Celery task failsExample:
Secrets backends loaded for worker ...Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it.Starting call to ... this is the 2nd time calling it.API server error ... status_code=409 ... previous_state='success'airflow.sdk.api.client.ServerResponseError: Server returned errorSteps to reproduce
The issue appears intermittently; higher concurrency tends to increase the frequency.
Prerequisites
ServerResponseErrorwith HTTP 409invalid_stateandprevious_state=success.What you think should happen instead?
Airflow workers should be able to finalise a task idempotently; if the API server has already moved the TaskInstance to a terminal state (e.g. success), the worker’s PATCH /execution/task-instances//state should not return a fatal 409 or fail the task.
Operating System
No response
Deployment
Official Apache Airflow Helm Chart
Apache Airflow Provider(s)
No response
Versions of Apache Airflow Providers
No response
Official Helm Chart version
Not Applicable
Kubernetes Version
No response
Helm Chart configuration
No response
Docker Image customizations
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct