Fix memory growth from pathlib sys.intern in long-running processes #65706

Merged
potiuk merged 3 commits into apache:main from wjddn279:fix-memory-leak-in-python-path-module on Apr 25, 2026

@wjddn279
Contributor

@wjddn279 wjddn279 commented Apr 23, 2026

Problem?

After applying the patch for #65121, I found memory growth at the statement parsed = [sys.intern(str(x)) for x in rel.split(sep) if x and x != '.'] in Python's pathlib library.
memray-flamegraph-output-2026-04-12.13.14.12.556546.html

Cause?

pathlib.PurePath._parse_path calls sys.intern on every path component. In long-running Airflow processes (scheduler, dag-processor, triggerer, workers) each task run produces log paths containing unique identifiers (dag_id, run_id, task_id, try_number), and those interned components accumulate in the interpreter's intern dict for the lifetime of the process.
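The accumulation described above can be illustrated without pathlib. This is a minimal sketch, not Airflow or CPython code: the log path layout, make_component, and unique_components are hypothetical names chosen for illustration. It shows that sys.intern maps equal strings to one canonical object, and that the number of distinct components grows with every run when the path contains a unique run identifier.

```python
import sys


def make_component(prefix: str, n: int) -> str:
    # Build the string at runtime so it is not a compile-time constant.
    return "".join([prefix, str(n)])


# Two equal strings built separately resolve to one canonical interned object.
a = make_component("run_id=scheduled_", 1)
b = make_component("run_id=scheduled_", 1)
same_object = sys.intern(a) is sys.intern(b)


def unique_components(n_runs: int) -> set[str]:
    # Hypothetical log path layout; only the run_id component changes per run.
    seen: set[str] = set()
    for run in range(n_runs):
        path = f"dag_id=example/run_id=scheduled_{run}/task_id=extract/attempt=1.log"
        seen.update(path.split("/"))
    return seen


print(same_object)
print(len(unique_components(1000)))
```

Each distinct component is a separate entry for the interning table to hold, so a process that keeps producing new run identifiers keeps adding entries.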

Solution?

The same issue was reported upstream in CPython (python/cpython#119518). Python 3.14 removes the interning call from _parse_path entirely (python/cpython#123356), but Airflow supports earlier versions. This PR backports the 3.14 parsing logic as a small Path subclass (_PatchedPath) and routes log-folder/log-file initialization through it, so path components can be garbage-collected as ordinary mortal strings.
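To make the idea of parsing without interning concrete, here is a simplified, POSIX-only sketch. It is not the _PatchedPath class from this PR and not the CPython 3.14 source: parse_posix_path is a hypothetical helper that ignores drives, Windows separators, and the special "//" root. It applies the same filtering as the statement quoted in the Problem section, minus the sys.intern call.

```python
from pathlib import PurePosixPath


def parse_posix_path(path: str) -> tuple[str, list[str]]:
    # Hypothetical helper: returns (root, components) for a POSIX-style path.
    if not path:
        return "", []
    root = "/" if path.startswith("/") else ""
    # Same filtering as the quoted statement, but components are plain strings.
    parts = [x for x in path.split("/") if x and x != "."]
    return root, parts


sample = "/logs/./dag_id=example//attempt=1.log"
root, parts = parse_posix_path(sample)
print(root, parts)
# For this sample, the result agrees with pathlib's own view of the path.
print((root, *parts) == PurePosixPath(sample).parts)
```

Because nothing here calls sys.intern, the component strings are released as soon as the last reference to them goes away.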

After applying this change and #65121, I confirmed that all memory leaks under the local executor are gone. This improvement should also carry over to other components that rely on logging (e.g. dag-processor, celery executor).
memray-flamegraph-output-2026-04-17 08:12:02.069120.html
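The measurements in this PR were taken with memray. As a rough standard-library alternative, the sketch below uses tracemalloc to estimate how much traced memory remains after building many paths with unique components. It is an assumption-laden illustration: traced_growth and the path layout are hypothetical, and the number it reports depends on the Python version and is not a result from this PR.

```python
import tracemalloc
from pathlib import PurePosixPath


def traced_growth(n_paths: int) -> int:
    # Returns the change in traced memory (bytes) after parsing n_paths paths.
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for i in range(n_paths):
        # Accessing .parts forces parsing; the Path object itself is discarded.
        PurePosixPath(f"/logs/dag_id=example/run_id=run_{i}/attempt=1.log").parts
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


print(traced_growth(10_000))
```

Comparing the output across interpreter versions, or before and after a fix, gives a quick signal before reaching for a full profiler.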


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)


@wjddn279 wjddn279 marked this pull request as draft April 23, 2026 06:20
@wjddn279 wjddn279 marked this pull request as ready for review April 24, 2026 01:37
@wjddn279 wjddn279 force-pushed the fix-memory-leak-in-python-path-module branch from c4677f0 to 02e4ed3 Compare April 24, 2026 04:18
@wjddn279 wjddn279 force-pushed the fix-memory-leak-in-python-path-module branch from 02e4ed3 to 26eb945 Compare April 24, 2026 05:50
@potiuk
Member

potiuk commented Apr 25, 2026

This is COOL. Thanks @wjddn279 !

@potiuk potiuk added the backport-to-v3-2-test label Apr 25, 2026
@potiuk potiuk merged commit d14862c into apache:main Apr 25, 2026
90 checks passed
@github-actions github-actions Bot added this to the Airflow 3.2.2 milestone Apr 25, 2026
@github-actions
Contributor

Hi maintainer, this PR was merged without a milestone set.
We've automatically set the milestone to Airflow 3.2.2 based on: backport label targeting v3-2-test
If this milestone is not correct, please update it to the appropriate milestone.

This comment was generated by Milestone Tag Assistant.

@github-actions
Contributor

Backport successfully created: v3-2-test

Note: As of "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

If in doubt, please ask in the #release-management Slack channel.

Status Branch Result
v3-2-test PR Link

github-actions Bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Apr 25, 2026
… processes (apache#65706)

* Fix memory growth from pathlib sys.intern in long-running processes

* fix supporting python version

* fix logic
(cherry picked from commit d14862c)

Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>
aws-airflow-bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Apr 25, 2026
… processes (apache#65706)

* Fix memory growth from pathlib sys.intern in long-running processes

* fix supporting python version

* fix logic
(cherry picked from commit d14862c)

Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>
potiuk pushed a commit that referenced this pull request Apr 26, 2026
… processes (#65706)

* Fix memory growth from pathlib sys.intern in long-running processes

* fix supporting python version

* fix logic
(cherry picked from commit d14862c)

Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>
potiuk pushed a commit that referenced this pull request Apr 26, 2026
… processes (#65706) (#65855)

* Fix memory growth from pathlib sys.intern in long-running processes

* fix supporting python version

* fix logic
(cherry picked from commit d14862c)

Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>
vatsrahul1001 pushed a commit that referenced this pull request Apr 27, 2026
… processes (#65706) (#65855)

* Fix memory growth from pathlib sys.intern in long-running processes

* fix supporting python version

* fix logic
(cherry picked from commit d14862c)

Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>

Labels

area:logging, backport-to-v3-2-test
