Skip to content

Fix memory leak in LocalExecutor caused by unreleased file descriptor locks#65121

Open
wjddn279 wants to merge 3 commits intoapache:mainfrom
wjddn279:fix-memory-leak-in-log_file_descriptor
Open

Fix memory leak in LocalExecutor caused by unreleased file descriptor locks#65121
wjddn279 wants to merge 3 commits intoapache:mainfrom
wjddn279:fix-memory-leak-in-log_file_descriptor

Conversation

@wjddn279
Copy link
Copy Markdown
Contributor

@wjddn279 wjddn279 commented Apr 13, 2026

Problem?

We conducted an investigation to resolve the continuous memory growth in the Local Executor's forked processes, following the previous memory spike fix.

Using memray to profile the Local Executor's forked processes, we confirmed that memory was steadily increasing. Specifically, logging-related memory was growing, and this was traced to the process of writing to local files.
memray-flamegraph-output-2026-04-11 13:17:18.516888.html

Cause?

Looking at, full_path.touch(new_file_permissions) conf.get("logging", "file_task_handler_new_file_permissions", fallback="0o664") in flamegraph, I noticed that basic string objects were not being properly garbage collected, which led us to suspect unreleased references.

Upon examining self._lock = _get_lock_for_file(self._file) in the memory-increasing area, I found that file descriptors were being cached as locks in a dictionary with no corresponding release mechanism. This prevented the file descriptors from being properly released, and consequently, the parent Process objects also retained references and were never garbage collected.

Solution

After adding code to manually release these references, we re-ran the profiling and confirmed that the related objects were being properly freed.
memray-flamegraph-output-2026-04-12 13:14:12.556546.html

PS

The remained parsed = [sys.intern(str(x)) for x in rel.split(sep) if x and x != '.']issue will be addressed in a separate PR.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@wjddn279 wjddn279 force-pushed the fix-memory-leak-in-log_file_descriptor branch from 6f0717f to ad37cd0 Compare April 13, 2026 08:17
@eladkal eladkal added this to the Airflow 3.2.1 milestone Apr 13, 2026
@eladkal eladkal added the type:bug-fix Changelog: Bug Fixes label Apr 13, 2026
Comment thread task-sdk/src/airflow/sdk/execution_time/supervisor.py Outdated
Comment thread task-sdk/src/airflow/sdk/execution_time/supervisor.py Outdated
Comment thread task-sdk/src/airflow/sdk/execution_time/supervisor.py Outdated
@potiuk potiuk marked this pull request as draft April 23, 2026 00:43
@potiuk

This comment was marked as outdated.

@wjddn279 wjddn279 force-pushed the fix-memory-leak-in-log_file_descriptor branch from ad37cd0 to 5a533f4 Compare April 23, 2026 04:18
@wjddn279 wjddn279 marked this pull request as ready for review April 23, 2026 04:26
@potiuk
Copy link
Copy Markdown
Member

potiuk commented Apr 27, 2026

@wjddn279 — There are 1 unresolved review thread on this PR from @eladkal. Could you either push a fix or reply in each thread explaining why the feedback doesn't apply? Once you believe the feedback is addressed, mark the thread as resolved so the reviewer isn't re-pinged needlessly. Thanks!


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@wjddn279 wjddn279 force-pushed the fix-memory-leak-in-log_file_descriptor branch 2 times, most recently from cb82a2e to 2215c99 Compare April 30, 2026 06:23
@wjddn279 wjddn279 requested a review from potiuk as a code owner April 30, 2026 06:23
@ashb ashb changed the title Fix memory leak in forked worker caused by unreleased file descriptor locks Fix memory leak in LocalExecutor caused by unreleased file descriptor locks Apr 30, 2026
@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Apr 30, 2026
@potiuk
Copy link
Copy Markdown
Member

potiuk commented Apr 30, 2026

@wjddn279 Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

This behaviour was wrong and it's been fixed in #65916

Sorry for the noise @wjddn279 !

@uranusjr
Copy link
Copy Markdown
Member

Weird we’d need to do this. Can this be considered a bug in structlog?

@wjddn279
Copy link
Copy Markdown
Contributor Author

wjddn279 commented May 1, 2026

I'm currently checking with the structlog maintainers, but I'm not sure whether I'll get a response

@eladkal
Copy link
Copy Markdown
Contributor

eladkal commented May 1, 2026

I'm currently checking with the structlog maintainers, but I'm not sure whether I'll get a response

Can you share the issue opened reporting the bug?

@wjddn279
Copy link
Copy Markdown
Contributor Author

wjddn279 commented May 1, 2026

@eladkal

not issue but pr thread i've opended

hynek/structlog#806 (comment)

@ashb
Copy link
Copy Markdown
Member

ashb commented May 1, 2026

I think I've got a better fix hynek/structlog#807

@wjddn279
Copy link
Copy Markdown
Contributor Author

wjddn279 commented May 3, 2026

@ashb

The PR has been merged! I checked and the release schedule including that version doesn't seem to be confirmed yet. How about we use this PR for now and remove it later when we bump the version after the release?

@wjddn279 wjddn279 force-pushed the fix-memory-leak-in-log_file_descriptor branch 2 times, most recently from 0348613 to 30aaf78 Compare May 6, 2026 09:06
@eladkal
Copy link
Copy Markdown
Contributor

eladkal commented May 6, 2026

The PR has been merged! I checked and the release schedule including that version doesn't seem to be confirmed yet. How about we use this PR for now and remove it later when we bump the version after the release?

Generally speaking, we can backport/vendor the fix right? and in parallal have a draft PR that removes the backport with updating the libary version that we will merge when it's released.
but lets see what @ashb think about that.

@ashb
Copy link
Copy Markdown
Member

ashb commented May 6, 2026

Hynek is often prompt about releasing new versions, but if we don't want to wait, then backporting this is a one line change -- as long as we apply it "safely" (so the it doesn't crash if something goes wrong, sure.

@wjddn279 wjddn279 force-pushed the fix-memory-leak-in-log_file_descriptor branch from 30aaf78 to b27db45 Compare May 7, 2026 00:06
@wjddn279 wjddn279 force-pushed the fix-memory-leak-in-log_file_descriptor branch from b27db45 to 0a08921 Compare May 7, 2026 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:task-sdk ready for maintainer review Set after triaging when all criteria pass. type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants