Skip to content

Expose remove_file_from_lock helper for WRITE_LOCKS cleanup#806

Closed
wjddn279 wants to merge 2 commits intohynek:mainfrom
wjddn279:add-remove-file-from-lock
Closed

Expose remove_file_from_lock helper for WRITE_LOCKS cleanup#806
wjddn279 wants to merge 2 commits intohynek:mainfrom
wjddn279:add-remove-file-from-lock

Conversation

@wjddn279
Copy link
Copy Markdown

Summary

Hi, I'm an active user and contributor of Airflow. While investigating a memory leak in Airflow, I discovered that structlog's WRITE_LOCKS holds references to file descriptors created by the process, and these references prevent the associated loggers from being garbage collected even after they are no longer in use. I was able to confirm that the memory leak is resolved by importing WRITE_LOCKS and manually removing the references.

please reference the pr related: apache/airflow#65121

As far as I can tell, there is currently no public interface for removing file references from WRITE_LOCKS. I don't think it's ideal for users to directly import and manipulate a variable from a private module. I've added a public interface that allows users to clean up WRITE_LOCKS references — I'd appreciate it if you could review it.

Pull Request Check List

  • I acknowledge this project's AI policy.
  • This pull requests is not from my main branch.
  • There's tests for all new and changed code.
  • New APIs are added to our typing tests in api.py.
  • Updated documentation for changed code.
    • New functions/classes have to be added to docs/api.rst by hand.
    • Changed/added classes/methods/functions have appropriate versionadded, versionchanged, or deprecated directives.
      • The next version is the second number in the current release + 1. The first number represents the current year. So if the current version on PyPI is 26.1.0, the next version is gonna be 26.2.0. If the next version is the first in the new year, it'll be 27.1.0.
  • Documentation in .rst and .md files is written using semantic newlines.
  • Changes (and possible deprecations) are documented in the changelog.

@ashb
Copy link
Copy Markdown
Contributor

ashb commented Apr 13, 2026

Not yet sure if we need this change on the structlog side.

@hynek
Copy link
Copy Markdown
Owner

hynek commented Apr 14, 2026

OK lmk one y'all sure. :)

@wjddn279
Copy link
Copy Markdown
Author

@hynek

Hi! I have a question regarding this :)

Is there a specific intention behind not having a separate release logic for WRITE_LOCKS? I would expect the reference to be released when the structlog object is properly closed or cleaned up, but with the current behavior, it seems like there is no way to remove the reference from WRITE_LOCKS.

@hynek
Copy link
Copy Markdown
Owner

hynek commented Apr 14, 2026

Could this be not solved with weak references? Sounds to me like they were made for such use-cases:

A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else. However, until the object is actually destroyed the weak reference may return the object even if there are no strong references to it.

@wjddn279
Copy link
Copy Markdown
Author

I tested replacing it with a weak reference — I tried passing log_file_descriptor as a weakref instead of passing it directly.

log_file_descriptor = log_file.open("ab")
proxy = weakref.proxy(log_file_descriptor)
underlying_logger = structlog.BytesLogger(proxy)
logger = structlog.wrap_logger(underlying_logger, logger_name="task").bind()
Traceback (most recent call last):
  File "/Users/user/Dev/airflow/airflow-core/src/a.py", line 29, in <module>
    logger, log_file_descriptor = _configure_logging(log_path)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Dev/airflow/airflow-core/src/a.py", line 14, in _configure_logging
    underlying_logger = structlog.BytesLogger(proxy)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Dev/airflow/.venv/lib/python3.12/site-packages/structlog/_output.py", line 266, in __init__
    self._lock = _get_lock_for_file(self._file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Dev/airflow/.venv/lib/python3.12/site-packages/structlog/_output.py", line 25, in _get_lock_for_file
    lock = WRITE_LOCKS.get(file)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: unhashable type: 'weakref.ProxyType'

It ended up throwing an error when accessing WRITE_LOCKS. Is this the intended behavior you had in mind?

@wjddn279
Copy link
Copy Markdown
Author

wjddn279 commented May 1, 2026

@hynek

Hi, I wanted to check whether you see the missing WRITE_LOCK release as a bug that should be fixed. If that's the case, I'd be happy to submit a PR for it. I'm raising this because I have some concerns about how it's currently handled on the consumer side.

ashb added a commit to ashb/structlog that referenced this pull request May 1, 2026
File objects registered in WRITE_LOCKS were never released, causing a
memory leak in long-running processes that open many log files (e.g.,
task executors creating a per-task BytesLogger or WriteLogger).

WeakKeyDictionary stores keys as weak references, so entries expire
automatically when the last strong reference to a file object is
dropped — no manual cleanup needed.

Closes hynek#806 (no need to expose it, we tidy it up correctly ourselves)
@hynek
Copy link
Copy Markdown
Owner

hynek commented May 2, 2026

Hi, I wanted to check whether you see the missing WRITE_LOCK release as a bug that should be fixed. If that's the case, I'd be happy to submit a PR for it. I'm raising this because I have some concerns about how it's currently handled on the consumer side.

I hope #807 fixes your problem, because that seems to be the correct way to fix this?

@wjddn279
Copy link
Copy Markdown
Author

wjddn279 commented May 2, 2026

yes. LGTM thanks! @ashb

@hynek
Copy link
Copy Markdown
Owner

hynek commented May 2, 2026

great! when I suggested wekarefs to you, I missed that the keys were the referenced part and wanted to check back once I have time to breath, thankfully @ashb took that off my plate 🥳

@hynek hynek closed this May 2, 2026
hynek added a commit that referenced this pull request May 2, 2026
)

* Use WeakKeyDictionary for WRITE_LOCKS to prevent file object leaks

File objects registered in WRITE_LOCKS were never released, causing a
memory leak in long-running processes that open many log files (e.g.,
task executors creating a per-task BytesLogger or WriteLogger).

WeakKeyDictionary stores keys as weak references, so entries expire
automatically when the last strong reference to a file object is
dropped — no manual cleanup needed.

Closes #806 (no need to expose it, we tidy it up correctly ourselves)

* docs: rewrite changelog from user's perspective

* tests: adapt style

---------

Co-authored-by: Hynek Schlawack <hs@ox.cx>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants