Skip to content

fix(tcpclient): close SSLIOStream on TLS handshake timeout to prevent socket leak#3615

Closed
armorbreak001 wants to merge 3 commits into
tornadoweb:masterfrom
armorbreak001:fix/tls-timeout-socket-leak
Closed

fix(tcpclient): close SSLIOStream on TLS handshake timeout to prevent socket leak#3615
armorbreak001 wants to merge 3 commits into
tornadoweb:masterfrom
armorbreak001:fix/tls-timeout-socket-leak

Conversation

@armorbreak001

Copy link
Copy Markdown

Summary

When TCPClient.connect() is called with both ssl_options and timeout, a TLS handshake timeout causes the underlying socket to leak. The socket becomes unreachable from the caller and is never closed.

Root cause: IOStream.start_tls() transfers socket ownership to a new SSLIOStream before the TLS handshake completes (the raw socket is extracted, the original stream sets self.socket = None, and an SSLIOStream wrapping the socket is created as a local variable inside start_tls()). When gen.with_timeout fires, the TimeoutError is raised but the wrapped future is not cancelled (by design of gen.with_timeout), leaving the SSLIOStream registered on the IOLoop with no external reference — permanently leaking the file descriptor.

Fix

  • In tcpclient.py: Save the future returned by start_tls() and call tls_future.cancel() on TimeoutError before re-raising.
  • In iostream.py: Register a done callback on the TLS connect future inside start_tls() that closes the SSLIOStream when the future is cancelled, ensuring the socket fd is released and the handler is removed from the IOLoop.

Fixes #3614

… socket leak

When TCPClient.connect() times out during the TLS handshake,
the socket is transferred to an SSLIOStream inside start_tls()
but never cleaned up because gen.with_timeout does not cancel
the wrapped future. This leaves the socket fd permanently leaked.

Save the start_tls future and cancel it on TimeoutError, which
triggers cleanup of the underlying SSLIOStream via a new cancel
callback registered in IOStream.start_tls().

Fixes tornadoweb#3614

@bdarnell bdarnell left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another PR for a socket leak in tcpclient in #3579. It's possible that these have the same root cause.

Comment thread tornado/iostream.py Outdated
ssl_stream.read_chunk_size = self.read_chunk_size

def _on_cancel(fut: Future[SSLIOStream]) -> None:
if fut.cancelled():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a test to make sure we are in fact closing the stream here. I think that when tornado.gen.with_timeout reaches its timeout, the Future is terminated with a TimeoutError, but not cancelled, so this condition is never true (the call to tls_future.cancel() in tcpclient is a no-op because the future has already been terminated).

What we probably want to do here is close the stream in a done_callback if fut.exception() is None.

…out test

The _on_cancel callback in start_tls() calls ssl_stream.close() when
the TLS future is cancelled (timeout). However, _signal_closed() calls
_ssl_connect_future.exception() which raises CancelledError on cancelled
futures, preventing proper cleanup.

Wrap the exception() call in try/except CancelledError to ensure
close() completes successfully even when the SSL connect future was
cancelled.

Add test_tls_handshake_timeout_closes_stream to verify that:
- TimeoutError is raised on TLS handshake timeout
- The SSLIOStream is properly closed (no fd leak)
- Addresses bdarnell's review feedback about needing a test
@armorbreak001

Copy link
Copy Markdown
Author

Thanks again for the review, @bdarnell. I've added a test and also found/fixed a related bug:

New test: test_tls_handshake_timeout_closes_stream — verifies that when the TLS handshake times out, the SSLIOStream is properly closed with no fd leak.

Additional fix: While writing the test, I discovered that _signal_closed() was calling _ssl_connect_future.exception() on a cancelled future, which raises CancelledError. This prevented close() from completing successfully. Added a try/except CancelledError around that call (same pattern already used for the regular future in the same method).

The test passes along with all existing tcpclient tests (27 passed).

…mport cleanup

- Cancel TLS future on timeout to prevent stale coroutine (bdarnell)
- Add @skipIfNonUnix for /proc/self/fd test (platform-specific)
- Move ssl import to module level (style)
@armorbreak001

Copy link
Copy Markdown
Author

Hi @bdarnell, thanks for the review on #3636. I've updated this PR (#3615) with the following changes based on your feedback:

  1. Cancel tls_future on timeout — The TLS handshake coroutine is now cancelled via tls_future.cancel() in addition to closing _tls_stream, so the coroutine won't wake up to find stale state.
  2. Platform guard — Added @skipIfNonUnix decorator to the fd-leak test since /proc/self/fd is Linux-specific.
  3. Import cleanup — Moved import ssl to module level per style convention.

Closed #3636 as superseded.

# under the License.
import getpass
import os
import ssl as _ssl

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just import ssl?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — changed import ssl as _ssl to import ssl to match the codebase convention (all other files use plain import ssl).

Comment thread tornado/tcpclient.py
except gen.TimeoutError:
# Cancel the TLS handshake coroutine so it doesn't
# continue running with a stale _tls_stream reference.
tls_future.cancel()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both tls_future.cancel and tls_stream.close? I think we'd only need tls_future.cancel if start_tls had a try/finally to do its own cleanup. Or, as I said in #3636 (comment), I think it might be cleaner to move timeout awareness into start_tls instead of trying to spread it across layers like this.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right — tls_future.cancel() is a no-op here since gen.with_timeout resolves the future with TimeoutError (doesn't cancel it). Removed it. The stream.close() is what actually prevents the fd leak.

I agree that moving timeout awareness into start_tls would be cleaner architecturally — that feels like a follow-up/refactor since it's a bigger change. Want me to do that in this PR or keep it scoped to the fix?

@bdarnell

Copy link
Copy Markdown
Member

I'm not going to babysit someone else's bot.

@bdarnell bdarnell closed this Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Socket leak in TCPClient.connect when TLS handshake times out

2 participants