Description
Summary
The Python Durable Functions SDK creates a new aiohttp.ClientSession for every HTTP request to the internal RPC endpoint. This is an anti-pattern that can cause connection contention and timeouts under concurrent load.
Problem
In http_utils.py, the post_async_request function creates a new session for each request:
```python
async def post_async_request(url: str, data: Any = None, ...) -> List[Union[int, Any]]:
    async with aiohttp.ClientSession() as session:  # New session per request
        # ...
```
The aiohttp documentation explicitly warns against this pattern:
"Don't create a session per request. Most likely you need a session per application which performs all requests together."
Impact
During a production investigation (ICM 695094479), we observed intermittent ConnectionTimeoutError (~30s) when calling client.start_new() to start orchestrations. The error occurs in aiohttp/connector.py:_wrap_create_connection, indicating TCP connection establishment failures.
Under concurrent load (bursts of 6-9 simultaneous requests), multiple requests compete to establish new TCP connections instead of reusing pooled connections from a shared session.
Proposed Fix
Modify http_utils.py to reuse a single ClientSession, created once with a configurable timeout and a pooled connector, so that concurrent requests share existing connections.
Considerations
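- Thread safety - May need an async lock (asyncio.Lock) when initializing the session, so that concurrent coroutines do not each create their own
- Session lifecycle - The shared session must be closed on worker shutdown, and recreated if it is found closed
- Connection limits - The TCPConnector limits (limit, limit_per_host) should be tuned to the expected burst size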
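The first two considerations can be sketched with the standard library alone. `SharedSessionHolder` and everything else below are hypothetical names, and `_FakeSession` stands in for `aiohttp.ClientSession` so the example runs without a network. The lock only matters because the factory is awaited; a purely synchronous factory has no suspension point between the check and the assignment.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class SharedSessionHolder:
    """Lazily creates one session-like object and closes it on shutdown.

    Hypothetical helper, not part of the SDK. `factory` must return an object
    with a `closed` attribute and an async `close()` method, which is the
    shape of aiohttp.ClientSession.
    """

    def __init__(self, factory: Callable[[], Awaitable[Any]]) -> None:
        self._factory = factory
        self._session: Optional[Any] = None
        self._lock: Optional[asyncio.Lock] = None

    async def get(self) -> Any:
        # Fast path: already created and still open.
        if self._session is not None and not self._session.closed:
            return self._session
        if self._lock is None:
            # Created lazily so the lock belongs to the running event loop.
            self._lock = asyncio.Lock()
        async with self._lock:
            # Re-check: another coroutine may have created it while we waited.
            if self._session is None or self._session.closed:
                self._session = await self._factory()
            return self._session

    async def close(self) -> None:
        """Intended to be called once from the worker's shutdown path."""
        session, self._session = self._session, None
        if session is not None and not session.closed:
            await session.close()


class _FakeSession:
    """Stand-in for aiohttp.ClientSession; counts how many were created."""

    created = 0

    def __init__(self) -> None:
        type(self).created += 1
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def _make_session() -> _FakeSession:
    await asyncio.sleep(0)  # simulate async setup, forcing a suspension point
    return _FakeSession()


async def demo() -> int:
    holder = SharedSessionHolder(_make_session)
    # A burst of 9 concurrent callers, as in the incident described above.
    sessions = await asyncio.gather(*(holder.get() for _ in range(9)))
    assert all(s is sessions[0] for s in sessions)
    await holder.close()
    assert sessions[0].closed
    return _FakeSession.created
```

With aiohttp, the factory would presumably be an async function that returns an `aiohttp.ClientSession` built with the chosen `TCPConnector` limits. A session belongs to the event loop it was created on, so a worker that runs several loops would need one holder per loop.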
Additional Context
- The 30s timeout matches aiohttp's default sock_connect timeout
- Similar issues have been reported in the aiohttp community