Re-evaluate NO_PROXY per-request so it is honored across redirects#3734
Open
fl0Lec wants to merge 1 commit into
Open
Re-evaluate NO_PROXY per-request so it is honored across redirects#3734fl0Lec wants to merge 1 commit into
fl0Lec wants to merge 1 commit into
Conversation
e56a552 to
2525642
Compare
Until now ``URLLib3Session`` made its proxy/no-proxy decision at the moment of session construction: ``EndpointCreator._get_proxies`` filtered ``getproxies()`` through ``should_bypass_proxies`` against the *initial* endpoint URL, so if ``NO_PROXY`` matched that URL the session was given an empty proxy dict and never consulted a proxy again for any subsequent request through that endpoint. That breaks the common S3 cross-region redirect case described in boto3#4380, boto3#4360, botocore#2707 and the long-standing boto#926: 1. Pod in ap-northeast-1 creates an S3 client with no explicit region. 2. The first request hits s3.ap-northeast-1.amazonaws.com. With a typical NO_PROXY=*.s3.ap-northeast-1.amazonaws.com that request correctly goes direct. 3. S3 returns 301 PermanentRedirect to <bucket>.s3.us-east-1.amazonaws.com. 4. The S3 redirect handler rewrites request_dict['url'] and re-issues through the same Endpoint / URLLib3Session. The us-east-1 URL does NOT match NO_PROXY and should go through the corporate proxy -- but the session's proxy dict was frozen empty at step 2, so the redirect also bypasses the proxy and gets blocked by corporate egress policy. This change defers the NO_PROXY decision to send time: * ``get_environ_proxies`` grows a ``no_proxy_filter`` parameter. When False, it returns the raw environment proxies without applying ``should_bypass_proxies``. * ``EndpointCreator._get_proxies`` passes ``no_proxy_filter=False`` so the session always sees the configured proxies. * ``ProxyConfiguration.proxy_url_for`` calls ``should_bypass_proxies`` on the URL it is given, returning ``None`` when ``NO_PROXY`` matches. This re-evaluates per request, so the decision follows the actual URL being sent -- including redirect targets. The bypass check applies uniformly to env-sourced and ``Config(proxies=)`` proxies, which is consistent with how ``requests`` already behaves and addresses the long-standing complaint that ``NO_PROXY`` is not honored for explicitly configured proxies (botocore#2707).
2525642 to
db10340
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
URLLib3Session's proxy/no-proxy decision is currently frozen at session-construction time.EndpointCreator._get_proxiescallsget_environ_proxies(endpoint_url), which in turn filtersgetproxies()throughshould_bypass_proxiesagainst the initial endpoint URL. IfNO_PROXYmatches that URL, the session is given an empty proxy dict and never consults a proxy again for any subsequent request through thatEndpoint— even if the request URL changes.The symptom shows up most often on S3 cross-region redirects:
ap-northeast-1constructsboto3.client('s3')without an explicit region.s3.ap-northeast-1.amazonaws.com. With a typicalNO_PROXY=*.s3.ap-northeast-1.amazonaws.com, that request correctly goes direct.301 PermanentRedirectto<bucket>.s3.us-east-1.amazonaws.com.S3RegionRedirectorv2.redirect_from_error(botocore/utils.py:1757) rewritesrequest_dict['url']and re-issues through the sameEndpoint/URLLib3Session. The us-east-1 URL does not matchNO_PROXYand should go through the corporate proxy — but the session's proxy dict was frozen empty at step 2, so the redirect also bypasses the proxy and gets blocked by corporate egress policy.The same pattern affects any 301/302/307/308 redirect that crosses host boundaries, and the inverse case (initial URL routes through the proxy, redirect target matches
NO_PROXYand should go direct) suffers from a related staleness — the cached proxy decision is wrong for the new URL.Related issues
Cross-cutting
NO_PROXY/ proxy bugs that point at the same root cause or the same surface:s3client globalendpoint_urlresults inPermanentRedirectproxyoption inbotocore.config.Configdoes not mirror environment variable behavior. boto3#2420 — Config-proxy vs env-var mismatch (closed stale)Proxy Configis not respected (confirmed bug, open)region_name(open since 2014; this PR fixes the symptom even when the underlying request hits the wrong region first)NO_PROXYnot considered in lambda local invocation (closed stale)Fix
Defer the
NO_PROXYdecision from session construction to send time:botocore.utils.get_environ_proxiesgrows ano_proxy_filterparameter (defaultTrue, preserving the previous signature). WhenFalse, it returns the raw environment proxies without applyingshould_bypass_proxiesupfront.botocore.endpoint.EndpointCreator._get_proxiesnow callsget_environ_proxies(url, no_proxy_filter=False)so theURLLib3Sessionalways sees the configured proxies, regardless of whether the initial URL would have matchedNO_PROXY.botocore.httpsession.ProxyConfiguration.proxy_url_forcallsshould_bypass_proxieson the URL it is given, returningNonewhenNO_PROXYmatches. This re-evaluates per request, so the decision follows the actual URL being sent — including redirect targets.The bypass check now applies uniformly to env-sourced and
Config(proxies=...)proxies. This is consistent with howrequestsalready behaves and also addresses the long-standing complaint in #2707 thatNO_PROXYis not honored for explicitly configured proxies.Diff summary
botocore/endpoint.py_get_proxiespassesno_proxy_filter=Falsebotocore/httpsession.pyProxyConfiguration.proxy_url_forcallsshould_bypass_proxiesper request (lazy import to avoid a circular import withbotocore.utils)botocore/utils.pyget_environ_proxiesgrowsno_proxy_filter=Truekwargtests/unit/test_http_session.pytests/functional/test_proxy_redirect.py.changes/next-release/Tests
Existing suite
All existing unit tests pass (4588 passed, 95 skipped, 0 regressions). The proxy/endpoint-touching subset of functional tests (
-k "proxy or redirect or endpoint") passes cleanly: 18,147 passed, 20 skipped, 0 failed.New regression tests
Three new tests in
tests/unit/test_http_session.py:TestProxyConfiguration.test_proxy_url_for_honors_no_proxy_per_call—ProxyConfiguration.proxy_url_forreturnsNonefor a URL matchingNO_PROXYand the proxy URL for one that does not, even though both share the sameProxyConfigurationinstance.TestURLLib3Session.test_no_proxy_env_var_bypasses_proxy_per_request— first request to a URL that matchesNO_PROXYgoes through the non-proxy pool manager; a subsequent request through the same session to a URL that does not matchNO_PROXYgoes through the proxy manager (this is the cross-region redirect case).TestURLLib3Session.test_no_proxy_env_var_matches_redirect_target— inverse direction: initial request goes through the proxy, a subsequent request to a URL matchingNO_PROXYgoes direct.One new end-to-end test in
tests/functional/test_proxy_redirect.py:TestNoProxyAcrossRedirect.test_no_proxy_re_evaluated_across_sequential_requests— spins up two real local HTTP servers (one acting as the backend, one acting as the proxy) on dynamically-allocated ports, then makes two sequentialURLLib3Session.send()calls through the same session to URLs whose hosts differ only in name (127.0.0.1vslocalhost). WithNO_PROXY=127.0.0.1set, the first request must hit the backend directly and the second must hit the proxy. The test asserts the correct server received each request, demonstrating the fix end-to-end through real socket I/O. Hermetic (loopback only), deterministic, and completes in ≈0.15s.Each new test was verified to fail without the fix and pass with it by reverting the production diff and re-running.
Backward compatibility
get_environ_proxies) gains a kwarg with a default that preserves the prior behavior for any external caller.NO_PROXYnow reliably follows the actual request URL. The only observable behavior change is that the redirect target is now correctly evaluated againstNO_PROXY, which is the fix.Config(proxies=...)explicitly:NO_PROXYenv now also gates that proxy, matchingrequestssemantics and resolving Default Proxy Config is not respected #2707. Users who setConfig(proxies=...)but want to ignoreNO_PROXYwould need to unsetNO_PROXY— but no observable historical caller relies onNO_PROXYbeing silently ignored.botocore.utils.IMDSFetcher, line 425) was intentionally left untouched — it still usesno_proxy_filter=True(the default). Fixing Proxy configuration is not respected when fetching credentials from IMDS endpoint #2644 would be a follow-up.Acknowledgment of prior unmerged proxy PRs
I understand the maintainers have historically declined to broaden proxy support — most notably #2541 (SOCKS) was closed as "not under consideration." This PR is intentionally different:
get_environ_proxieskwarg default preserves prior behavior for external callers.Happy to iterate on the approach — alternatives considered include applying
should_bypass_proxiesonly inURLLib3Session.send()(rejected: leaves_get_proxiesreturning{}for env-bypass cases, so explicit-proxy semantics diverge), or adding a new opt-in flag (rejected: defeats the purpose of fixing the bug for the common case).