feat: add request timeout to load_web_page#4887
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Hi @cchinchilla-dev , Thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share. |
|
Hi @Jacksunwei , can you please review this |
|
@rohityan, @Jacksunwei — Just following up to see if anything else is needed from my end. Happy to make any adjustments. |
|
I independently reproduced this issue and can confirm the behavior described. Tested on Windows with Python 3.11 against three failure modes — non-routable IP, invalid DNS, and a slow endpoint. All three result in unhandled exceptions with connect timeout=None confirmed in the traceback. Happy to share full reproduction logs if helpful for the review. |
…timeout-and-url-validation # Conflicts: # src/google/adk/tools/load_web_page.py # tests/unittests/tools/test_load_web_page.py
Merge #4887 ## Link to Issue or Description of Change Closes #4886 ## Update — 2026-04-30 After merging the latest `upstream/main`, the SSRF rewrite already covers URL-scheme validation and routes every fetch failure through a unified `Failed to fetch url` message. The unique contribution of this PR is now the request **timeout**; the body and tests below have been updated to reflect the post-merge scope. No code added by this PR duplicates what upstream already provides. ## Problem `load_web_page()` calls `requests.get()` without a `timeout`. If the target server is unresponsive, the agent hangs indefinitely. ## Solution Add `timeout=_DEFAULT_TIMEOUT_SECONDS` (30 seconds) to both HTTP entry points in the module: - `requests.get` in `_fetch_response` (proxy path). - `session.get` in `_fetch_direct_response` (pinned-IP path). Extend the `except` in `load_web_page` to also catch `requests.RequestException`, so timeout and connection errors return the standard `Failed to fetch url: {url}` message instead of propagating. **Design note:** the timeout is a module-level constant rather than a function parameter to keep it out of the LLM function-calling schema. It can be overridden via `load_web_page._DEFAULT_TIMEOUT_SECONDS = 30` if needed. ## Testing Plan ### Unit Tests - [x] Added/updated unit tests. - [x] All unit tests pass locally (`pytest tests/unittests/tools/test_load_web_page.py` → 10 passed). New/updated tests in `tests/unittests/tools/test_load_web_page.py`: - `test_load_web_page_uses_proxy_for_unresolved_public_hostnames` — updated to verify `timeout=10` is forwarded on the proxy path. - `test_load_web_page_passes_timeout_to_pinned_session` — verifies the timeout reaches the pinned-IP session. - `test_load_web_page_passes_timeout_to_proxied_get` — verifies the timeout is forwarded when a proxy is configured. - `test_load_web_page_returns_failure_on_timeout` — verifies `requests.exceptions.Timeout` is converted into `Failed to fetch url`. ### Manual E2E N/A — internal hardening; function signature unchanged. ## Checklist - [x] I have read the CONTRIBUTING.md document. - [x] I have performed a self-review of my own code. - [x] I have added tests that prove my fix is effective. - [x] New and existing unit tests pass locally with my changes. ## Additional Context This complements the existing SSRF protection (`allow_redirects=False`, hostname/IP validation, pinned-IP adapter) already present in the module after upstream/main was merged. Co-authored-by: Bo Yang <ybo@google.com> COPYBARA_INTEGRATE_REVIEW=#4887 from cchinchilla-dev:feat/load-web-page-timeout-and-url-validation 4bd4799 PiperOrigin-RevId: 930335977
|
Thank you @cchinchilla-dev for your contribution! 🎉 Your changes have been successfully imported and merged via Copybara in commit 792775f. Closing this PR as the changes are now in the main branch. |
Link to Issue or Description of Change
Closes #4886
Update — 2026-04-30
After merging the latest
upstream/main, the SSRF rewrite already covers URL-scheme validation and routes every fetch failure through a unifiedFailed to fetch urlmessage. The unique contribution of this PR is now the request timeout; the body and tests below have been updated to reflect the post-merge scope. No code added by this PR duplicates what upstream already provides.Problem
load_web_page()callsrequests.get()without atimeout. If the target server is unresponsive, the agent hangs indefinitely.Solution
Add
timeout=_DEFAULT_TIMEOUT_SECONDS(10 seconds) to both HTTP entry points in the module:requests.getin_fetch_response(proxy path).session.getin_fetch_direct_response(pinned-IP path).Extend the
exceptinload_web_pageto also catchrequests.RequestException, so timeout and connection errors return the standardFailed to fetch url: {url}message instead of propagating.Design note: the timeout is a module-level constant rather than a function parameter to keep it out of the LLM function-calling schema. It can be overridden via
load_web_page._DEFAULT_TIMEOUT_SECONDS = 30if needed.Testing Plan
Unit Tests
pytest tests/unittests/tools/test_load_web_page.py→ 10 passed).New/updated tests in
tests/unittests/tools/test_load_web_page.py:test_load_web_page_uses_proxy_for_unresolved_public_hostnames— updated to verifytimeout=10is forwarded on the proxy path.test_load_web_page_passes_timeout_to_pinned_session— verifies the timeout reaches the pinned-IP session.test_load_web_page_passes_timeout_to_proxied_get— verifies the timeout is forwarded when a proxy is configured.test_load_web_page_returns_failure_on_timeout— verifiesrequests.exceptions.Timeoutis converted intoFailed to fetch url.Manual E2E
N/A — internal hardening; function signature unchanged.
Checklist
Additional Context
This complements the existing SSRF protection (
allow_redirects=False, hostname/IP validation, pinned-IP adapter) already present in the module after upstream/main was merged.