test(browser): regression tests for issue #723 error clarity #1686

Open
sena-labs wants to merge 1 commit into agent0ai:main from sena-labs:fix/browser-runtime-error-clarity-issue-723

@sena-labs
Contributor

Summary

  • Add three regression tests to tests/test_browser_agent_regressions.py that guard against reintroducing the confusing "Task reached step limit without completion. Last page: about:blank" error reported in #723 ("Problem with the browsing tool").
  • Each test patches get_runtime to simulate a different failure mode of the Playwright-based browser tool and asserts that the returned message is specific, actionable, and does not contain the old "step limit" / "about:blank" wording.

Background

Issue #723 reported that every browser task since v0.9.4 silently failed with:

Task reached step limit without completion. Last page: about:blank.
The browser agent may need clearer instructions on when to finish.

Root cause: the now-removed browser-use library sent an anthropic-beta: structured-outputs-2025-11-13 header to OpenRouter, which proxied requests to providers (Amazon Bedrock, Google Vertex) that reject it. After three consecutive API errors, browser-use exhausted its step budget and returned the generic fallback message above, giving users no indication that the real problem was an API rejection.
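
The masking effect described above can be illustrated with a toy loop. This is a minimal sketch, not browser-use's actual code: `ProviderRejected`, `rejecting_llm`, both `run_*` functions, and the wording of the second message are hypothetical names invented for the illustration.

```python
class ProviderRejected(Exception):
    """Hypothetical stand-in for an upstream API rejection."""


def rejecting_llm():
    # Assumed failure: the provider refuses the request because of the header.
    raise ProviderRejected("unsupported header: anthropic-beta")


def run_with_masked_errors(call_llm, max_failures=3):
    """Swallow every error, then fall back to a generic message."""
    failures = 0
    while failures < max_failures:
        try:
            return call_llm()
        except Exception:
            failures += 1  # the real cause is discarded here
    return "Task reached step limit without completion. Last page: about:blank."


def run_with_propagated_errors(call_llm, max_failures=3):
    """Same retry budget, but the last error is kept and reported."""
    last_error = None
    for _ in range(max_failures):
        try:
            return call_llm()
        except Exception as exc:
            last_error = exc
    return f"LLM call failed after {max_failures} attempts: {last_error}"


print(run_with_masked_errors(rejecting_llm))
print(run_with_propagated_errors(rejecting_llm))
```

The first call prints only the generic fallback, while the second names the rejected header, which is the kind of information users were missing.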

The fix (commit 983d431a) replaced the browser-use agent with the native Playwright _browser plugin, which propagates concrete error messages:

| Failure mode | Current message |
| --- | --- |
| Runtime cannot start | `Browser runtime unavailable: <reason>` |
| Action call fails | `Browser <action> failed: <reason>` |
| Unknown action keyword | `Unknown browser action: <name>` |
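
A minimal sketch of how a tool could produce these three messages is shown below. It is a synchronous stand-in, not the `_browser` plugin's implementation: `run_browser_action`, `KNOWN_ACTIONS`, and the order of the checks are assumptions; only the message formats and the `get_runtime` / `runtime.call` names come from this PR.

```python
from unittest.mock import Mock

# Hypothetical action set, for illustration only.
KNOWN_ACTIONS = {"open", "click", "type"}


def run_browser_action(action, get_runtime, **kwargs):
    """Return the action result, or a message naming the failure mode."""
    if action not in KNOWN_ACTIONS:
        return f"Unknown browser action: {action}"
    try:
        runtime = get_runtime()
    except Exception as exc:
        return f"Browser runtime unavailable: {exc}"
    try:
        return str(runtime.call(action, **kwargs))
    except Exception as exc:
        return f"Browser {action} failed: {exc}"


# Simulate a runtime that cannot start.
broken_get_runtime = Mock(side_effect=RuntimeError("chromium not installed"))
print(run_browser_action("open", broken_get_runtime, url="https://example.com"))

# Simulate a runtime whose call() raises for a given action.
runtime = Mock()
runtime.call.side_effect = TimeoutError("selector not found")
print(run_browser_action("click", lambda: runtime, selector="#go"))
```

Passing `get_runtime` as a parameter keeps the sketch self-contained; the tests in this PR patch `get_runtime` instead.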

The existing test_legacy_browser_dependency_is_removed test already verifies that browser-use and _browser_agent are gone. This PR adds complementary tests that verify the new tool's error-handling contracts, so the old vague wording cannot be reintroduced undetected.

Tests added

| Test name | What it verifies |
| --- | --- |
| `test_browser_tool_returns_clear_error_when_runtime_unavailable` | `get_runtime` raises → message contains "runtime unavailable", not "step limit" |
| `test_browser_action_failure_names_the_failed_action` | `runtime.call` raises → message names the failing action and contains "failed" |
| `test_browser_unknown_action_returns_informative_message` | Unknown action keyword → message names the keyword, not a blank failure |
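
The assertion pattern shared by these tests might look roughly like the helper below. This is a sketch under assumptions: `assert_clear_error` and `LEGACY_FRAGMENTS` are hypothetical names, and the actual tests may inline their checks rather than use a helper.

```python
# Fragments of the old, vague wording that must never reappear.
LEGACY_FRAGMENTS = ("step limit", "about:blank")


def assert_clear_error(message, *required):
    """Check that a message has the expected fragments and no legacy wording."""
    lowered = message.lower()
    for fragment in required:
        assert fragment.lower() in lowered, f"missing {fragment!r} in {message!r}"
    for legacy in LEGACY_FRAGMENTS:
        assert legacy not in lowered, f"legacy wording {legacy!r} in {message!r}"


# Example messages in the formats described in this PR (reasons are made up).
assert_clear_error("Browser runtime unavailable: chromium not installed", "runtime unavailable")
assert_clear_error("Browser click failed: selector not found", "click", "failed")
assert_clear_error("Unknown browser action: teleport", "teleport")
```

Checking for the absence of the legacy fragments, rather than only for the presence of the new ones, is what makes these regression tests for #723 rather than plain format checks.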

Test plan

pytest tests/test_browser_agent_regressions.py \
  -k "test_browser_tool_returns_clear_error_when_runtime_unavailable \
      or test_browser_action_failure_names_the_failed_action \
      or test_browser_unknown_action_returns_informative_message \
      or test_legacy_browser_dependency_is_removed" \
  -v

All four tests pass. No other tests are modified.

Closes #723

🤖 Generated with Claude Code

Guard against reintroducing the vague 'Task reached step limit without
completion. Last page: about:blank' message that the removed browser-use
agent produced when the underlying LLM provider rejected structured-output
headers (e.g. 'anthropic-beta: structured-outputs-2025-11-13').

The current Playwright-based browser tool returns specific, actionable
messages for each failure mode:
- Runtime unavailable  → 'Browser runtime unavailable: <reason>'
- Action call failure  → 'Browser <action> failed: <reason>'
- Unknown action       → 'Unknown browser action: <name>'

Three new tests verify these contracts and assert the old confusing
wording is absent.

Closes agent0ai#723

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>