Releases: CursorTouch/Web-Use

v0.3

24 Apr 16:26

What's New in v0.3

New Features

  • PDF support in scrape_tool — extract content from PDF pages directly; specify individual pages with pages=[1,5,10]
  • OAuth 2.0 + PKCE authentication — built-in OAuth flow for sites that require it
  • WebMCP integration — agents can discover and call custom tools exposed by websites via the WebMCP protocol
  • Loop detection: LoopGuard detects page cycles and repeated failed retries, with prompt rules to break out automatically
  • keep_alive + disconnect() — keep the browser alive across agent runs and disconnect explicitly when done
  • within_viewport parameter on get_state — pass within_viewport=False to get all interactive elements across the entire DOM regardless of scroll position
  • Scroll position hints — browser state now includes scroll percentage and position hints for the agent
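
These notes don't describe how Web-Use implements its OAuth flow. As an illustration of the PKCE part only, here is a minimal stdlib sketch of the S256 `code_verifier`/`code_challenge` pair defined in RFC 7636. The function names are hypothetical and are not Web-Use APIs.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Return a high-entropy code_verifier (RFC 7636, section 4.1).

    32 random bytes encode to 43 unpadded base64url characters, the minimum
    length the spec allows (the maximum is 128).
    """
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(ASCII(verifier))), unpadded."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    verifier = make_code_verifier()
    print("code_verifier: ", verifier)
    print("code_challenge:", make_code_challenge(verifier))
```

The client keeps the verifier private, sends the challenge with `code_challenge_method=S256` in the authorization request, and then presents the verifier when exchanging the authorization code for a token.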
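
The notes name LoopGuard but don't document its interface. The following sketch shows one way to detect the two conditions listed, page cycles and repeated failed retries. `LoopDetector`, its parameters, and the page "fingerprint" strings are all hypothetical and are not Web-Use's LoopGuard.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopDetector:
    """Hypothetical loop detector (not Web-Use's LoopGuard).

    Flags:
      * a page cycle: recent history ends in a pattern of 2..max_period
        distinct pages repeated `repeats` times in a row (A, B, A, B);
      * repeated failures: the same action failing `max_failures` times in a row.
    """
    max_period: int = 3
    repeats: int = 2
    max_failures: int = 3
    history: deque = field(default_factory=lambda: deque(maxlen=50))
    _last_failed: Optional[str] = None
    _fail_count: int = 0

    def record_page(self, page_key: str) -> bool:
        """Record a page fingerprint (e.g. a URL); return True if a cycle is detected."""
        self.history.append(page_key)
        h = list(self.history)
        for period in range(2, self.max_period + 1):
            span = period * self.repeats
            if len(h) < span:
                break
            tail = h[-span:]
            pattern = tail[-period:]
            if len(set(pattern)) < 2:
                continue  # staying on one page is not treated as a cycle
            if all(tail[i] == pattern[i % period] for i in range(span)):
                return True
        return False

    def record_action(self, action: str, succeeded: bool) -> bool:
        """Record an action result; return True once it has failed max_failures times in a row."""
        if succeeded:
            self._last_failed, self._fail_count = None, 0
            return False
        if action == self._last_failed:
            self._fail_count += 1
        else:
            self._last_failed, self._fail_count = action, 1
        return self._fail_count >= self.max_failures
```

When either method returns True, an agent could add an instruction to its next prompt (for example, "try a different approach") rather than aborting. That is one plausible reading of "prompt rules to break out automatically".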

Improvements

  • Unified semantic tree: DOMNode replaces separate TreeNode/TreeNodeData types; tree is now built from real DOM parent-child traversal instead of XPath reconstruction
  • Richer semantic tree output — shows id/class in CSS selector notation, and role when it differs from tag
  • Improved textual element detection — additional tags and correct inline text extraction
  • DOM capture timing — logs state_capture_ms and screenshot_capture_ms for performance visibility
  • Multiple performance optimizations across the agent loop
  • Migrated to uv package manager
  • Removed Playwright dependency — fully CDP-native via bundled src/cdp/ module
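
To illustrate the tree changes above (a single node type, real parent-child links, CSS-selector-style labels, and `role` shown only when it differs from the tag), here is a small sketch. This `DOMNode` is a hypothetical stand-in; its fields and the `render` helper are assumptions and do not reflect the real class in Web-Use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DOMNode:
    """Hypothetical node type loosely modeled on the notes; not Web-Use's DOMNode."""
    tag: str
    element_id: Optional[str] = None
    classes: List[str] = field(default_factory=list)
    role: Optional[str] = None
    text: str = ""
    children: List["DOMNode"] = field(default_factory=list)

    def label(self) -> str:
        """Render tag#id.class1.class2, appending role=... only if it differs from the tag."""
        out = self.tag
        if self.element_id:
            out += f"#{self.element_id}"
        out += "".join(f".{c}" for c in self.classes)
        if self.role and self.role != self.tag:
            out += f" role={self.role}"
        return out


def render(node: DOMNode, depth: int = 0) -> str:
    """Walk the parent-child links depth-first and emit an indented outline."""
    line = "  " * depth + node.label()
    if node.text:
        line += f' "{node.text}"'
    return "\n".join([line] + [render(child, depth + 1) for child in node.children])


if __name__ == "__main__":
    nav = DOMNode("nav", element_id="top", classes=["menu"], role="navigation", children=[
        DOMNode("a", classes=["link", "active"], role="link", text="Home"),
        DOMNode("button", role="button", text="Menu"),
    ])
    print(render(nav))
```

Because children are real object references, the outline follows the document's actual nesting. Rebuilding the hierarchy by parsing XPath strings would not give that guarantee.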

Bug Fixes

  • Fixed PDF text extraction (switched to get_text('html') + markdownify)
  • Fixed done_tool over-condensing the final output
  • Fixed bounding boxes disappearing when page is scrolled
  • Fixed viewport element filtering to correctly account for scroll offset
  • Fixed scroll position key names in DOM viewport filtering
  • Fixed sub-frame/worker crash handling in CrashWatchdog
  • Fixed 10 s _wait_for_page timeouts by tracking navigation state
  • Fixed browser stability and agent crash handling
  • Fixed Gemini tool-calling when thought signature is absent
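
Several of these fixes, and the scroll position hints listed under New Features, involve the same arithmetic. An element's box in page coordinates has to be shifted by the scroll offset before it can be compared with the viewport. The sketch below uses hypothetical names (`Box`, `in_viewport`, `scroll_percent`) and assumes all values are CSS pixels. It is not code from Web-Use.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Element bounding box in document (page) coordinates, in CSS pixels."""
    x: float
    y: float
    width: float
    height: float


def in_viewport(box: Box, scroll_x: float, scroll_y: float,
                viewport_w: float, viewport_h: float) -> bool:
    """True if the box overlaps the visible viewport.

    The box is first converted to viewport coordinates by subtracting the
    scroll offset; comparing raw document coordinates against the viewport
    size would misclassify elements once the page is scrolled.
    """
    left, top = box.x - scroll_x, box.y - scroll_y
    return (left < viewport_w and left + box.width > 0
            and top < viewport_h and top + box.height > 0)


def scroll_percent(scroll_y: float, page_h: float, viewport_h: float) -> float:
    """Vertical scroll position as 0-100 (assumption: a page that fits entirely reports 100)."""
    max_scroll = page_h - viewport_h
    if max_scroll <= 0:
        return 100.0
    return round(min(max(scroll_y / max_scroll, 0.0), 1.0) * 100, 1)
```

For example, an element at y=1500 is off-screen with no scroll in an 800 px viewport, but it becomes visible once `scroll_y` reaches 1000.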

v0.2

07 Jul 15:32

Feature

  • Improved the grounding to handle more corner cases.

Fix

  • Fixed a bug that caused the agent to get stuck on PDF pages or blank pages.
  • Removed redundant parts of the agent implementation.

v0.1

17 Jun 16:46

Key Features & Updates

  • Dual Agent Modes: Supports both non-vision and vision-based operation, so the agent works with text-only LLMs as well as VLMs.
  • Scrollable vs. Interactive Elements: A clear separation improves DOM recognition and interaction.
  • Scrolling Logic: Enables scrolling through distinct webpage sections, including nested containers.
  • HTML → Markdown: Upgraded to markdownify in the Scrape Tool for better content conversion.
  • Tab Management: Tracks the number of open tabs, active tab, and supports basic tab control.
  • Extensible Tools: Add custom tools to the agent via the additional_tools parameter.
  • Iframe & Shadow DOM Access: Enhanced ability to interact with embedded or encapsulated elements.
  • Structured Output: Returns well-defined BaseModel outputs using the structured_output parameter.
  • Human-in-the-Loop: Add manual checkpoints in the workflow via the include_human_in_loop parameter (thanks @tanmaysk001!)
  • Inference Wrapper: Fixed a bug in the OpenRouter implementation (thanks @thecoderwithHat)
  • Navigation Fixes: Improved handling of edge-case navigations across complex sites.
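
The separation between scrollable and interactive elements, and scrolling inside nested containers, both depend on deciding whether an element can scroll. A common heuristic checks two things: the content overflows the element's box, and the computed `overflow` style permits scrolling. The sketch below applies this heuristic to a hypothetical dict of values that would be read from the page (`scrollHeight`, `clientHeight`, and `getComputedStyle(...).overflowY`, among others). It is not Web-Use's actual logic.

```python
from typing import Any, Mapping


def is_scrollable(el: Mapping[str, Any]) -> bool:
    """Heuristic: content overflows the box AND the overflow style allows scrolling.

    `el` is a hypothetical snapshot of measured values, e.g.
    {"scroll_height": ..., "client_height": ..., "scroll_width": ...,
     "client_width": ..., "overflow_y": "auto", "overflow_x": "hidden"}.
    Note: the document root can scroll even with overflow "visible",
    so a real implementation would special-case it.
    """
    overflow_y = str(el.get("overflow_y", "visible"))
    overflow_x = str(el.get("overflow_x", "visible"))
    scrolls_y = el["scroll_height"] > el["client_height"] and overflow_y in ("auto", "scroll")
    scrolls_x = el["scroll_width"] > el["client_width"] and overflow_x in ("auto", "scroll")
    return scrolls_y or scrolls_x
```

A side panel with `overflow-y: auto` whose content is taller than the panel counts as scrollable. The same panel with `overflow-y: hidden` does not, even though its content still overflows.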