Releases · CursorTouch/Web-Use · GitHub

24 Apr 16:26

Jeomon

v0.3 Latest

What's New in v0.3

New Features

PDF support in scrape_tool — extract content from PDF pages directly; specify individual pages with pages=[1,5,10]
OAuth 2.0 + PKCE authentication — built-in OAuth flow for sites that require it
WebMCP integration — agents can discover and call custom tools exposed by websites via the WebMCP protocol
Loop detection — LoopGuard detects page cycles and repeated failed retries, with prompt rules to break out automatically
keep_alive + disconnect() — keep the browser alive across agent runs and disconnect explicitly when done
within_viewport parameter on get_state — pass within_viewport=False to get all interactive elements across the entire DOM regardless of scroll position
Scroll position hints — browser state now includes scroll percentage and position hints for the agent
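
For readers unfamiliar with the PKCE part of the OAuth item above, the following is a minimal sketch of how a client derives the code verifier and S256 code challenge described in RFC 7636. It uses only the Python standard library and is not Web-Use's own implementation; the function names `make_code_verifier` and `make_code_challenge` are hypothetical and chosen for illustration.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Return a random URL-safe code verifier (43 characters for 32 random bytes)."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
# The client sends `challenge` (with code_challenge_method=S256) in the
# authorization request and later sends `verifier` in the token request.
```

The authorization server recomputes the challenge from the verifier, so an intercepted authorization code is useless without the original verifier.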

Improvements

Unified semantic tree — DOMNode replaces separate TreeNode/TreeNodeData types; tree is now built from real DOM parent-child traversal instead of XPath reconstruction
Richer semantic tree output — shows id/class in CSS selector notation, and role when it differs from tag
Improved textual element detection — additional tags and correct inline text extraction
DOM capture timing — logs state_capture_ms and screenshot_capture_ms for performance visibility
Multiple performance optimizations across the agent loop
Migrated to uv package manager
Removed Playwright dependency — fully CDP-native via bundled src/cdp/ module
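
To illustrate the kind of output the semantic tree items above describe (id/class in CSS selector notation, role shown only when it differs from the tag), here is a rough sketch. The `DOMNode` name comes from these notes, but its fields, the `label` and `render` helpers, and the exact output format are assumptions made for illustration and may not match the library.

```python
from dataclasses import dataclass, field


@dataclass
class DOMNode:
    # Hypothetical fields; the real DOMNode in Web-Use may look different.
    tag: str
    id: str = ""
    classes: list[str] = field(default_factory=list)
    role: str = ""
    text: str = ""
    children: list["DOMNode"] = field(default_factory=list)


def label(node: DOMNode) -> str:
    """Format one node as tag#id.class, adding the role only if it differs from the tag."""
    out = node.tag
    if node.id:
        out += f"#{node.id}"
    out += "".join(f".{c}" for c in node.classes)
    if node.role and node.role != node.tag:
        out += f" [role={node.role}]"
    if node.text:
        out += f' "{node.text}"'
    return out


def render(node: DOMNode, depth: int = 0) -> str:
    """Render the tree by walking real parent-child links, indenting two spaces per level."""
    lines = ["  " * depth + label(node)]
    for child in node.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


tree = DOMNode("div", id="nav", classes=["menu"], children=[
    DOMNode("a", classes=["link"], role="button", text="Home"),
    DOMNode("button", role="button", text="Go"),
])
print(render(tree))
```

Because each node holds its children directly, the renderer needs no XPath parsing to recover the hierarchy.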

Bug Fixes

Fixed PDF text extraction (switched to get_text('html') + markdownify)
Fixed done_tool over-condensing the final output
Fixed bounding boxes disappearing when page is scrolled
Fixed viewport element filtering to correctly account for scroll offset
Fixed scroll position key names in DOM viewport filtering
Fixed sub-frame/worker crash handling in CrashWatchdog
Fixed 10 s _wait_for_page timeouts by tracking navigation state
Fixed browser stability and agent crash handling
Fixed Gemini tool-calling when thought signature is absent
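
The viewport-filtering fixes above come down to comparing element positions and the viewport in the same coordinate space. The sketch below shows one way to do that, assuming bounding boxes are stored in document coordinates; the `Box` type and the `in_viewport` and `filter_elements` helpers are hypothetical and are not taken from the Web-Use source. Only the `within_viewport` flag mirrors the parameter named in these notes.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Assumed to be document-space coordinates (relative to the page's top-left corner).
    x: float
    y: float
    width: float
    height: float


def in_viewport(box: Box, scroll_x: float, scroll_y: float,
                viewport_w: float, viewport_h: float) -> bool:
    """Return True if the box overlaps the visible viewport after subtracting the scroll offset."""
    left = box.x - scroll_x
    top = box.y - scroll_y
    right = left + box.width
    bottom = top + box.height
    return right > 0 and bottom > 0 and left < viewport_w and top < viewport_h


def filter_elements(boxes: list[Box], scroll_x: float, scroll_y: float,
                    viewport_w: float, viewport_h: float,
                    within_viewport: bool = True) -> list[Box]:
    """Keep only visible boxes, or all boxes when within_viewport is False."""
    if not within_viewport:
        return list(boxes)
    return [b for b in boxes
            if in_viewport(b, scroll_x, scroll_y, viewport_w, viewport_h)]
```

Forgetting to subtract the scroll offset makes every element below the first screen look off-screen after scrolling, which is consistent with the symptoms these fixes describe.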

07 Jul 15:32

Jeomon

v0.2

Feature

Improved the grounding to handle more corner cases.

Fix

Fixed a bug that caused the agent to get stuck on PDF pages or blank pages.
Removed redundant parts of the agent implementation.

17 Jun 16:46

Jeomon

v0.1

Key Features & Updates

Dual Agent Modes: Supports both non-vision and vision-based agent operation, so the agent works with either an LLM or a VLM.
Scrollable vs. Interactive Elements: A clear separation improves DOM recognition and interaction.
Scrolling Logic: Enables scrolling through distinct webpage sections, including nested containers.
HTML → Markdown: Upgraded to markdownify in the Scrape Tool for better content conversion.
Tab Management: Tracks the number of open tabs, active tab, and supports basic tab control.
Extensible Tools: Add custom tools to the agent via the additional_tools parameter.
Iframe & Shadow DOM Access: Enhanced ability to interact with embedded or encapsulated elements.
Structured Output: Returns well-defined BaseModel outputs using the structured_output parameter.
Human-in-the-Loop: Add manual checkpoints in the workflow via the include_human_in_loop parameter (thanks @tanmaysk001!).
Inference Wrapper: Fixed a bug in the OpenRouter implementation (thanks @thecoderwithHat).
Navigation Fixes: Improved handling of edge-case navigations across complex sites.

Contributors

tanmaysk001 and thecoderwithHat
