Releases: CursorTouch/Web-Use
v0.3
What's New in v0.3
New Features
- PDF support in `scrape_tool`: extract content from PDF pages directly; specify individual pages with `pages=[1,5,10]`
- OAuth 2.0 + PKCE authentication: built-in OAuth flow for sites that require it
- WebMCP integration: agents can discover and call custom tools exposed by websites via the WebMCP protocol
- Loop detection: `LoopGuard` detects page cycles and repeated failed retries, with prompt rules to break out automatically
- `keep_alive` + `disconnect()`: keep the browser alive across agent runs and disconnect explicitly when done
- `within_viewport` parameter on `get_state`: pass `within_viewport=False` to get all interactive elements across the entire DOM regardless of scroll position
- Scroll position hints: browser state now includes scroll percentage and position hints for the agent
Improvements
- Unified semantic tree: `DOMNode` replaces separate `TreeNode`/`TreeNodeData` types; tree is now built from real DOM parent-child traversal instead of XPath reconstruction
- Richer semantic tree output: shows `id`/`class` in CSS selector notation, and role when it differs from tag
- Improved textual element detection: additional tags and correct inline text extraction
- DOM capture timing: logs `state_capture_ms` and `screenshot_capture_ms` for performance visibility
- Multiple performance optimizations across the agent loop
- Migrated to `uv` package manager
- Removed Playwright dependency: fully CDP-native via bundled `src/cdp/` module
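As a rough illustration of the richer tree output described above, the sketch below renders a small parent-child tree with `id`/`class` in CSS selector notation and appends the role only when it differs from the tag. The `Node` class, `label`, and `render` are hypothetical stand-ins written for this example; they are not the library's `DOMNode`, and the actual output format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node (not the library's DOMNode)."""
    tag: str
    id: str = ""
    classes: list[str] = field(default_factory=list)
    role: str = ""
    children: list["Node"] = field(default_factory=list)


def label(node: Node) -> str:
    """Format a node as tag#id.class1.class2, plus role if it differs from tag."""
    text = node.tag
    if node.id:
        text += f"#{node.id}"
    text += "".join(f".{name}" for name in node.classes)
    if node.role and node.role != node.tag:
        text += f" [role={node.role}]"
    return text


def render(node: Node, depth: int = 0) -> list[str]:
    """Walk real parent-child links depth-first, indenting two spaces per level."""
    lines = ["  " * depth + label(node)]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    root = Node("div", id="main", classes=["card"], children=[
        Node("a", classes=["btn"], role="button"),
        Node("button", role="button"),  # role equals tag, so it is omitted
    ])
    print("\n".join(render(root)))
```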
Bug Fixes
- Fixed PDF text extraction (switched to `get_text('html')` + `markdownify`)
- Fixed `done_tool` over-condensing the final output
- Fixed bounding boxes disappearing when page is scrolled
- Fixed viewport element filtering to correctly account for scroll offset
- Fixed scroll position key names in DOM viewport filtering
- Fixed sub-frame/worker crash handling in `CrashWatchdog`
- Fixed 10 s `_wait_for_page` timeouts by tracking navigation state
- Fixed browser stability and agent crash handling
- Fixed Gemini tool-calling when thought signature is absent
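To make the scroll-offset fixes above concrete, here is a minimal sketch of viewport filtering, assuming element boxes are stored in document (page) coordinates and must be shifted by the scroll offset before being compared with the viewport size. `Box`, `in_viewport`, and `filter_boxes` are hypothetical names for this example; the `within_viewport` flag only mirrors the idea of the parameter mentioned in these notes and is not the library's `get_state` signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in document coordinates (an assumption of this sketch)."""
    x: float
    y: float
    width: float
    height: float


def in_viewport(box: Box, scroll_x: float, scroll_y: float,
                view_w: float, view_h: float) -> bool:
    """True if the box overlaps the viewport after subtracting the scroll offset."""
    left = box.x - scroll_x  # convert document coordinates to viewport coordinates
    top = box.y - scroll_y
    return (left < view_w and left + box.width > 0
            and top < view_h and top + box.height > 0)


def filter_boxes(boxes: list[Box], scroll_x: float, scroll_y: float,
                 view_w: float, view_h: float,
                 within_viewport: bool = True) -> list[Box]:
    """Keep only visible boxes, or all of them when within_viewport is False."""
    if not within_viewport:
        return list(boxes)
    return [b for b in boxes
            if in_viewport(b, scroll_x, scroll_y, view_w, view_h)]


if __name__ == "__main__":
    boxes = [Box(0, 0, 100, 50), Box(0, 900, 100, 50), Box(0, 2000, 100, 50)]
    # After scrolling down 500 px, only the box at y=900 overlaps an 800 px tall viewport.
    print(filter_boxes(boxes, 0, 500, 1280, 800))
```

Forgetting the subtraction step is the kind of mistake that makes elements seem to vanish once the page is scrolled, since every box is then tested against the unscrolled viewport.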
v0.2
v0.1
Key Features & Updates
- Dual Agent Modes: Supports both non-vision and vision-based agent operation (to support both LLM and VLM).
- Scrollable vs. Interactive Elements: A clear separation improves DOM recognition and interaction.
- Scrolling Logic: Enables scrolling through distinct webpage sections, including nested containers.
- HTML → Markdown: Upgraded to `markdownify` in the `Scrape Tool` for better content conversion.
- Tab Management: Tracks the number of open tabs, active tab, and supports basic tab control.
- Extensible Tools: Add custom tools to the agent via the `additional_tools` parameter.
- Iframe & Shadow DOM Access: Enhanced ability to interact with embedded or encapsulated elements.
- Structured Output: Returns well-defined BaseModel outputs using the `structured_output` parameter.
- Human-in-the-Loop: Add manual checkpoints in the workflow via the `include_human_in_loop` parameter (thanks @tanmaysk001!)
- Inference Wrapper: Fixed the bug in the `open router` implementation (thanks @thecoderwithHat)
- Navigation Fixes: Improved handling of edge-case navigations across complex sites.