Not just a browser. A workstation for your AI agent.
Gbrow is a full-featured headless browser for OpenClaw agents, built on Playwright and Bun. Instead of taking screenshots and sending them to expensive vision models, Gbrow reads pages through the browser's accessibility tree, which is fast, free, and more reliable.
Most browser tools for AI agents take screenshots, upload them to GPT-4o or Claude, and wait for a description. That works, but it's slow (3-10 seconds per page), costs money (~$0.01 per read), and breaks when API keys expire.
Gbrow uses Playwright's `ariaSnapshot()`, which exposes the same structured data that screen readers use. Instead of a picture of the page, you get a clean text tree:

    @e1 [heading] "Welcome to Example" [level=1]
    @e2 [link] "Get Started"
    @e3 [button] "Sign in"
    @e4 [textbox] "Search"

Each element gets a ref (@e1, @e2, etc.) that you can click, fill, or inspect directly. No vision model, no API calls, no cost.
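
Because the snapshot is plain text, an agent can turn it into structured records with a few lines of code. The following is a minimal sketch that assumes every line follows the shape of the sample above (`@ref [role] "name" [key=value]`); the `Element` type and the `parse_snapshot` and `find_ref` helpers are hypothetical names used for illustration and are not part of Gbrow.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape, based only on the sample output:
#   @e1 [heading] "Welcome to Example" [level=1]
LINE_RE = re.compile(
    r'^\s*(?P<ref>@e\d+)\s+\[(?P<role>[^\]]+)\]'
    r'(?:\s+"(?P<name>[^"]*)")?'
    r'(?:\s+\[(?P<attrs>[^\]]*)\])?\s*$'
)


class Element(NamedTuple):
    ref: str
    role: str
    name: str
    attrs: dict


def parse_snapshot(text: str) -> list:
    """Parse snapshot text into Element records, skipping unrecognized lines."""
    elements = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # line does not follow the assumed shape
        attrs = {}
        if match.group("attrs"):
            for pair in match.group("attrs").split(","):
                key, _, value = pair.strip().partition("=")
                attrs[key] = value
        elements.append(
            Element(match.group("ref"), match.group("role"),
                    match.group("name") or "", attrs)
        )
    return elements


def find_ref(elements, role: str, name: str) -> Optional[str]:
    """Return the ref of the first element with the given role and name."""
    for element in elements:
        if element.role == role and element.name == name:
            return element.ref
    return None


SAMPLE = '''@e1 [heading] "Welcome to Example" [level=1]
@e2 [link] "Get Started"
@e3 [button] "Sign in"
@e4 [textbox] "Search"'''

elements = parse_snapshot(SAMPLE)
print(find_ref(elements, "button", "Sign in"))  # prints @e3
```

The returned ref can be passed straight to commands such as `click` or `fill`.
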
Via ClawHub (recommended):

    clawhub install gbrow

Via Git:

    cd ~/.openclaw/workspace/skills
    git clone https://github.com/ashish797/Gbrow.git
    cd Gbrow && bash setup.sh

Either way, the setup installs Bun (if needed), pulls dependencies, and installs Chromium. About 30 seconds.
Start the server:

    bun run src/server.ts

Then send commands over HTTP:

    PORT=$(python3 -c "import json; print(json.load(open('.gstack/browse.json'))['port'])")
    TOKEN=$(python3 -c "import json; print(json.load(open('.gstack/browse.json'))['token'])")

    # Navigate
    curl -s -X POST "http://127.0.0.1:${PORT}/command" \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{"command":"goto","args":["https://news.ycombinator.com"]}'

    # Read the page
    curl -s -X POST "http://127.0.0.1:${PORT}/command" \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{"command":"snapshot","args":["-i"]}'

    # Click an element
    curl -s -X POST "http://127.0.0.1:${PORT}/command" \
      -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{"command":"click","args":["@e3"]}'

| Command | Description |
|---|---|
| `goto <url>` | Navigate to URL |
| `back` / `forward` / `reload` | History navigation |
| `url` | Current URL |

| Command | Description |
|---|---|
| `snapshot [-i\|-c\|-d N]` | Accessibility tree with element refs |
| `text` | Cleaned page text |
| `html [selector]` | Raw HTML |
| `links` | All links as "text -> href" |
| `forms` | Form fields as JSON |

| Command | Description |
|---|---|
| `click <ref>` | Click element by ref (e.g. `@e3`) |
| `fill <ref> <text>` | Fill an input field |
| `type <ref> <text>` | Type with keyboard events |
| `select <ref> <value>` | Select dropdown value |
| `press <key>` | Press a key (Enter, Tab, etc.) |
| `scroll <direction>` | Scroll the page |

| Command | Description |
|---|---|
| `js <expr>` | Run JavaScript on the page |
| `css <sel> <prop>` | Get computed CSS value |
| `attrs <ref>` | Element attributes as JSON |
| `is <prop> <ref>` | Check state (visible, enabled, etc.) |

| Command | Description |
|---|---|
| `tabs` | List open tabs |
| `tab N` | Switch to tab N |
| `newtab` | Open new tab |
| `closetab` | Close current tab |

| Command | Description |
|---|---|
| `screenshot` | Take screenshot |
| `pdf` | Save page as PDF |
| `responsive <w> <h>` | Set viewport size |
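
Every command in these tables goes through the same `POST /command` endpoint with a JSON body of the form `{"command": ..., "args": [...]}`, so the curl calls can be wrapped in a small client. The sketch below uses only the Python standard library. The helper names (`load_connection`, `build_request`, `send`) are hypothetical, and the response is returned as raw text because its format is not specified here.

```python
import json
import urllib.request


def load_connection(path=".gstack/browse.json"):
    """Read the port and token from the server's state file."""
    with open(path) as f:
        state = json.load(f)
    return state["port"], state["token"]


def build_request(port, token, command, args=()):
    """Build (but do not send) the POST request for one command."""
    body = json.dumps({"command": command, "args": list(args)}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/command",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(port, token, command, args=()):
    """Send one command and return the raw response body as text."""
    request = build_request(port, token, command, args)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Usage, assuming a Gbrow server is running and the state file exists:
#   port, token = load_connection()
#   send(port, token, "goto", ["https://news.ycombinator.com"])
#   print(send(port, token, "snapshot", ["-i"]))
```

Keeping `build_request` separate from `send` makes it possible to inspect or log a request without touching the network.
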

| Flag | What it does |
|---|---|
| `-i` | Interactive elements only (buttons, links, inputs) |
| `-c` | Compact: remove empty structural nodes |
| `-d N` | Limit tree depth to N levels |
| `-s <sel>` | Scope to a CSS selector |
| `-D` | Diff against previous snapshot |
| `-a` | Annotated screenshot with ref overlays |
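
The `-D` flag reports what changed since the previous snapshot. How Gbrow computes that diff is not described here, so the following is only a rough illustration of the idea: a line-level diff of two snapshot texts using Python's standard `difflib`. The `diff_snapshots` helper and both sample snapshots are made up for the example.

```python
import difflib


def diff_snapshots(previous: str, current: str) -> list:
    """Return only the removed ('- ') and added ('+ ') lines between snapshots."""
    delta = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop those.
    return [line for line in delta if line.startswith(("+ ", "- "))]


before = '@e1 [heading] "Welcome to Example" [level=1]\n@e3 [button] "Sign in"'
after = '@e1 [heading] "Welcome to Example" [level=1]\n@e3 [button] "Sign out"'

changes = diff_snapshots(before, after)
for line in changes:
    print(line)
```

A diff like this lets an agent check the effect of a click without re-reading the whole tree.
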

    Your Agent ---HTTP---> Gbrow Server ---> Chromium (headless)
                                |
                                v
                        Accessibility Tree
                     (structured text + refs)

- Agent sends a command (goto, snapshot, click, etc.)
- Gbrow server receives it, runs it through Playwright
- For reading, it uses `ariaSnapshot()`, not screenshots
- Result is structured text with clickable refs
- Agent can click refs, fill forms, navigate, all without vision models
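
The flow above can be sketched as a short agent-side routine: navigate, read the snapshot, pick a ref out of the text, and act on it. This is an illustrative sketch only. It assumes the `snapshot` response is the plain text tree, and the transport is injected as a `send(command, args)` callable (a hypothetical interface) so the logic can be exercised with a fake server.

```python
from typing import Callable, List, Optional

Send = Callable[[str, List[str]], str]  # (command, args) -> response text


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return the first ref whose line mentions both the role and the name."""
    for line in snapshot.splitlines():
        parts = line.split()
        if (parts and parts[0].startswith("@e")
                and f"[{role}]" in line and f'"{name}"' in line):
            return parts[0]
    return None


def click_by_name(send: Send, url: str, role: str, name: str) -> bool:
    """Open a page and click the element with the given role and name."""
    send("goto", [url])                    # 1. navigate
    snapshot = send("snapshot", ["-i"])    # 2. read interactive elements
    ref = find_ref(snapshot, role, name)   # 3. pick a ref from the text
    if ref is None:
        return False
    send("click", [ref])                   # 4. act on the ref
    return True


# A fake transport standing in for the HTTP server, for demonstration only.
calls = []


def fake_send(command, args):
    calls.append((command, args))
    if command == "snapshot":
        return '@e2 [link] "Get Started"\n@e3 [button] "Sign in"'
    return "ok"


clicked = click_by_name(fake_send, "https://example.com", "button", "Sign in")
print(clicked, calls[-1])
```

In real use, `send` would be replaced by a function that posts to the server's `/command` endpoint with the bearer token.
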
You can use raw Playwright directly, but Gbrow gives you:
- Persistent server — browser stays alive between commands
- Auth token — only authorized callers can use it
- Tab management — open, switch, close tabs
- Ref system — structured interaction without CSS selectors
- Auto-shutdown — kills itself after 30 minutes of inactivity
- Docker-friendly — handles sandboxing issues automatically
| Feature | Gbrow | Vision-based tools | Raw Playwright |
|---|---|---|---|
| Page reading | Accessibility tree | Screenshot + GPT-4o | Manual extraction |
| Cost per page | Free | ~$0.01 | Free |
| Speed | < 100ms | 3-10s | Varies |
| API key needed | No | Yes | No |
| Click method | `@ref` | CSS selector | CSS selector |
| Tab management | Built-in | No | Manual |
| Persistent server | Yes | No | No |
| OpenClaw integration | Yes | Varies | No |
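
To put the cost and speed rows in perspective, the short calculation below scales the table's approximate per-page figures to a hypothetical workload of 1,000 page reads. The workload size is an assumption, and the accessibility-tree time uses the table's "< 100ms" figure as an upper bound.

```python
PAGES = 1000  # hypothetical workload size

# Vision-based tools: ~$0.01 and 3-10 seconds per page read.
vision_cost_usd = PAGES * 0.01
vision_time_s = (PAGES * 3, PAGES * 10)

# Accessibility tree: free, under 100 ms per page read (upper bound used).
tree_cost_usd = 0.0
tree_time_s = PAGES * 0.1

print(f"vision: ~${vision_cost_usd:.2f}, "
      f"{vision_time_s[0] / 60:.0f}-{vision_time_s[1] / 60:.0f} min")
print(f"tree:   ${tree_cost_usd:.2f}, under {tree_time_s:.0f} s")
```
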
Gbrow works in Docker out of the box. The `setup.sh` script handles Chromium sandboxing automatically.

If you're running manually in Docker, set `chromiumSandbox: false` in the browser launch options.
Built on gstack by Garry Tan. Adapted for OpenClaw under the MIT license.
MIT