netizensnoopy/weblens

Weblens

Python 3.10+ | Markdown output | Optional Chromium rendering | CLI | Python | MCP

Weblens reads the web and turns pages into readable Markdown.

Built for the terminal, scripts, automation, MCP tools, and AI workflows.

Why Weblens

Weblens is a tool for reading the web.

Use it when you want:

  • the useful text from a page
  • Markdown instead of DOM noise
  • a fast HTTP path first
  • optional Chromium rendering when a page really needs it
  • one tool that works in the CLI, Python, and MCP

What It Does

  • Fetches a URL with browser-like HTTP defaults
  • Extracts readable content as Markdown
  • Supports cookies, headers, proxies, and timeouts
  • Supports JavaScript rendering with a local Chromium-based browser
  • Supports bounded render scrolling for lazy-loaded pages
  • Supports ordered concurrent batch fetching with fetch_many()
  • Supports validator-based caching with PageCache
  • Supports benchmarking for extraction and throughput
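
For readers unfamiliar with the term, "validator-based caching" usually means revalidating a stored copy with HTTP validators (ETag and Last-Modified) instead of downloading the page again. The sketch below shows that general HTTP pattern with the standard library only. It is not PageCache's implementation, whose behavior this README does not describe; `CachedPage`, `conditional_headers`, `cached_get`, and the `send` transport are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedPage:
    """A stored response body plus the validators the server sent with it."""
    body: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None


# Hypothetical transport: (url, request_headers) -> (status, response_headers, body)
Transport = Callable[[str, dict[str, str]], tuple[int, dict[str, str], str]]


def conditional_headers(entry: Optional[CachedPage]) -> dict[str, str]:
    """Build revalidation request headers from stored validators."""
    headers: dict[str, str] = {}
    if entry and entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry and entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def cached_get(url: str, cache: dict[str, CachedPage], send: Transport) -> str:
    """Return the page body, reusing the cached copy on 304 Not Modified."""
    entry = cache.get(url)
    status, resp_headers, body = send(url, conditional_headers(entry))
    if status == 304 and entry is not None:
        # The server confirmed the cached copy is still current.
        return entry.body
    # Fresh response: store the body with its validators.
    # Assumes the transport reports headers with canonical casing.
    cache[url] = CachedPage(
        body=body,
        etag=resp_headers.get("ETag"),
        last_modified=resp_headers.get("Last-Modified"),
    )
    return body
```

The transport is injected so the logic can be exercised offline; a real one would issue the request and return the status, headers, and body.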

Install

From this repo:

python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .

Optional extras:

pip install -e ".[speed]"
pip install -e ".[render]"
pip install -e ".[bench-compare]"
  • speed: enables selectolax for a faster primary extractor
  • render: enables JavaScript rendering with nodriver
  • bench-compare: enables extra benchmark backends like BeautifulSoup

Quick Start

Basic CLI use:

weblens https://www.wikipedia.org/

Basic Python use:

import weblens

markdown = weblens.fetch("https://www.wikipedia.org/")
print(markdown)

Render a JS-heavy page:

weblens https://www.spotify.com/ \
  --render-js \
  --browser-executable "$WEBLENS_CHROME"

Load more of a lazy page:

weblens https://example.com/feed \
  --render-js \
  --render-scroll 3 \
  --browser-executable "$WEBLENS_CHROME"

Scroll to the bottom of a long finite page:

weblens https://example.com/long-page \
  --render-js \
  --render-scroll-to-bottom \
  --browser-executable "$WEBLENS_CHROME"

Python render example:

import weblens

markdown = weblens.fetch(
    "https://www.spotify.com/",
    render_js=True,
    browser_executable="/path/to/chrome",
    render_wait=5.0,
    render_scroll_to_bottom=True,
)

Render Mode

Use --render-js when:

  • the page is a JavaScript app
  • the fast path raises JavaScriptRequiredError
  • the page works in a real browser session but not in raw HTML
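
The "fast path first, render only when needed" approach in the list above can be sketched as a small wrapper. This is a minimal illustration, not part of Weblens: `fetch_with_render_fallback` is a hypothetical helper, and the fetch function and exception class are passed in as arguments because this README does not say where JavaScriptRequiredError is importable from.

```python
from typing import Any, Callable


def fetch_with_render_fallback(
    url: str,
    fetch: Callable[..., str],
    js_required_error: type[BaseException],
    **render_options: Any,
) -> str:
    """Try the plain HTTP path; re-fetch with rendering only if required."""
    try:
        return fetch(url)
    except js_required_error:
        # The fast path signalled that the page needs JavaScript,
        # so retry once with rendering turned on.
        return fetch(url, render_js=True, **render_options)
```

With Weblens this might be called as `fetch_with_render_fallback(url, weblens.fetch, weblens.JavaScriptRequiredError, browser_executable="/path/to/chrome")`, assuming the exception is exposed at the package top level, which is not confirmed here.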

Weblens does not bundle Chromium. Point it at a local Chromium-based browser:

export WEBLENS_CHROME=/path/to/chrome
weblens https://open.spotify.com/ --render-js --browser-executable "$WEBLENS_CHROME"
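
From Python, the same browser path can be resolved before calling fetch. The helper below is a hedged sketch using only the standard library: `resolve_browser` and the candidate names are hypothetical, and nothing here implies that Weblens reads WEBLENS_CHROME itself (in the shell example it is an ordinary variable passed to --browser-executable).

```python
import os
import shutil
from typing import Optional

# Common executable names for Chromium-based browsers; an illustrative
# list, not one Weblens is known to use.
CANDIDATE_NAMES = ("chromium", "chromium-browser", "google-chrome", "chrome")


def resolve_browser(env_var: str = "WEBLENS_CHROME") -> Optional[str]:
    """Return a browser path from the environment, else search PATH."""
    configured = os.environ.get(env_var)
    if configured:
        return configured
    for name in CANDIDATE_NAMES:
        found = shutil.which(name)
        if found:
            return found
    # No configured path and nothing on PATH.
    return None
```

The result can then be passed as the browser_executable argument shown in the Python render example; a None result means no browser was found and rendering should be skipped or reported.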

Where It Shines

Weblens is especially good for:

  • saving articles, docs, and product pages as readable Markdown
  • feeding clean web content into LLMs
  • building terminal and Python automation around web reading
  • using one MCP tool to read the internet
  • reading JS-heavy pages only when needed
  • batch text collection across many URLs
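
For batch collection, the key property is that results come back in the same order as the input URLs even though requests run concurrently. The sketch below illustrates that idea with the standard library; it is not how fetch_many() is implemented, and `fetch_all_ordered` and `fake_fetch` are hypothetical names (the stand-in fetcher keeps the example runnable offline).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fetch_all_ordered(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> list[str]:
    """Run fetch(url) concurrently and return results in input order."""
    url_list = list(urls)
    if not url_list:
        return []
    workers = max(1, min(max_workers, len(url_list)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(fetch, url_list))


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a real fetcher such as weblens.fetch."""
    return f"# Markdown for {url}"


pages = fetch_all_ordered(
    ["https://example.com/a", "https://example.com/b"],
    fake_fetch,
)
```

In real use, the library's own fetch_many() is the supported route; this sketch only shows what "ordered" and "concurrent" mean together.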

Benchmarking

Local extractor benchmarks:

weblens-bench
weblens-bench --iterations 20 --warmup 5

Live benchmarks:

weblens-bench --live-only \
  --url https://www.wikipedia.org/ \
  --url https://docs.python.org/3/

Comparison benchmarks:

weblens-bench --compare-only
weblens-bench --live-only --compare --url https://www.wikipedia.org/
