webpipe

Local "search → fetch → extract" plumbing, exposed as an MCP stdio server.

  • cache-first fetch by default
  • explicit limits on bytes/chars/results
  • stable, schema-versioned JSON outputs
  • zero API keys required — the default tool surface works out of the box

Cursor MCP setup

Add this to your ~/.cursor/mcp.json:

{
  "mcpServers": {
    "webpipe": {
      "command": "cargo",
      "args": ["run", "--release", "--bin", "webpipe", "--features", "stdio", "--", "mcp-stdio"],
      "cwd": "/path/to/webpipe"
    }
  }
}
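
If you have already installed the binary (see Install below), you can point Cursor at it directly instead of building via cargo on every launch. A minimal sketch, assuming webpipe is on your PATH:

{
  "mcpServers": {
    "webpipe": {
      "command": "webpipe",
      "args": ["mcp-stdio"]
    }
  }
}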

Default tool surface (zero keys required)

Tool             Purpose
webpipe_meta     Server config, capabilities, and runtime stats
search_evidence  Main tool: search → fetch → extract → ranked top chunks
web_extract      Extract readable text from a URL you already have
arxiv            arXiv papers: search by topic (query) or get metadata for one paper (id_or_url)
paper_search     Search papers via Semantic Scholar / OpenAlex

Tools that require API keys (shown only when configured):

Tool             Key required
web_perplexity   WEBPIPE_PERPLEXITY_API_KEY

web_fetch, arxiv_search, and arxiv_enrich are still callable as aliases but no longer appear in the default surface.

Tool cookbook

search_evidence (main evidence tool)

URLs mode (no search key needed):

{
  "urls": ["https://example.com/docs/install"],
  "query": "installation steps",
  "max_urls": 2,
  "top_chunks": 4
}

Search mode (requires a Brave or Tavily key, or a SearXNG endpoint):

{
  "query": "tool-using LLM agents",
  "provider": "auto",
  "max_results": 5,
  "max_urls": 3,
  "top_chunks": 5
}

Compact output for tight agent loops:

{ "urls": ["https://example.com"], "query": "...", "max_urls": 1, "top_chunks": 3, "minimal_output": true }

web_extract (single URL)

{ "url": "https://example.com/docs", "include_text": true, "max_chars": 20000 }

arxiv (search or enrich)

Search by topic:

{ "query": "tool-using agents", "categories": ["cs.AI", "cs.CL"], "per_page": 10 }

Get metadata for a specific paper:

{ "id_or_url": "2401.12345", "include_discussions": true }

web_perplexity (when key is configured)

{ "query": "trade-offs between RAG and long context", "model": "sonar" }

Install

# From crates.io:
cargo install webpipe

# From repo root:
cargo install --path crates/webpipe-mcp --bin webpipe --features stdio --force

# Run as MCP stdio server:
webpipe mcp-stdio

Quick test

cargo test -p webpipe --features stdio -- --skip live_

If another cargo process holds the build-directory lock:

CARGO_TARGET_DIR=/tmp/webpipe-test-target cargo test -p webpipe --features stdio -- --skip live_

Privacy modes

Set WEBPIPE_PRIVACY_MODE to:

  • normal (default): network allowed
  • offline: no network egress (use cache only)
  • anonymous: network only via explicit proxy (WEBPIPE_ANON_PROXY)
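
For example (a sketch, launching the server with the command from the Install section):

# Offline: answer only from the local cache, no network egress
WEBPIPE_PRIVACY_MODE=offline webpipe mcp-stdio

# Anonymous: allow network, but only through the configured proxy
WEBPIPE_PRIVACY_MODE=anonymous WEBPIPE_ANON_PROXY=socks5h://127.0.0.1:9050 webpipe mcp-stdio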

Optional env vars

Var                          Purpose
WEBPIPE_BRAVE_API_KEY        Brave search
WEBPIPE_TAVILY_API_KEY       Tavily search
WEBPIPE_SEARXNG_ENDPOINT     Self-hosted SearXNG
WEBPIPE_FIRECRAWL_API_KEY    Firecrawl remote fetch
WEBPIPE_PERPLEXITY_API_KEY   Perplexity synthesis
WEBPIPE_ANON_PROXY           Proxy for anonymous mode (e.g. socks5h://127.0.0.1:9050)
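
For example, search mode for search_evidence needs just one of the search variables set before launching the server (a sketch; the key value is a placeholder):

# Enable Brave-backed search (Tavily or a SearXNG endpoint work the same way):
export WEBPIPE_BRAVE_API_KEY=your-key-here
webpipe mcp-stdio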

CLI (no Cursor needed)

# List tools as an MCP client sees them:
webpipe mcp-list-tools

# Call a tool and print Markdown:
webpipe mcp-call --tool search_evidence --args-json '{"query":"example","urls":["https://example.com"],"max_urls":1}'
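
The same pattern works for any tool in the default surface; for example, calling web_extract with the arguments shown in the cookbook above:

# Extract readable text from a single URL:
webpipe mcp-call --tool web_extract --args-json '{"url":"https://example.com/docs","include_text":true,"max_chars":20000}'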

Workspace layout

  • crates/webpipe-core: backend-agnostic types + traits
  • crates/webpipe-local: local implementations (reqwest + cache + extractors)
  • crates/webpipe-mcp: CLI + MCP stdio server (bin webpipe)
