# firecrawl-cli

A Python-based command-line interface for Firecrawl with full enterprise proxy support.
The official Firecrawl Node.js CLI has known issues with enterprise proxy configurations. This Python CLI is a drop-in replacement that properly respects HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables.
Enterprise environments often require proxy configuration for external HTTP requests. While Node.js supports proxies, many CLI tools built on Node don't properly handle proxy environment variables, requiring additional tools like proxychains or custom configuration.
This Python CLI uses httpx with built-in proxy support that automatically detects and uses standard proxy environment variables:
```sh
export HTTPS_PROXY="http://proxy.company.com:8080"
firecrawl scrape https://example.com  # Works seamlessly behind the proxy
```

| Feature | Node CLI | Python CLI |
|---|---|---|
| Web Scraping | ✅ | ✅ |
| Website Crawling | ✅ | ✅ |
| URL Mapping | ✅ | ✅ |
| Web Search | ✅ | ✅ |
| Batch Operations | ✅ | ✅ |
| `HTTP_PROXY` support | ❌ | ✅ |
| `HTTPS_PROXY` support | ❌ | ✅ |
| `NO_PROXY` support | ❌ | ✅ |
| Proxy Authentication | ❌ | ✅ |
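httpx follows the same `*_PROXY` convention as Python's standard library, so the resolution behavior can be illustrated with stdlib `urllib.request` (a minimal sketch for illustration; the CLI itself uses httpx):

```python
import os
import urllib.request

# Simulate an enterprise environment
os.environ["HTTPS_PROXY"] = "http://proxy.company.com:8080"
os.environ["NO_PROXY"] = "localhost,127.0.0.1,internal.company.com"

# getproxies() reads the standard *_PROXY environment variables
proxies = urllib.request.getproxies()
print(proxies["https"])  # http://proxy.company.com:8080

# proxy_bypass() honors NO_PROXY for exempt hosts
print(bool(urllib.request.proxy_bypass("internal.company.com")))  # True
print(bool(urllib.request.proxy_bypass("example.com")))           # False
```

Because httpx reads the same variables by default (`trust_env=True`), no extra configuration is needed for the CLI's requests to route through the proxy.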
- **Enterprise Proxy Support**: Respects `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` environment variables
- **Multiple Output Formats**: Markdown, HTML, JSON, screenshots, links, and more
- **Web Scraping**: Extract clean data from any URL
- **Website Crawling**: Recursively crawl and discover pages
- **URL Mapping**: Fast website URL discovery
- **Web Search**: Search and optionally scrape results
- **Batch Operations**: Process multiple URLs at once
- **uvx Compatible**: Run without installation using `uvx`
```sh
# Run without installing
uvx firecrawl-cli scrape https://example.com

# With a specific version
uvx firecrawl-cli@1.0.0 scrape https://example.com
```

Install as a persistent tool with uv:

```sh
uv tool install firecrawl-cli
```

Or with pip:

```sh
pip install firecrawl-cli
```

Authenticate before first use:

```sh
# Interactive login
firecrawl login

# Or set environment variable
export FIRECRAWL_API_KEY="fc-xxxxx"
```

Then scrape:

```sh
# Quick scrape (default: markdown)
firecrawl scrape https://example.com

# Multiple formats
firecrawl scrape https://example.com --format markdown,html,links

# Save to file
firecrawl scrape https://example.com --output result.md
```

This CLI properly supports enterprise proxy settings through environment variables:
```sh
# Set proxy for all HTTP requests
export HTTP_PROXY="http://proxy.company.com:8080"
export HTTPS_PROXY="https://proxy.company.com:8080"

# With authentication
export HTTPS_PROXY="http://user:pass@proxy.company.com:8080"

# Bypass proxy for certain hosts
export NO_PROXY="localhost,127.0.0.1,internal.company.com"
```

Proxy configuration is loaded in the following priority:
1. Command-line `--proxy` option
2. Environment variables (`HTTPS_PROXY`, `HTTP_PROXY`)
3. Configuration file settings
4. System defaults

To verify which proxy is in effect, run:

```sh
firecrawl status
```

This will show whether a proxy is configured and its URL (with credentials masked).
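The precedence described above can be sketched as follows (`resolve_proxy` is a hypothetical helper for illustration, not the CLI's actual internals):

```python
import os
from typing import Optional

def resolve_proxy(cli_proxy: Optional[str] = None,
                  config_proxy: Optional[str] = None) -> Optional[str]:
    """Resolve the proxy URL using the documented precedence (sketch)."""
    if cli_proxy:  # 1. --proxy command-line option wins
        return cli_proxy
    for var in ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"):
        value = os.environ.get(var)
        if value:  # 2. standard environment variables
            return value
    if config_proxy:  # 3. configuration file setting
        return config_proxy
    return None  # 4. system default: direct connection

# --proxy beats an environment variable
os.environ["HTTPS_PROXY"] = "http://env-proxy:8080"
print(resolve_proxy(cli_proxy="http://cli-proxy:8080"))  # http://cli-proxy:8080
print(resolve_proxy())                                   # http://env-proxy:8080
```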
Extract content from a URL in various formats.
```
firecrawl scrape <URL> [OPTIONS]

Options:
  --format, -f         Output format(s): markdown, html, rawHtml, links,
                       images, screenshot, summary, json, branding
  --only-main-content  Extract only main content (default: true)
  --wait-for           Wait time in milliseconds
  --screenshot         Take a screenshot
  --max-age            Maximum age of cached content in ms
  --output, -o         Output file path
  --json               Output as JSON
  --pretty             Pretty print JSON
```

Examples:
```sh
# Basic scrape
firecrawl scrape https://example.com

# HTML output
firecrawl scrape https://example.com --format html

# Multiple formats with JSON output
firecrawl scrape https://example.com --format markdown,links --json --pretty

# Screenshot
firecrawl scrape https://example.com --format screenshot --output screenshot.png
```

Recursively crawl a website.
```
firecrawl crawl <URL> [OPTIONS]

Options:
  --wait           Wait for crawl to complete
  --limit          Maximum pages to crawl
  --max-depth      Maximum crawl depth
  --exclude-paths  Comma-separated paths to exclude
  --include-paths  Comma-separated paths to include
  --sitemap        Sitemap handling: include, skip
  --output, -o     Output file path
  --pretty         Pretty print JSON
```

Examples:
```sh
# Crawl with limit
firecrawl crawl https://example.com --limit 10

# Wait for completion
firecrawl crawl https://example.com --wait --limit 100

# Check crawl status
firecrawl crawl JOB_ID --status
```

Discover URLs on a website.
```
firecrawl map <URL> [OPTIONS]

Options:
  --limit               Maximum URLs to discover
  --search              Search query to filter URLs
  --sitemap             Sitemap handling: only, include, skip
  --include-subdomains  Include subdomains in discovery
  --output, -o          Output file path
```

Examples:
```sh
firecrawl map https://example.com
firecrawl map https://example.com --limit 100 --search "blog"
```

Search the web with optional result scraping.
```
firecrawl search <QUERY> [OPTIONS]

Options:
  --limit           Maximum results (default: 5)
  --sources         Comma-separated: web, images, news
  --categories      Comma-separated: github, research, pdf
  --location        Geo-targeting location
  --country         ISO country code (default: US)
  --scrape          Enable scraping of results
  --scrape-formats  Formats for scraped content
```

Examples:
```sh
firecrawl search "python web scraping"
firecrawl search "firecrawl" --limit 10 --scrape
```

Scrape multiple URLs at once.
```
firecrawl batch <URL>... [OPTIONS]

Options:
  --format, -f  Output format (default: markdown)
```

Examples:
```sh
firecrawl batch https://example.com https://example2.com
firecrawl batch https://site1.com https://site2.com --format html
```

Manage authentication and configuration:

```sh
# Login and save credentials
firecrawl login

# Login with API key directly
firecrawl login --api-key fc-xxxxx

# View configuration
firecrawl config --view

# Logout (clear credentials)
firecrawl logout

# Check status
firecrawl status
```

Configuration is stored in platform-specific locations:
- Linux: `~/.config/firecrawl/config.json`
- macOS: `~/Library/Application Support/firecrawl/config.json`
- Windows: `%APPDATA%\firecrawl\config.json`
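These paths follow the usual per-platform conventions (XDG on Linux, Application Support on macOS, `%APPDATA%` on Windows). A stdlib sketch of the lookup (`config_path` is an illustrative helper, not the CLI's actual code):

```python
import os
import sys
from pathlib import Path

def config_path() -> Path:
    """Return the platform-specific config file location (sketch)."""
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:  # Linux and other POSIX: XDG base directory convention
        base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    return base / "firecrawl" / "config.json"

print(config_path())  # e.g. /home/user/.config/firecrawl/config.json
```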
| Variable | Description |
|---|---|
| `FIRECRAWL_API_KEY` | Your Firecrawl API key |
| `FIRECRAWL_API_URL` | Custom API URL (optional) |
| `HTTP_PROXY` | HTTP proxy URL |
| `HTTPS_PROXY` | HTTPS proxy URL |
| `NO_PROXY` | Comma-separated hosts to bypass proxy |
- `markdown` - Clean markdown text (default)
- `html` - Clean HTML
- `rawHtml` - Raw HTML without processing
- `links` - Extracted links from the page
- `images` - Extracted image URLs
- `screenshot` - Base64-encoded screenshot
- `summary` - Page summary
- `json` - Structured data extraction
- `branding` - Brand identity information
- `changeTracking` - Content change tracking
```sh
# Pretty output (default, human-readable)
firecrawl scrape https://example.com

# JSON output (for programmatic use)
firecrawl scrape https://example.com --json

# Pretty JSON output
firecrawl scrape https://example.com --json --pretty

# Save to file
firecrawl scrape https://example.com --output result.json
```

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | Authentication error |
| 4 | API error |
| 5 | Network error |
| 130 | Interrupted (Ctrl+C) |
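In shell scripts these codes can drive error handling. A small sketch (the `handle_exit` helper is illustrative, assuming the exit codes in the table above):

```sh
#!/bin/sh
# Map a firecrawl exit code to a follow-up action (sketch)
handle_exit() {
  case "$1" in
    0)   echo "ok" ;;
    3)   echo "auth error: run 'firecrawl login'" ;;
    5)   echo "network error: retrying" ;;
    130) echo "interrupted" ;;
    *)   echo "failed with code $1" ;;
  esac
}

handle_exit 0   # prints: ok
handle_exit 5   # prints: network error: retrying
```

A wrapper would call `firecrawl scrape ...`, then pass `$?` to a handler like this to decide whether to retry, re-authenticate, or abort.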
```sh
# Clone repository
git clone https://github.com/socamalo/firecrawl-cli-python.git
cd firecrawl-cli-python

# Install with uv
uv sync --dev

# Run in development mode
uv run firecrawl scrape https://example.com
```

```sh
# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=firecrawl_cli
```

```sh
# Format code
uv run ruff format .

# Check linting
uv run ruff check .

# Type checking
uv run mypy src/firecrawl_cli
```

MIT License - see LICENSE for details.
Contributions are welcome! Please read our Contributing Guide for details.
- Documentation: https://docs.firecrawl.dev/cli
- Issues: https://github.com/socamalo/firecrawl-cli-python/issues
- API Docs: https://docs.firecrawl.dev/api-reference