Firecrawl CLI (Python)


A Python-based command-line interface for Firecrawl with full enterprise proxy support.

Why This Project?

The official Firecrawl Node.js CLI has known issues with enterprise proxy configurations. This Python CLI is a drop-in replacement that properly respects HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables.

The Problem with Node CLI

Enterprise environments often require proxy configuration for external HTTP requests. While Node.js supports proxies, many CLI tools built on Node don't properly handle proxy environment variables, requiring additional tools like proxychains or custom configuration.

Our Solution

This Python CLI uses httpx with built-in proxy support that automatically detects and uses standard proxy environment variables:

export HTTPS_PROXY="http://proxy.company.com:8080"
firecrawl scrape https://example.com  # Works seamlessly behind proxy
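Because detection relies on the standard environment variables, it can be sanity-checked from Python's standard library, which follows the same convention httpx honours; a minimal sketch:

```python
import os
from urllib.request import getproxies

# Simulate an enterprise environment with the standard variable
os.environ["HTTPS_PROXY"] = "http://proxy.company.com:8080"

# getproxies() reads HTTP_PROXY/HTTPS_PROXY (and their lowercase
# variants) from the environment -- the same convention httpx uses.
proxies = getproxies()
print(proxies["https"])  # http://proxy.company.com:8080
```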

Feature Comparison

Feature               Node CLI     Python CLI
Web Scraping          Yes          Yes
Website Crawling      Yes          Yes
URL Mapping           Yes          Yes
Web Search            Yes          Yes
Batch Operations      Yes          Yes
HTTP_PROXY support    Unreliable   Yes
HTTPS_PROXY support   Unreliable   Yes
NO_PROXY support      Unreliable   Yes
Proxy Authentication  Unreliable   Yes

Features

  • Enterprise Proxy Support: Respects HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables
  • Multiple Output Formats: Markdown, HTML, JSON, screenshots, links, and more
  • Web Scraping: Extract clean data from any URL
  • Website Crawling: Recursively crawl and discover pages
  • URL Mapping: Fast website URL discovery
  • Web Search: Search and optionally scrape results
  • Batch Operations: Process multiple URLs at once
  • uvx Compatible: Run without installation using uvx

Installation

Method 1: Using uvx (Recommended for agents)

# Run without installing
uvx firecrawl-cli scrape https://example.com

# With specific version
uvx firecrawl-cli@1.0.0 scrape https://example.com

Method 2: Using uv

uv tool install firecrawl-cli

Method 3: Using pip

pip install firecrawl-cli

Quick Start

1. Set up authentication

# Interactive login
firecrawl login

# Or set environment variable
export FIRECRAWL_API_KEY="fc-xxxxx"

2. Scrape a URL

# Quick scrape (default: markdown)
firecrawl scrape https://example.com

# Multiple formats
firecrawl scrape https://example.com --format markdown,html,links

# Save to file
firecrawl scrape https://example.com --output result.md

Enterprise Proxy Configuration

This CLI properly supports enterprise proxy settings through environment variables:

Environment Variables

# Set proxy for all HTTP requests
export HTTP_PROXY="http://proxy.company.com:8080"
export HTTPS_PROXY="https://proxy.company.com:8080"

# With authentication
export HTTPS_PROXY="http://user:pass@proxy.company.com:8080"

# Bypass proxy for certain hosts
export NO_PROXY="localhost,127.0.0.1,internal.company.com"
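NO_PROXY host matching can be checked against the standard library's `proxy_bypass_environment`, which applies the usual exact-host and suffix-matching rules (it is an internal helper, so treat this as illustrative rather than a stable API):

```python
import os
from urllib.request import proxy_bypass_environment

os.environ["NO_PROXY"] = "localhost,127.0.0.1,internal.company.com"

# Hosts listed in NO_PROXY (or ending with a listed suffix) bypass the proxy
print(bool(proxy_bypass_environment("internal.company.com")))  # True
print(bool(proxy_bypass_environment("example.com")))           # False
```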

Priority Order

Proxy configuration is resolved in the following priority order:

  1. Command-line --proxy option
  2. Environment variables (HTTPS_PROXY, HTTP_PROXY)
  3. Configuration file settings
  4. System defaults
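The chain above can be sketched as a small resolver; `cli_proxy` and `config_proxy` are hypothetical names used for illustration, not the CLI's actual internals:

```python
import os
from typing import Optional

def resolve_proxy(cli_proxy: Optional[str] = None,
                  config_proxy: Optional[str] = None) -> Optional[str]:
    """Return the proxy URL to use, honouring the documented priority order."""
    # 1. Command-line --proxy option wins outright
    if cli_proxy:
        return cli_proxy
    # 2. Environment variables (HTTPS_PROXY preferred over HTTP_PROXY)
    for var in ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"):
        if os.environ.get(var):
            return os.environ[var]
    # 3. Configuration file settings
    if config_proxy:
        return config_proxy
    # 4. System default: no explicit proxy
    return None
```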

Verify Proxy Settings

firecrawl status

This will show whether a proxy is configured and its URL (with credentials masked).
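Credential masking of this kind can be sketched with the standard library's URL tools; `mask_proxy_url` is an illustrative helper, not the CLI's actual code:

```python
from urllib.parse import urlsplit, urlunsplit

def mask_proxy_url(url: str) -> str:
    """Replace any user:pass in a proxy URL with *** for safe display."""
    parts = urlsplit(url)
    if parts.username is None:
        return url  # no credentials to hide
    netloc = f"***:***@{parts.hostname or ''}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(mask_proxy_url("http://user:pass@proxy.company.com:8080"))
# http://***:***@proxy.company.com:8080
```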

Commands

scrape

Extract content from a URL in various formats.

firecrawl scrape <URL> [OPTIONS]

Options:
  --format, -f         Output format(s): markdown, html, rawHtml, links,
                       images, screenshot, summary, json, branding
  --only-main-content  Extract only main content (default: true)
  --wait-for          Wait time in milliseconds
  --screenshot        Take a screenshot
  --max-age           Maximum age of cached content in ms
  --output, -o        Output file path
  --json              Output as JSON
  --pretty            Pretty print JSON

Examples:

# Basic scrape
firecrawl scrape https://example.com

# HTML output
firecrawl scrape https://example.com --format html

# Multiple formats with JSON output
firecrawl scrape https://example.com --format markdown,links --json --pretty

# Screenshot
firecrawl scrape https://example.com --format screenshot --output screenshot.png

crawl

Recursively crawl a website.

firecrawl crawl <URL> [OPTIONS]

Options:
  --wait              Wait for crawl to complete
  --limit             Maximum pages to crawl
  --max-depth         Maximum crawl depth
  --exclude-paths     Comma-separated paths to exclude
  --include-paths     Comma-separated paths to include
  --sitemap           Sitemap handling: include, skip
  --output, -o        Output file path
  --pretty            Pretty print JSON

Examples:

# Crawl with limit
firecrawl crawl https://example.com --limit 10

# Wait for completion
firecrawl crawl https://example.com --wait --limit 100

# Check crawl status
firecrawl crawl JOB_ID --status

map

Discover URLs on a website.

firecrawl map <URL> [OPTIONS]

Options:
  --limit             Maximum URLs to discover
  --search            Search query to filter URLs
  --sitemap           Sitemap handling: only, include, skip
  --include-subdomains  Include subdomains in the results
  --output, -o        Output file path

Examples:

firecrawl map https://example.com
firecrawl map https://example.com --limit 100 --search "blog"

search

Search the web with optional result scraping.

firecrawl search <QUERY> [OPTIONS]

Options:
  --limit             Maximum results (default: 5)
  --sources           Comma-separated: web, images, news
  --categories        Comma-separated: github, research, pdf
  --location          Geo-targeting location
  --country           ISO country code (default: US)
  --scrape            Enable scraping of results
  --scrape-formats    Formats for scraped content

Examples:

firecrawl search "python web scraping"
firecrawl search "firecrawl" --limit 10 --scrape

batch

Scrape multiple URLs at once.

firecrawl batch <URL>... [OPTIONS]

Options:
  --format, -f        Output format (default: markdown)

Examples:

firecrawl batch https://example.com https://example2.com
firecrawl batch https://site1.com https://site2.com --format html

Authentication Commands

# Login and save credentials
firecrawl login

# Login with API key directly
firecrawl login --api-key fc-xxxxx

# View configuration
firecrawl config --view

# Logout (clear credentials)
firecrawl logout

# Check status
firecrawl status

Configuration

Configuration is stored in platform-specific locations:

  • Linux: ~/.config/firecrawl/config.json
  • macOS: ~/Library/Application Support/firecrawl/config.json
  • Windows: %APPDATA%\firecrawl\config.json
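Resolving that location can be sketched with the standard library (the real CLI may use a helper library for this; treat the sketch as an approximation):

```python
import os
import sys
from pathlib import Path

def config_path() -> Path:
    """Return the platform-specific config file location listed above."""
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        # Linux and other POSIX systems: ~/.config is the XDG default
        base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    return base / "firecrawl" / "config.json"
```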

Environment Variables

Variable           Description
FIRECRAWL_API_KEY  Your Firecrawl API key
FIRECRAWL_API_URL  Custom API URL (optional)
HTTP_PROXY         HTTP proxy URL
HTTPS_PROXY        HTTPS proxy URL
NO_PROXY           Comma-separated hosts to bypass proxy

Output Formats

Scrape Formats

  • markdown - Clean markdown text (default)
  • html - Clean HTML
  • rawHtml - Raw HTML without processing
  • links - Extracted links from the page
  • images - Extracted image URLs
  • screenshot - Base64-encoded screenshot
  • summary - Page summary
  • json - Structured data extraction
  • branding - Brand identity information
  • changeTracking - Content change tracking

Output Modes

# Pretty output (default, human-readable)
firecrawl scrape https://example.com

# JSON output (for programmatic use)
firecrawl scrape https://example.com --json

# Pretty JSON output
firecrawl scrape https://example.com --json --pretty

# Save to file
firecrawl scrape https://example.com --output result.json
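The --json mode is intended for piping into other tools; a minimal consumer, using a made-up sample payload (the real field names may differ from this guess, so check the output of your installed version):

```python
import json

# Hypothetical sample of `firecrawl scrape <URL> --json` output;
# the actual schema may differ.
sample = '{"success": true, "data": {"markdown": "# Example Domain"}}'

doc = json.loads(sample)
if doc.get("success"):
    print(doc["data"]["markdown"])  # # Example Domain
```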

Exit Codes

Code  Meaning
0     Success
1     General error
2     Invalid arguments
3     Authentication error
4     API error
5     Network error
130   Interrupted (Ctrl+C)
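A wrapper script can branch on these codes; a sketch using subprocess (it assumes `firecrawl` is on PATH when `run_scrape` is actually called):

```python
import subprocess
import sys

# Meanings taken from the exit-code table above
EXIT_MEANINGS = {
    0: "success",
    1: "general error",
    2: "invalid arguments",
    3: "authentication error",
    4: "API error",
    5: "network error",
    130: "interrupted (Ctrl+C)",
}

def run_scrape(url: str) -> int:
    """Run a scrape and report the documented meaning of its exit code."""
    proc = subprocess.run(["firecrawl", "scrape", url])
    meaning = EXIT_MEANINGS.get(proc.returncode, "unknown")
    print(f"exit {proc.returncode}: {meaning}", file=sys.stderr)
    return proc.returncode
```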

Development

Setup

# Clone repository
git clone https://github.com/socamalo/firecrawl-cli-python.git
cd firecrawl-cli-python

# Install with uv
uv sync --dev

# Run in development mode
uv run firecrawl scrape https://example.com

Testing

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=firecrawl_cli

Linting

# Format code
uv run ruff format .

# Check linting
uv run ruff check .

# Type checking
uv run mypy src/firecrawl_cli

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please read our Contributing Guide for details.
