SecretHound

A powerful CLI tool designed to find secrets in files, web pages, and other text sources.

Features

  • Multi-Source Scanning: Process remote URLs, local files, and entire directories.
  • Extensive Pattern Library: Over 500 regex patterns (currently 555 in the default catalog) to identify a wide range of secrets, including API keys (AWS, Google Cloud, Stripe, etc.), authentication tokens (JWT, OAuth, Bearer), database credentials, private keys, PII (email, phone), Web3 secrets (crypto addresses, private keys), and more.
  • URL/Domain Extraction Mode: Dedicated mode (--scan-urls) to efficiently extract only URL and domain patterns from sources.
  • Flexible Pattern Control: Fine-tune scans by including or excluding specific pattern categories (e.g., --include-categories aws,pii).
  • YAML-Based Patterns: Patterns are now managed in core/patterns/default_patterns.yaml for easier maintenance and extension.
  • Shannon Entropy Validation: Token-like patterns can enforce entropy thresholds to reduce false positives.
  • Hybrid Context Scoring: Detection confidence combines entropy, context signals, and reusable pattern rules instead of relying only on hard keyword filters.
  • Concurrent Processing: Fast multi-threaded architecture for efficient scanning.
  • Domain-Aware Scheduling: Smart distribution of requests to avoid rate limiting when scanning remote URLs.
  • 429-Aware Rate-Limit Hardening: Strict HTTP 429 handling with adaptive per-domain backoff, plus safe domain discard after persistent throttling.
  • HTTP Status Visibility: Final scan summary includes explicit status hit counts (e.g., 429=..., 403=..., 503=...).
  • Context Analysis: Reduces false positives by analyzing surrounding code and context.
  • Real-Time Progress: Live updates with progress bar and statistics (can be disabled with --no-progress or in --silent mode).
  • Multiple Output Formats: Output results in standard text, JSON, or raw values. Supports a new grouped format (--group-by-source) for TXT and JSON, organizing findings by their source URL/file.
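To illustrate the Shannon entropy validation mentioned above, here is a minimal sketch of how an entropy threshold can separate random-looking tokens from repetitive strings. It is not SecretHound's actual implementation: the function names (shannonEntropy, passesEntropy) and the 3.0 bits-per-byte threshold are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per byte.
// An empty string has zero entropy.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	// Count how often each byte value occurs.
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	entropy := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		entropy -= p * math.Log2(p)
	}
	return entropy
}

// passesEntropy reports whether candidate reaches a minimum entropy.
// The threshold is a hypothetical tuning value, not one taken from the tool.
func passesEntropy(candidate string, minBits float64) bool {
	return shannonEntropy(candidate) >= minBits
}

func main() {
	const threshold = 3.0 // assumed value for this sketch
	for _, c := range []string{"aaaaaaaaaaaaaaaa", "q7Xv2LpR9tKz4WbN"} {
		fmt.Printf("%-18s entropy=%.2f keep=%v\n", c, shannonEntropy(c), passesEntropy(c, threshold))
	}
}
```

A regex match on a repetitive placeholder scores near zero and would be dropped, while a string of sixteen distinct characters scores well above the threshold and would be kept.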

Installation

From Source

# Clone the repository
git clone https://github.com/rafabd1/SecretHound.git
cd SecretHound

# Install dependencies
go mod download

# Build the binary
go build -o secrethound ./cmd/secrethound

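# Optional: Move the binary to a directory on PATH (Linux/macOS)
sudo mv secrethound /usr/local/bin/

# Optional: Copy the binary to a directory on PATH (Windows - in PowerShell as Admin)
# Note: build with -o secrethound.exe so the file name matches
# Copy-Item .\secrethound.exe C:\Windows\System32\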

Using Go Install

go install github.com/rafabd1/SecretHound/cmd/secrethound@latest

Binary Releases

You can download pre-built binaries for your platform from the releases page.

Quick Start

Scan a single URL:

secrethound https://example.com/script.js

Scan multiple URLs:

secrethound https://example.com/script1.js https://example.com/script2.js

Scan from a list of URLs:

secrethound -i url-list.txt

Scan a local file:

secrethound -i /path/to/file.js

Scan an entire directory:

secrethound -i /path/to/directory

Save results to a file:

secrethound -i url-list.txt -o results.txt
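When a URL list spans several domains, the domain-aware scheduling listed under Features matters: spreading consecutive requests across hosts lowers the chance of hitting a rate limit on any one of them. The sketch below shows one simple way to do that, a round-robin interleave by host. It is an assumption about the general technique, not a description of SecretHound's scheduler, and interleaveByDomain is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// interleaveByDomain reorders urls so that consecutive entries target
// different hosts where possible. Entries that fail to parse or have
// no host are skipped.
func interleaveByDomain(urls []string) []string {
	var order []string // hosts in first-seen order
	queues := map[string][]string{}
	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil || u.Host == "" {
			continue
		}
		if _, seen := queues[u.Host]; !seen {
			order = append(order, u.Host)
		}
		queues[u.Host] = append(queues[u.Host], raw)
	}

	out := make([]string, 0, len(urls))
	for {
		progressed := false
		// Take at most one URL per host on each pass.
		for _, host := range order {
			q := queues[host]
			if len(q) == 0 {
				continue
			}
			out = append(out, q[0])
			queues[host] = q[1:]
			progressed = true
		}
		if !progressed {
			break
		}
	}
	return out
}

func main() {
	urls := []string{
		"https://example.com/a.js",
		"https://example.com/b.js",
		"https://example.com/c.js",
		"https://example.org/d.js",
	}
	for _, u := range interleaveByDomain(urls) {
		fmt.Println(u)
	}
}
```

A real scheduler would also track per-domain timing; this sketch only covers the ordering step.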

Command Line Options

SecretHound supports the following options:

| Flag | Description | Default |
|------|-------------|---------|
| -i, --input-file | Input file (URLs/paths), directory, or a single URL/file path as a target argument. | - |
| -o, --output | Output file for results (default: stdout). Format (txt, json) inferred from extension. | - |
| --raw | Output only raw secret values (affects TXT and grouped JSON file output). | false |
| --group-by-source | Group secrets by source URL/file in TXT and JSON output. | false |
| -t, --timeout | HTTP request timeout in seconds. | 10 |
| -r, --retries | Maximum number of retry attempts for HTTP requests. | 2 |
| -c, --concurrency | Number of concurrent workers. | 50 |
| -l, --rate-limit | Max requests per second per domain (0 for auto/unlimited). | 0 |
| -H, --header | Custom HTTP header to add (e.g., "Authorization: Bearer token"). Can be used multiple times. | - |
| --verify-tls | Enable SSL/TLS certificate verification for HTTPS requests. | false |
| --include-categories | Comma-separated list of pattern categories to include (e.g., aws,gcp). | all enabled |
| --exclude-categories | Comma-separated list of pattern categories to exclude (e.g., pii,url). | none |
| --scan-urls | URL Extraction Mode: Scan ONLY for URL/Endpoint patterns (overrides category filters). | false |
| --patterns-file | Path to a custom YAML patterns file to replace embedded defaults. | - |
| --max-file-size | Maximum file size to scan in MB (0 for no limit). | 0 |
| --list-patterns | List available pattern categories and patterns, then exit. | false |
| -v, --verbose | Enable verbose logging output. | false |
| -n, --no-progress | Disable the progress bar display. | false |
| -s, --silent | Silent mode (suppress progress bar and info logs). | false |

Documentation

For more detailed information, see the documentation directory.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Third-party adapted pattern content is documented in THIRD_PARTY_NOTICES.md, including attribution and license terms for imported/adapted regex pattern sources.
