Log Filter


High-performance log filtering tool with boolean expression support and multi-threaded processing.

✨ Features

  • πŸ” Boolean Expressions: Search with AND, OR, NOT operators for complex patterns
  • ⚑ Multi-threaded: Parallel processing delivers a 5-10x speedup (40,000+ lines/sec with 8 workers vs. 5,000+ single-threaded)
  • 🏷️ Log Level Normalization: Automatically matches abbreviated levels (Eβ†’ERROR, Wβ†’WARN, etc.)
  • πŸ“Š Statistics: Built-in metrics tracking and performance monitoring
  • πŸ—“οΈ Date/Time Filtering: Native support for date and time range filtering
  • πŸ”€ Smart Sorting: Chronological ordering with file pre-sorting and timestamp sorting
  • πŸ“¦ Chunked Output: Automatic file splitting to prevent large output files
  • πŸ”§ Flexible Configuration: YAML config files, environment variables, CLI arguments
  • 🐳 Docker Ready: Production-ready containers and Kubernetes manifests
  • πŸ›‘οΈ Type Safe: Full type hints for better IDE support
  • βœ… Production Tested: 706 tests with 89.73% coverage, zero critical vulnerabilities

πŸš€ Quick Start

Installation

From Source (Development/Local)

# Clone the repository
git clone https://github.com/RomaYushchenko/log-filter.git
cd log-filter

# Install in development mode
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"

From PyPI (When Published)

pip install log-filter

Check Version

# Display installed version
log-filter --version

Basic Usage

# Search for errors
log-filter "ERROR" /var/log

# Works with abbreviated levels (E, W, I, D, T, F)
# A search for "ERROR" matches logs that use either the full "ERROR" or abbreviated "E" level
log-filter "ERROR" /var/log/production

# Boolean expression
log-filter "ERROR AND database" /var/log

# Complex query
log-filter "(ERROR OR CRITICAL) AND NOT test" /var/log

# Save results
log-filter "ERROR" /var/log -o errors.txt --stats

# Date filtering
log-filter "ERROR" /var/log --after 2024-01-01

# Show statistics
log-filter "ERROR" /var/log --stats

# Disable level normalization (match exact text only)
log-filter "ERROR" /var/log --no-normalize-levels
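The boolean query syntax above (AND, OR, NOT, parentheses, with terms matched as substrings) can be illustrated with a minimal evaluator sketch. This is not log-filter's actual parser, just a compact recursive-descent illustration of the same semantics:

```python
import re


def matches(expr: str, line: str) -> bool:
    """Illustrative evaluator for AND/OR/NOT queries over one log line.

    Terms are matched as case-sensitive substrings of the line.
    (Sketch only; not log-filter's real implementation.)
    """
    # Tokenize into parens and bare words; AND/OR/NOT are keywords.
    tokens = re.findall(r"\(|\)|[^\s()]+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while peek() == "OR":
            pos += 1
            result = parse_and() or result  # right side always consumed
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_not()
        while peek() == "AND":
            pos += 1
            result = parse_not() and result
        return result

    def parse_not() -> bool:
        nonlocal pos
        if peek() == "NOT":
            pos += 1
            return not parse_not()
        if peek() == "(":
            pos += 1
            result = parse_or()
            pos += 1  # consume ')'
            return result
        term = tokens[pos]
        pos += 1
        return term in line  # plain substring match

    return parse_or()
```

For example, `matches("(ERROR OR CRITICAL) AND NOT test", "2024 ERROR db fail")` evaluates to `True`, while the same query over a line containing `test` evaluates to `False`.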

Example Output

Processing logs from /var/log...
βœ“ app.log (25 matches)
βœ“ system.log (13 matches)
βœ“ database.log (8 matches)

Statistics:
  Files Processed: 127
  Lines Processed: 1,234,567
  Matches Found: 5,432
  Processing Time: 45.67s
  Throughput: 27,024 lines/sec

Docker Quick Start

Using Docker

# Build image
docker build -t log-filter:latest .

# Run on local logs
docker run --rm \
  -v ${PWD}/test-logs:/logs:ro \
  -v ${PWD}/output:/output \
  log-filter:latest \
  ERROR /logs -o /output/errors.txt --stats

Using Docker Compose

# Run with local logs
docker-compose -f docker-compose.local.yml run --rm log-filter-local

# Development mode with live reload
docker-compose -f docker-compose.dev.yml run --rm log-filter-dev

See Docker Deployment Guide for detailed instructions.

πŸ“š Documentation

πŸ’‘ Use Cases

Error Monitoring

# Find all errors from today
log-filter "ERROR" /var/log --after today -o errors-today.txt

# Monitor specific application
log-filter "ERROR AND myapp" /var/log --stats

Database Analysis

# Extract database errors
log-filter "ERROR AND (database OR sql OR connection)" /var/log -o db-errors.txt

# Find slow queries
log-filter "slow query" /var/log/mysql --time-after 09:00 --time-before 17:00

Business Hours Filtering

# Only business hours (9 AM - 5 PM)
log-filter "ERROR" /var/log \
  --time-after 09:00 \
  --time-before 17:00 \
  -o business-hours-errors.txt
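Conceptually, `--time-after`/`--time-before` apply a time-of-day window to each line's timestamp. A minimal sketch of such a predicate, assuming lines carry an ISO-like `HH:MM:SS` timestamp (illustrative only; the tool's real parsing may differ):

```python
import re
from datetime import time

# Matches the HH:MM:SS portion of an ISO-like timestamp.
TS = re.compile(r"\b(\d{2}):(\d{2}):\d{2}\b")


def in_window(line: str, after: time, before: time) -> bool:
    """Return True if the line's timestamp falls within [after, before].

    Lines without a recognizable timestamp are excluded. (Sketch only.)
    """
    m = TS.search(line)
    if not m:
        return False
    t = time(int(m.group(1)), int(m.group(2)))
    return after <= t <= before
```

With a 09:00-17:00 window, `in_window("2024-01-08 10:30:00 ERROR boom", time(9), time(17))` is `True`, while an 18:30 line is filtered out.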

Multi-Directory Search

# Search multiple directories
log-filter "ERROR" /var/log/app /var/log/system /var/log/nginx

Log Level Normalization

Production logs often use abbreviated log levels (E, W, I, D) to save space. Log Filter automatically normalizes these abbreviations, allowing you to search using full level names:

# Search for "ERROR" matches both "ERROR" and "E" in logs
log-filter "ERROR" /var/log/production

# Supported abbreviations:
# E β†’ ERROR
# W β†’ WARN (also WARN, WARNING)
# I β†’ INFO  
# D β†’ DEBUG
# T β†’ TRACE
# F β†’ FATAL

# Example: Your production log format
# 2025-01-08 10:00:00.000+0000 E Database connection failed
# 2025-01-08 10:00:01.000+0000 W Connection pool exhausted

# Both will be matched by:
log-filter "ERROR OR WARN" /var/log

# Disable normalization if needed (exact match only)
log-filter "ERROR" /var/log --no-normalize-levels

# Configure in YAML
# processing:
#   normalize_log_levels: true  # default
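The abbreviation table above boils down to an alias mapping. A sketch of how such normalization could work (illustrative; not the tool's internals):

```python
# Full level names mapped to every spelling they should match.
LEVEL_ALIASES = {
    "ERROR": {"ERROR", "E"},
    "WARN": {"WARN", "WARNING", "W"},
    "INFO": {"INFO", "I"},
    "DEBUG": {"DEBUG", "D"},
    "TRACE": {"TRACE", "T"},
    "FATAL": {"FATAL", "F"},
}


def level_matches(query: str, level_field: str) -> bool:
    """True if the log line's level field satisfies the queried level.

    Non-level query terms fall back to an exact comparison. (Sketch only.)
    """
    return level_field in LEVEL_ALIASES.get(query, {query})
```

So a query for `ERROR` matches a line whose level field is `E`, and `WARN` matches both `W` and `WARNING`; with `--no-normalize-levels`, only the exact text would match.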

Sorted Chunked Output

Log Filter can automatically sort results by timestamp and split large output into manageable chunks:

# Automatic file splitting (default: 500 records per file)
log-filter "ERROR" /var/log -o results.log
# Creates: results-001.log, results-002.log, results-003.log, ...

# Custom chunk size (1000 records per file)
log-filter "ERROR" /var/log -o results.log --max-records-per-file 1000

# Custom filename pattern
log-filter "ERROR" /var/log -o results.log \
  --output-pattern "{base}_part{index:02d}{ext}"
# Creates: results_part01.log, results_part02.log, ...

# Disable chunking (single file)
log-filter "ERROR" /var/log -o results.log --max-records-per-file 0

# Sort results by timestamp (chronological order)
log-filter "ERROR" /var/log -o results.log
# Results sorted: oldest β†’ newest

# Disable timestamp sorting
log-filter "ERROR" /var/log -o results.log --no-sort-timestamps

# File pre-sorting (process files in date order)
# Automatically sorts input files like: app-15-01-2026-1.log, app-16-01-2026-1.log
log-filter "ERROR" /var/log -o results.log
# Files processed in chronological order

# Disable file sorting
log-filter "ERROR" /var/log -o results.log --no-sort-files

# Example: Large production log analysis
log-filter "(ERROR OR CRITICAL)" /var/log/production \
  -o critical-errors.log \
  --max-records-per-file 500 \
  --output-pattern "{base}-{index:03d}{ext}"
# Output: critical-errors-001.log (500 records)
#         critical-errors-002.log (500 records)
#         critical-errors-003.log (remaining records)
# All sorted chronologically
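The `{base}`, `{index}`, and `{ext}` placeholders in `--output-pattern` behave like Python format fields. A small sketch of how chunk filenames could be derived from the output path (illustrative; `chunk_name` is a hypothetical helper, not part of the tool's API):

```python
from pathlib import PurePath


def chunk_name(output_file: str, index: int,
               pattern: str = "{base}-{index:03d}{ext}") -> str:
    """Build the filename for one output chunk from the base output path.

    base = filename without extension, ext = extension including the dot.
    (Sketch only; not log-filter's internals.)
    """
    p = PurePath(output_file)
    return str(p.with_name(pattern.format(base=p.stem, index=index,
                                          ext=p.suffix)))
```

For example, `chunk_name("results.log", 1)` yields `results-001.log`, and the `{base}_part{index:02d}{ext}` pattern yields `results_part01.log`.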

Supported Filename Patterns for Pre-sorting

Log Filter recognizes these date patterns in filenames:

  • DD-MM-YYYY-N (e.g., app-15-01-2026-1.log)
  • YYYY-MM-DD-N (e.g., app-2026-01-15-1.log)
  • YYYYMMDD_N (e.g., app_20260115_1.log)
  • Index-only (e.g., app-001.log, app-002.log)

Files are processed in chronological order based on date + index.
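The date + index ordering can be pictured as a sort key extracted from each filename. A sketch covering only the DD-MM-YYYY-N pattern from the list above (illustrative; the tool recognizes more patterns):

```python
import re
from datetime import date


def sort_key(name: str):
    """Chronological sort key for DD-MM-YYYY-N filenames. (Sketch only.)"""
    m = re.search(r"(\d{2})-(\d{2})-(\d{4})-(\d+)", name)
    if m:
        d, mo, y, idx = map(int, m.groups())
        return (date(y, mo, d), idx)
    return (date.min, 0)  # undated files sort first


files = ["app-16-01-2026-1.log", "app-15-01-2026-2.log", "app-15-01-2026-1.log"]
sorted(files, key=sort_key)
# β†’ ['app-15-01-2026-1.log', 'app-15-01-2026-2.log', 'app-16-01-2026-1.log']
```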

πŸ”§ Advanced Configuration

Create config.yaml:

search:
  expression: "ERROR OR CRITICAL"
  ignore_case: false

files:
  path: "/var/log"
  include_patterns:
    - "*.log"
  exclude_patterns:
    - "*.gz"
  max_depth: 3
  max_file_size: 100      # Skip files > 100 MB
  max_record_size: 512    # Skip records > 512 KB

output:
  output_file: "/var/log-filter/errors.txt"
  overwrite: true
  no_path: false          # Include file paths
  highlight: false        # Highlight matches
  stats: true
  verbose: false
  quiet: false
  dry_run: false
  # Chunked output configuration
  max_records_per_file: 500                    # Records per file (0 = unlimited)
  output_file_pattern: "{base}-{index:03d}{ext}"  # Filename template
  sort_by_timestamp: true                      # Sort results chronologically

processing:
  max_workers: 8
  buffer_size: 32768
  encoding: "utf-8"
  normalize_log_levels: true  # Enable level normalization (default)
  sort_input_files: true      # Pre-sort input files by date/index
  debug: false

Run with config:

log-filter --config config.yaml

🐳 Docker Deployment

# Pull image
docker pull log-filter/log-filter:2.0.0

# Run
docker run --rm \
  -v /var/log:/logs:ro \
  -v $(pwd)/output:/output \
  log-filter/log-filter:2.0.0 \
  "ERROR" "/logs" "-o" "/output/errors.txt" "--stats"

Docker Compose

version: '3.8'
services:
  log-filter:
    image: log-filter:2.0.0
    volumes:
      - /var/log:/logs:ro
      - ./output:/output
    environment:
      - LOG_FILTER_WORKERS=8
    command: ["ERROR", "/logs", "-o", "/output/errors.txt", "--stats"]

☸️ Kubernetes Deployment

apiVersion: batch/v1
kind: CronJob
metadata:
  name: log-filter-hourly
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: log-filter
            image: log-filter:2.0.0
            args: ["--config", "/config/config.yaml"]
            volumeMounts:
              - name: logs
                mountPath: /logs
                readOnly: true
          restartPolicy: OnFailure

πŸ“Š Performance

Workload           Throughput         Workers   Time (1 GB)
Single-threaded    5,000 lines/sec    1         180s
Multi-threaded     40,000 lines/sec   8         25s
High-performance   80,000 lines/sec   16        12s

  • Scaling: Linear with CPU cores up to 16 workers
  • Memory: ~50-100 MB base + ~10 MB per worker
  • Tested: Up to 100 GB of logs with consistent performance

πŸ› οΈ Development

Setup

# Clone repository
git clone https://github.com/RomaYushchenko/log-filter
cd log-filter

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install with dev dependencies
pip install -e ".[dev]"

Testing

# Run tests
pytest

# With coverage
pytest --cov=log_filter --cov-report=html

# Run specific test
pytest tests/test_parser.py -v

Code Quality

# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Type checking
mypy src/

# Linting
pylint src/
flake8 src/

πŸ—οΈ Architecture

log-filter/
β”œβ”€β”€ src/log_filter/
β”‚   β”œβ”€β”€ core/           # Expression parsing & evaluation
β”‚   β”œβ”€β”€ domain/         # Business models & filters
β”‚   β”œβ”€β”€ config/         # Configuration management
β”‚   β”œβ”€β”€ infrastructure/ # File I/O & handlers
β”‚   β”œβ”€β”€ processing/     # Multi-threaded pipeline
β”‚   β”œβ”€β”€ statistics/     # Metrics & reporting
β”‚   └── utils/          # Logging, progress, highlighting
β”œβ”€β”€ tests/              # Comprehensive test suite
└── docs/               # Sphinx documentation

🀝 Contributing

Contributions are welcome! Please read our Contributing Guide for details.

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ”— Links

πŸ“ˆ Project Status

  • Version: 2.0.0
  • Status: Production Ready
  • Python: 3.10+ required
  • Tests: 706 tests, 89.73% coverage
  • Security: Zero critical vulnerabilities
  • Performance: 5,000+ lines/sec (single), 40,000+ (multi-threaded)

πŸ™ Acknowledgments

Developed by Roman Yushchenko with contributions from the community.

Special thanks to all contributors, testers, and users who provided feedback.

πŸ“ž Support


Made with ❀️ by Roman Yushchenko
