A Retrieval-Augmented Generation (RAG) system that combines document retrieval with language model generation using LMStudio models and LangGraph orchestration.
- Document Processing: Support for PDF, TXT, Markdown, Word (.docx), and CSV files
- Vector Storage: Qdrant integration for efficient similarity search
- Local LLM: LMStudio integration for privacy-preserving generation
- Workflow Orchestration: LangGraph-based pipeline management
- Configurable: Flexible configuration system with environment variable support
- Modular Architecture: Clean separation of concerns for maintainability
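As a sketch of how the document-processing layer might route the supported formats, here is a hypothetical extension-to-loader dispatch; the names (`SUPPORTED`, `detect_format`) are illustrative, not the package's real registry:

```python
from pathlib import Path

# Hypothetical mapping from file extension to loader name; the real
# registry in document_processing/ may differ.
SUPPORTED = {
    ".pdf": "pdf",
    ".txt": "text",
    ".md": "markdown",
    ".docx": "word",
    ".csv": "csv",
}

def detect_format(path: str) -> str:
    """Return the loader name for a file, or raise for unsupported types."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return SUPPORTED[suffix]
```

Dispatching on the lowercased suffix keeps the check case-insensitive, so `report.PDF` and `report.pdf` resolve to the same loader.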
- Python 3.8 or higher
- Docker (for Qdrant vector database)
- LMStudio (for local LLM inference)
Run Qdrant using Docker with persistent storage:

```bash
# Create a directory for Qdrant data persistence
mkdir qdrant_storage

# Run Qdrant with Docker (with data persistence)
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v ./qdrant_storage:/qdrant/storage:z \
  qdrant/qdrant
```

Verify Qdrant is running:

```bash
curl http://localhost:6333/
# Should return: {"title":"qdrant - vector search engine","version":"1.x.x"}
```

Download and Install LMStudio:
- Visit the LMStudio website
- Download LMStudio for your operating system (Windows/macOS/Linux)
- Run the installer and follow the setup wizard
Download the Required Models:
This project uses Google's Gemma 3 4B Instruct model. Download it using the CLI:
```bash
# Download the specific models used in this project
lms get unsloth/gemma-3-4b-it-GGUF
lms get second-state/All-MiniLM-L6-v2-Embedding-GGUF

# Verify the models were downloaded
lms ls
```

Start the LMStudio Server via CLI:
```bash
# Load the model into memory
lms load unsloth/gemma-3-4b-it-GGUF

# Start the server on the default port (1234)
lms server start

# Or start with a specific configuration
lms server start --port 1234 --cors

# Check server status
lms server status

# View loaded models
lms ps

# Stop the server
lms server stop
```

- Clone the repository:

```bash
git clone <repository-url>
cd <repository>
```

- Install the RAG CLI system-wide using pipx (recommended):
```bash
# Install pipx if not already installed
# On macOS:
brew install pipx
# On other systems, see: https://pipx.pypa.io/stable/installation/

# Install the RAG CLI
pipx install .

# Ensure pipx is in your PATH
pipx ensurepath

# Restart your terminal or source your shell config
source ~/.zshrc  # or ~/.bashrc
```

Alternative: Virtual Environment Installation
If you prefer using a virtual environment:
```bash
# Create a virtual environment
python -m venv rag-env

# Activate it
# On macOS/Linux:
source rag-env/bin/activate
# On Windows:
# rag-env\Scripts\activate

# Install dependencies and package
pip install -r requirements.txt
pip install -e .
```

- The system uses a unified configuration file with performance optimizations built in:

```
# The main configuration file is already optimized for performance
config/rag_config.yaml
```

- Edit the configuration file to match your setup:
```yaml
# config/rag_config.yaml
qdrant_host: "localhost"
qdrant_port: 6333
collection_name: "documents"

lmstudio_endpoint: "http://localhost:1234"
model_name: "unsloth/gemma-3-4b-it-GGUF"  # The specific model used in this project
embedding_model: "sentence-transformers/all-MiniLM-L6-v2"

# Performance optimizations are already configured
embedding_batch_size: 64       # Optimized for better GPU utilization
retrieval_enable_cache: true   # Enable query caching
qdrant_prefer_grpc: true       # Use gRPC for better performance
```

If Qdrant won't start:
```bash
# Check if the port is in use
# On macOS/Linux:
netstat -an | grep :6333
# On Windows:
netstat -an | findstr :6333

# Stop the existing container
docker stop qdrant
docker rm qdrant

# Restart with a fresh container
docker run -d --name qdrant -p 6333:6333 -p 6334:6334 -v ./qdrant_storage:/qdrant/storage:z qdrant/qdrant
```

If LMStudio connection fails:
- Ensure the LMStudio server is running:

```bash
lms server status
lms ps  # Check if the model is loaded
```

- Check that the correct model is loaded:

```bash
lms ls  # List available models
lms load unsloth/gemma-3-4b-it-GGUF  # Load the required model
```

- Restart the server if needed:

```bash
lms server stop
lms server start --port 1234 --cors
```

- Verify the model endpoint:

```bash
curl http://localhost:1234/v1/models
```
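LMStudio serves an OpenAI-compatible API, so the `/v1/models` body follows the OpenAI models-list shape. A small sketch for checking which models the endpoint reports (the sample payload below is assumed, not captured from a live server):

```python
import json

# Assumed /v1/models response shape, following the OpenAI models-list format
sample_body = '{"object": "list", "data": [{"id": "unsloth/gemma-3-4b-it-GGUF", "object": "model"}]}'

def loaded_model_ids(body: str) -> list:
    """Extract model ids from a /v1/models JSON body."""
    return [entry["id"] for entry in json.loads(body).get("data", [])]
```

You could feed this the body returned by the `curl` command above to confirm the required model appears in the list.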
```bash
# Interactive mode
rag-cli

# Single query
rag-cli query "What is machine learning?"

# Batch processing
rag-cli batch queries.txt --output results.json

# Custom configuration
rag-cli --config production_config.yaml query "Your question"
```

Managing the Qdrant container:

```bash
# Start Qdrant
docker start qdrant

# Stop Qdrant
docker stop qdrant

# View Qdrant logs
docker logs qdrant

# Remove the Qdrant container (keeps data in qdrant_storage/)
docker rm qdrant

# Back up Qdrant data
# On macOS/Linux:
cp -r qdrant_storage qdrant_backup
# On Windows:
xcopy qdrant_storage qdrant_backup /E /I

# Restore Qdrant data
# On macOS/Linux:
cp -r qdrant_backup/* qdrant_storage/
# On Windows:
xcopy qdrant_backup qdrant_storage /E /I /Y
```

```
rag_system/
├── __init__.py              # Package initialization
├── config.py                # Configuration management
├── models.py                # Data models
├── logging_config.py        # Logging setup
├── document_processing/     # Document ingestion and processing
├── storage/                 # Vector database integration
├── retrieval/               # Query processing and retrieval
├── generation/              # LMStudio integration and generation
├── orchestration/           # LangGraph workflow management
└── interfaces/              # CLI and API interfaces
```
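The layers in the tree above compose in a straightforward way: storage answers similarity searches, retrieval queries storage, generation turns query plus context into an answer, and orchestration wires the steps together. A minimal sketch of that composition, with stand-in classes whose names and methods are illustrative rather than the package's actual API:

```python
class InMemoryStore:
    """Stand-in for the Qdrant-backed storage/ layer."""
    def __init__(self, docs):
        self.docs = docs

    def search(self, query, k):
        # Naive keyword match instead of vector similarity
        hits = [d for d in self.docs if any(w in d for w in query.lower().split())]
        return hits[:k]

class Retriever:
    """Stand-in for retrieval/: query processing over the store."""
    def __init__(self, store):
        self.store = store

    def retrieve(self, query, k=3):
        return self.store.search(query, k)

class Generator:
    """Stand-in for generation/: the LMStudio-backed answer step."""
    def generate(self, query, context):
        return f"answer({query}) from {len(context)} chunk(s)"

class RAGPipeline:
    """Stand-in for orchestration/: wires retrieval into generation."""
    def __init__(self, retriever, generator):
        self.retriever = retriever
        self.generator = generator

    def answer(self, query):
        return self.generator.generate(query, self.retriever.retrieve(query))

store = InMemoryStore(["machine learning basics", "cooking tips"])
pipeline = RAGPipeline(Retriever(store), Generator())
```

Keeping each layer behind a small interface like this is what lets the real system swap Qdrant, LMStudio, or the LangGraph workflow independently.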
The system uses a unified configuration file that includes both basic settings and performance optimizations:
Key Configuration Sections:
- Qdrant Settings: Vector database connection and performance settings
- LMStudio Settings: Local LLM configuration with streaming and connection pooling
- Embedding Settings: Model configuration with GPU optimization and batch processing
- Retrieval Settings: Search parameters with caching and concurrency limits
- Document Processing: File handling with streaming and concurrent processing
- Performance Settings: Memory management, async configuration, and optimization flags
The system supports configuration through:
- YAML file: `config/rag_config.yaml` (recommended)
- Environment variables: prefix keys with `RAG_` (e.g., `RAG_QDRANT_HOST`)
- Programmatic configuration: use the `RAGConfig` class directly
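The environment-variable convention can be sketched as follows; this is an assumed illustration of the `RAG_` prefix behavior, and the real `RAGConfig` class may implement it differently:

```python
import os

# Illustrative defaults; the real values live in config/rag_config.yaml
defaults = {"qdrant_host": "localhost", "qdrant_port": 6333}

def apply_env_overrides(config):
    """Override any config key with RAG_<UPPERCASED_KEY> from the environment."""
    merged = dict(config)
    for key, default in config.items():
        raw = os.environ.get("RAG_" + key.upper())
        if raw is not None:
            merged[key] = type(default)(raw)  # cast to the default's type
    return merged

os.environ["RAG_QDRANT_HOST"] = "qdrant.internal"
cfg = apply_env_overrides(defaults)
```

Casting to the default's type means `RAG_QDRANT_PORT=6400` arrives as the integer `6400`, not the string `"6400"`.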
The unified configuration includes built-in performance optimizations:
- Embedding batch size: Increased to 64 for better GPU utilization
- Query caching: LRU cache with 128 query capacity
- gRPC connections: Enabled for Qdrant for better performance
- Concurrent processing: Optimized limits for file processing and searches
- Memory management: Garbage collection hints and monitoring thresholds
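The 128-entry query cache can be pictured with Python's standard `functools.lru_cache`; this is a minimal sketch, and the real retrieval cache presumably keys on more than the raw query string:

```python
from functools import lru_cache

@lru_cache(maxsize=128)  # matches the documented 128-query capacity
def cached_retrieve(query: str):
    """Stand-in for the vector search; only runs on a cache miss."""
    return f"results for {query}"

cached_retrieve("what is ml?")  # miss: runs the search
cached_retrieve("what is ml?")  # hit: served from the cache
info = cached_retrieve.cache_info()
```

Repeated queries skip the embedding and search steps entirely, which is where the documented speedup comes from.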
The RAG CLI can be installed in several ways:

```bash
# Install pipx if not already installed
# On macOS:
brew install pipx
# On other systems: https://pipx.pypa.io/stable/installation/

# Install the RAG CLI
pipx install .

# Ensure pipx is in PATH
pipx ensurepath

# Restart terminal or source shell config
source ~/.zshrc  # or ~/.bashrc
```

Or with a virtual environment:

```bash
python -m venv rag-env
source rag-env/bin/activate  # On Windows: rag-env\Scripts\activate
pip install -e .
```

Or as a user-level install:

```bash
pip install --user .
```

Note: On modern Python installations (following PEP 668), system-wide pip installation may be restricted. Use pipx or a virtual environment instead.
```bash
# Install development dependencies
pip install -r requirements-dev.txt
```

Before committing code, run quality checks:

```bash
# Run all quality checks (formatting, linting, type checking, tests)
python pre-commit-check.py

# Auto-fix formatting and import issues
python pre-commit-check.py --fix

# Quick check without tests
python pre-commit-check.py --skip-tests

# Show detailed output
python pre-commit-check.py --verbose
```

See docs/QUALITY_CHECKS.md for detailed information.
The system includes comprehensive benchmarking tools to measure and validate performance improvements:
- Quick Benchmark (`quick_benchmark.py`)
  - Fast performance test (under 2 minutes)
  - Tests core functionality and optimizations
  - Provides an immediate performance assessment

  ```bash
  python quick_benchmark.py
  ```

- Comprehensive Benchmark (`benchmark_rag_performance.py`)
  - Complete performance analysis (5-10 minutes)
  - Tests all system components
  - Detailed metrics and system monitoring
  - Memory usage analysis

  ```bash
  python benchmark_rag_performance.py
  ```
The benchmarks test:
- Embedding Performance: Single vs batch processing, throughput metrics
- Retrieval Performance: Query response times, result quality
- Cache Effectiveness: Hit/miss ratios, speedup factors
- Batch Operations: Concurrent vs sequential processing
- Memory Usage: Peak usage, cleanup efficiency
- End-to-End Performance: Complete RAG pipeline timing
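The single-vs-batch embedding comparison rests on a simple effect: every embedding call pays a fixed overhead, which batching amortizes. A toy model of that measurement (the `embed` function and its timings are illustrative only, not the real embedding backend):

```python
import time

def embed(texts):
    """Stand-in embedding call: fixed per-call overhead plus per-item cost."""
    time.sleep(0.002 + 0.0005 * len(texts))
    return [[0.0] * 8 for _ in texts]

def throughput(texts, batch_size):
    """Documents embedded per second at a given batch size."""
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        embed(texts[i:i + batch_size])
    return len(texts) / (time.perf_counter() - start)

docs = [f"doc {i}" for i in range(64)]
# A batch size of 64 pays the fixed overhead once instead of 64 times,
# which is the rationale for the embedding_batch_size: 64 setting.
```

The real benchmarks measure the same ratio against the actual embedding model, where the per-call overhead (tokenization setup, GPU kernel launches) is far larger than in this toy.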
```bash
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=rag_system --cov-report=html

# Run a specific test file
pytest tests/test_config.py -v

# Run a specific test
pytest tests/test_config.py::TestRAGConfig::test_default_config -v
```

```bash
# Format code
ruff format rag_system/ tests/

# Lint code
ruff check rag_system/ tests/

# Auto-fix linting issues
ruff check --fix rag_system/ tests/

# Type checking
pyright rag_system/
```