Document RAG Application

A full-stack Retrieval-Augmented Generation (RAG) application that allows users to upload documents and query them using natural language. The system processes documents, creates embeddings, and uses AI to provide intelligent answers based on the document content.

🚀 Features

📄 Document Processing

Multi-format Support: Upload PDF, TXT, and DOCX files
Intelligent Chunking: Automatic text segmentation for optimal retrieval
Metadata Extraction: Document information and chunk metadata storage

🤖 AI-Powered Querying

Natural Language Queries: Ask questions in plain English
Context-Aware Responses: AI provides answers based on document content
Source Attribution: Responses include relevant document context

🔧 Performance & Reliability

Rate Limiting: Configurable API rate limiting with Redis backing
Redis Caching: Multi-layer caching for embeddings and query results
Async Processing: High-performance async operations
Retry Logic: Automatic retry with exponential backoff
Health Monitoring: Comprehensive health checks and performance metrics
CORS Support: Cross-origin resource sharing configuration

🏗️ Production Ready

Docker Support: Complete containerization with Docker Compose
Environment Configuration: Flexible configuration management
Database Migrations: Automatic table creation and schema management
Security Middleware: Trusted host validation and security headers

🏛️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│                 │    │                 │    │                 │
│  React Frontend │◄──►│ FastAPI Backend │◄──►│ PostgreSQL+     │
│                 │    │                 │    │ pgvector        │
│                 │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │                  ┌────▼────┐                  │
    ┌────▼────┐             │ Redis   │             ┌────▼────┐
    │ Vite    │             │ Cache   │             │ Vector  │
    │ Dev     │             │ Layer   │             │ Search  │
    │ Server  │             └─────────┘             │ Engine  │
    └─────────┘                                     └─────────┘

Components

Frontend: React 19+ with Vite for fast development
Backend: FastAPI with async support and comprehensive middleware
Database: PostgreSQL 16+ with pgvector extension for vector storage
Cache Layer: Redis 7+ with three-layer caching strategy (query results, embeddings, similarity search)
Vector Search: pgvector for efficient similarity search and semantic retrieval
Rate Limiting: Redis-backed distributed rate limiting for API protection └─────────┘ └─────────┘ └─────────┘


### Tech Stack

**Frontend:**
- React 19.1+ with hooks
- Vite for build tooling
- TailwindCSS for styling
- Jest for testing

**Backend:**
- FastAPI with async/await
- SQLAlchemy with async support
- Pydantic for data validation
- LangChain for RAG implementation
- OpenAI/HuggingFace for embeddings

**Database:**
- PostgreSQL 16+ with pgvector extension
- Vector similarity search
- Document metadata storage

**Infrastructure:**
- Docker & Docker Compose
- Redis for rate limiting (optional)
- Nginx for production deployment

## 🚦 Quick Start

### Prerequisites
- Python 3.11+
- Node.js 18+
- Docker & Docker Compose
- Git

### 1. Clone the Repository
```bash
git clone https://github.com/sudarshantanwer/document-rag-app.git
cd document-rag-app

2. Environment Setup

# Copy environment template
cp example.env .env

# Edit .env with your configuration
# Required: OPENAI_API_KEY
# Optional: REDIS_URL for enhanced rate limiting

3. Start with Docker Compose (Recommended)

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

The application will be available at:

Frontend: http://localhost:3000
Backend API: http://localhost:8000
API Documentation: http://localhost:8000/docs
Redis: localhost:6379 (for monitoring)

Docker Services Overview:

PostgreSQL: Database with pgvector extension on port 5432
Redis: Cache layer with 512MB memory limit and LRU eviction on port 6379
Backend: FastAPI application with Redis caching enabled on port 8000
Frontend: React development server on port 3000

Redis Configuration in Docker:

# From docker-compose.yml
redis:
  image: redis:7-alpine
  command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru
  # Optimized for performance with persistence and memory management

4. Manual Setup (Development)

Backend Setup

cd backend-python

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# For production features (Redis, enhanced rate limiting)
pip install -r requirements-prod.txt

# Start the server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend Setup

cd frontend-react

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

Database Setup

# Start PostgreSQL with pgvector
docker run -d \
  --name postgres-rag \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=postgres \
  -p 5432:5432 \
  pgvector/pgvector:pg16

📖 API Documentation

Core Endpoints

Health Check

GET /health

Returns service health status and performance metrics.

Document Ingestion

POST /ingest
Content-Type: multipart/form-data

file: <document.pdf|txt|docx>

Query Documents

POST /query
Content-Type: application/json

{
  "question": "What is this document about?",
  "doc_id": "optional-document-id"
}

List Documents

GET /documents

Select Documents for RAG

POST /select-docs
Content-Type: application/json

{
  "doc_ids": ["doc-id-1", "doc-id-2"]
}

Response Examples

Successful Query Response:

{
  "answer": "This document is about...",
  "context": "Relevant document excerpts..."
}

Error Response:

{
  "detail": "Error message",
  "status_code": 400
}

🧪 Testing

Backend Tests

cd backend-python

# Run all tests
pytest

# Run with coverage
pytest --cov=app

# Run specific test file
pytest tests/test_health_endpoint.py -v

Frontend Tests

cd frontend-react

# Run tests
npm test

# Run with coverage
npm run test:coverage

# Watch mode
npm run test:watch

Current Test Status

✅ Health endpoint: 100% passing
✅ Performance middleware: 100% passing
✅ Document selection: 100% passing
✅ Async optimization utils: 100% passing
⚠️ Query/Ingest services: Require database setup

🔧 Configuration

Environment Variables

Variable	Description	Default	Required
`DATABASE_URL`	PostgreSQL connection string	-	Yes
`OPENAI_API_KEY`	OpenAI API key for embeddings	-	Yes
`REDIS_URL`	Redis connection for caching & rate limiting	`redis://localhost:6379`	No
`REDIS_TTL`	Cache TTL in seconds	`3600`	No
`ENABLE_RATE_LIMITING`	Enable/disable rate limiting	`true`	No
`ENABLE_CACHING`	Enable/disable Redis caching	`true`	No
`MAX_CONCURRENT_REQUESTS`	Max concurrent requests	`20`	No
`QUERY_RATE_LIMIT`	Queries per minute	`10`	No
`INGEST_RATE_LIMIT`	Ingests per 5 minutes	`5`	No

🚀 Redis Caching

The application implements a sophisticated three-layer Redis caching strategy for dramatic performance improvements:

Performance Improvements 🎯

Query Response Time: 99.86% improvement (14.8s → 0.02s for cached queries)
Resource Efficiency: 80-90% reduction in compute usage
Scalability: Support for 10x more concurrent users

Cache Layers 📊

Layer 1: Complete Query Results Cache

What: Full RAG pipeline responses (question + answer + context)
Hit Rate: 60-80% for exact question matches
Performance: 99.9% faster (10-50ms vs 8-15 seconds)

Layer 2: Embedding Cache

What: Vector embeddings for text chunks and queries
Hit Rate: 80-90% for repeated text chunks
Performance: 99.8% faster (5-10ms vs 3-5 seconds)

Layer 3: Similarity Search Cache

What: PostgreSQL vector search results
Hit Rate: 70-85% for similar questions
Performance: 98% faster (5-10ms vs 200-500ms)

Cache Management ⚙️

Memory Usage: 1.1MB used / 512MB allocated (efficient utilization)
TTL Strategy: 1 hour for embeddings, 30 minutes for queries
Eviction Policy: allkeys-lru (automatically removes least recently used)
Persistence: Data survives container restarts
Graceful degradation when Redis unavailable

Redis Monitoring & Verification 🔍

Check Cache Performance:

# View cached keys
docker-compose exec redis redis-cli keys "*"

# Check cache hit/miss statistics
docker-compose exec redis redis-cli info stats

# Monitor memory usage
docker-compose exec redis redis-cli info memory

# View application cache logs
docker-compose logs backend | grep -E "(Cache HIT|Cache MISS|Cached)"

Performance Testing:

# Test query performance (first time - cache miss)
time curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is AI?", "k": 5}'

# Test same query again (cache hit)
time curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is AI?", "k": 5}'

Cache Key Strategy:

query:abc123def456 - Complete query results
embedding:xyz789abc123 - Text embeddings
similarity:def456ghi789 - Vector search results
rate_limit:client_ip - Rate limiting counters

Monitoring:

# Check cache statistics
curl http://localhost:8000/admin/cache/stats

# Clear cache (development only)
curl -X POST http://localhost:8000/admin/cache/clear

See CACHING.md for detailed implementation details.

Performance Tuning

Rate Limits (per IP):

Query: 10 requests/minute
Ingest: 5 requests/5 minutes
Health: 60 requests/minute

Timeouts:

Query: 25 seconds
Ingest: 60 seconds
Default: 30 seconds

Concurrency:

Max concurrent requests: 20
Query operations: 5 concurrent
Ingest operations: 3 concurrent

🚀 Deployment

🐳 Docker Compose Production (Recommended)

1. Production Environment Setup

# Clone the repository
git clone https://github.com/sudarshantanwer/document-rag-app.git
cd document-rag-app

# Create production environment file
cp example.env .env.prod

# Edit production environment
nano .env.prod

Required Environment Variables for Production:

# .env.prod
DATABASE_URL=postgresql+asyncpg://postgres:your_secure_password@db:5432/postgres
PGVECTOR_CONN=postgresql+psycopg2://postgres:your_secure_password@db:5432/postgres
REDIS_URL=redis://redis:6379/0
OPENAI_API_KEY=your_openai_api_key
HUGGINGFACEHUB_API_TOKEN=your_huggingface_token

# Production settings
ENVIRONMENT=production
ENABLE_CACHING=true
ENABLE_RATE_LIMITING=true
REDIS_TTL=3600
MAX_CONCURRENT_REQUESTS=50
QUERY_RATE_LIMIT=20
INGEST_RATE_LIMIT=10

2. Deploy with Docker Compose

# Start production services
docker-compose --env-file .env.prod up -d

# Check service health
docker-compose ps

# View logs
docker-compose logs -f backend

# Scale backend if needed
docker-compose up -d --scale backend=3

☁️ Cloud Platform Deployment

AWS ECS (Elastic Container Service)

1. Build and Push Images:

# Build production images
docker build -t your-registry/document-rag-backend:latest ./backend-python
docker build -t your-registry/document-rag-frontend:latest ./frontend-react

# Push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin your-account.dkr.ecr.us-east-1.amazonaws.com
docker tag your-registry/document-rag-backend:latest your-account.dkr.ecr.us-east-1.amazonaws.com/document-rag-backend:latest
docker push your-account.dkr.ecr.us-east-1.amazonaws.com/document-rag-backend:latest

2. ECS Task Definition:

{
  "family": "document-rag-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "backend",
      "image": "your-account.dkr.ecr.us-east-1.amazonaws.com/document-rag-backend:latest",
      "portMappings": [{"containerPort": 8000}],
      "environment": [
        {"name": "DATABASE_URL", "value": "postgresql://..."},
        {"name": "REDIS_URL", "value": "redis://..."},
        {"name": "OPENAI_API_KEY", "value": "..."}
      ]
    }
  ]
}

3. Required AWS Resources:

RDS PostgreSQL with pgvector extension
ElastiCache Redis for caching
Application Load Balancer for traffic distribution
ECS Service with auto-scaling
CloudWatch for monitoring

Google Cloud Platform (GCP)

1. Deploy to Cloud Run:

# Build and deploy backend
gcloud builds submit --tag gcr.io/your-project/document-rag-backend ./backend-python
gcloud run deploy document-rag-backend \
  --image gcr.io/your-project/document-rag-backend \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars DATABASE_URL="postgresql://...",REDIS_URL="redis://...",OPENAI_API_KEY="..."

# Deploy frontend
gcloud builds submit --tag gcr.io/your-project/document-rag-frontend ./frontend-react
gcloud run deploy document-rag-frontend \
  --image gcr.io/your-project/document-rag-frontend \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

2. Required GCP Resources:

Cloud SQL PostgreSQL with pgvector
Memorystore Redis for caching
Cloud Run for containerized apps
Cloud Load Balancing for traffic
Cloud Monitoring for observability

Azure Container Instances

# Create resource group
az group create --name document-rag-rg --location eastus

# Deploy container group
az container create \
  --resource-group document-rag-rg \
  --name document-rag-app \
  --image your-registry/document-rag-backend:latest \
  --ports 8000 \
  --environment-variables \
    DATABASE_URL="postgresql://..." \
    REDIS_URL="redis://..." \
    OPENAI_API_KEY="..." \
  --cpu 2 \
  --memory 4

🌐 VPS/Server Deployment

Ubuntu/Debian Server Setup

1. Server Preparation:

# Update system
sudo apt update && sudo apt upgrade -y

# Install Docker and Docker Compose
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER

# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Install Nginx
sudo apt install nginx -y

2. Application Deployment:

# Clone and setup
git clone https://github.com/sudarshantanwer/document-rag-app.git
cd document-rag-app

# Create production environment
cp example.env .env.prod
# Edit with your production values

# Start services
docker-compose --env-file .env.prod up -d

# Setup Nginx reverse proxy
sudo nano /etc/nginx/sites-available/document-rag

3. Nginx Configuration:

server {
    listen 80;
    server_name your-domain.com;

    # Frontend
    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Backend API
    location /api/ {
        proxy_pass http://localhost:8000/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # Increase timeouts for large file uploads
        proxy_read_timeout 300;
        proxy_connect_timeout 300;
        proxy_send_timeout 300;
    }

    # WebSocket support (if needed)
    location /ws {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

4. SSL Setup with Let's Encrypt:

# Install Certbot
sudo apt install certbot python3-certbot-nginx -y

# Get SSL certificate
sudo certbot --nginx -d your-domain.com

# Enable Nginx
sudo ln -s /etc/nginx/sites-available/document-rag /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx

🔒 Production Security

Environment Security:

# Use secrets management
# AWS: Systems Manager Parameter Store
# GCP: Secret Manager
# Azure: Key Vault

# Example with Docker secrets
echo "your_openai_api_key" | docker secret create openai_api_key -
echo "your_db_password" | docker secret create db_password -

Network Security:

# Setup firewall (UFW on Ubuntu)
sudo ufw allow 22    # SSH
sudo ufw allow 80    # HTTP
sudo ufw allow 443   # HTTPS
sudo ufw deny 8000   # Block direct backend access
sudo ufw deny 5432   # Block direct database access
sudo ufw deny 6379   # Block direct Redis access
sudo ufw enable

Database Security:

-- Create dedicated database user
CREATE USER rag_app WITH PASSWORD 'secure_password';
CREATE DATABASE document_rag OWNER rag_app;
GRANT CONNECT ON DATABASE document_rag TO rag_app;
GRANT USAGE, CREATE ON SCHEMA public TO rag_app;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO rag_app;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO rag_app;

📊 Production Monitoring

Health Checks:

# Application health
curl -f http://localhost:8000/health || exit 1

# Database connectivity
docker-compose exec backend python -c "from app.db.session import engine; print('DB OK')"

# Redis connectivity
docker-compose exec redis redis-cli ping

Monitoring Setup:

# docker-compose.monitoring.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana

volumes:
  grafana-storage:

🔄 CI/CD Pipeline

GitHub Actions Example:

# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Deploy to server
        uses: appleboy/ssh-action@v0.1.5
        with:
          host: ${{ secrets.HOST }}
          username: ${{ secrets.USERNAME }}
          key: ${{ secrets.SSH_KEY }}
          script: |
            cd /path/to/document-rag-app
            git pull origin main
            docker-compose --env-file .env.prod up -d --build
            docker system prune -f

📋 Production Checklist

Security:

Set strong database passwords
Configure proper CORS origins
Enable HTTPS/SSL certificates
Setup firewall rules
Use environment secrets management
Enable database connection encryption
Configure rate limiting
Setup API authentication (if needed)

Performance:

Setup Redis for caching and rate limiting
Configure database connection pooling
Enable gzip compression
Setup CDN for static assets
Configure auto-scaling
Optimize Docker images

Reliability:

Setup health checks
Configure log aggregation (ELK, Grafana)
Setup monitoring and alerting
Configure backup strategy
Test disaster recovery
Setup multi-region deployment (if needed)

Maintenance:

🔧 Production Environment Variables

# Essential Production Settings
DATABASE_URL=postgresql+asyncpg://user:password@host:5432/dbname
REDIS_URL=redis://redis:6379/0
OPENAI_API_KEY=sk-...
HUGGINGFACEHUB_API_TOKEN=hf_...

# Performance Settings
ENABLE_CACHING=true
ENABLE_RATE_LIMITING=true
MAX_CONCURRENT_REQUESTS=50
REDIS_TTL=3600

# Security Settings
CORS_ORIGINS=["https://yourdomain.com"]
TRUSTED_HOSTS=["yourdomain.com"]
ENABLE_HTTPS_REDIRECT=true

# Monitoring
LOG_LEVEL=INFO
ENABLE_METRICS=true
SENTRY_DSN=https://...  # For error tracking

🛠️ Development

Project Structure

document-rag-app/
├── backend-python/           # FastAPI backend
│   ├── app/
│   │   ├── main.py          # Application entry point
│   │   ├── routes/          # API endpoints
│   │   ├── services/        # Business logic
│   │   ├── db/              # Database models & sessions
│   │   ├── middleware/      # Performance & security middleware
│   │   └── utils/           # Utility functions
│   ├── tests/               # Test suite
│   ├── requirements.txt     # Basic dependencies
│   ├── requirements-prod.txt # Production dependencies
│   └── Dockerfile
├── frontend-react/          # React frontend
│   ├── src/
│   │   ├── App.jsx         # Main component
│   │   ├── api.js          # API client
│   │   └── assets/         # Static assets
│   ├── public/             # Public assets
│   └── package.json
├── docker-compose.yml      # Multi-service orchestration
└── README.md              # This file

Adding New Features

Backend Endpoint:

# app/routes/new_feature.py
from fastapi import APIRouter

router = APIRouter(prefix="/new-feature")

@router.post("")
async def new_endpoint():
    return {"message": "Hello World"}

Frontend Integration:

// src/api.js
export const newFeatureApi = async (data) => {
  const response = await fetch('/api/new-feature', {
    method: 'POST',
    body: JSON.stringify(data)
  });
  return response.json();
};

Performance Optimization

The application includes several performance optimizations:

Async Processing: All I/O operations use async/await
Connection Pooling: Database connections are pooled
Request Caching: Responses cached with configurable TTL
Rate Limiting: Prevents API abuse
GZIP Compression: Response compression for bandwidth optimization
Middleware Stack: Performance monitoring and timeout handling

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Style

Python: Follow PEP 8, use black for formatting
JavaScript: Use Prettier, follow Airbnb style guide
Commits: Use conventional commit format

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🐛 Troubleshooting

Common Issues

1. ModuleNotFoundError: No module named 'slowapi'

# Install production dependencies
pip install -r requirements-prod.txt
# Or run without enhanced rate limiting
pip install -r requirements.txt

2. Database Connection Failed

# Check PostgreSQL is running
docker ps | grep postgres

# Verify connection string
echo $DATABASE_URL

3. OpenAI API Errors

# Verify API key is set
echo $OPENAI_API_KEY

# Check API quota and billing

4. Rate Limiting Issues

# Disable rate limiting for development
export ENABLE_RATE_LIMITING=false

# Or increase limits in .env file

Performance Issues

Slow Queries:

Check database indexes
Monitor query performance in logs
Consider pagination for large datasets

High Memory Usage:

Adjust chunk size for document processing
Monitor embeddings cache size
Consider document size limits

Getting Help

📧 Issues: GitHub Issues
📖 Documentation: API Docs
🔧 Development: Check TESTING_SUMMARY.md for test results

🙏 Acknowledgments

FastAPI for the amazing web framework
LangChain for RAG implementation
pgvector for vector similarity search
React for the frontend framework
OpenAI for embeddings and language models

Made with ❤️ by Sudarshan Tanwer

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
backend-python		backend-python
frontend-react		frontend-react
.env.docker		.env.docker
.gitignore		.gitignore
CACHING.md		CACHING.md
DOCKER.md		DOCKER.md
DOCKER_FIX.md		DOCKER_FIX.md
IMPLEMENTATION_SUMMARY.md		IMPLEMENTATION_SUMMARY.md
README.md		README.md
TESTING_SUMMARY.md		TESTING_SUMMARY.md
docker-compose.yml		docker-compose.yml
example.env		example.env
init-db.sql		init-db.sql
package-lock.json		package-lock.json
package.json		package.json
performance.env		performance.env
test.txt		test.txt

Folders and files

Latest commit

History

Repository files navigation