A full-stack Retrieval-Augmented Generation (RAG) application that allows users to upload documents and query them using natural language. The system processes documents, creates embeddings, and uses AI to provide intelligent answers based on the document content.
- Multi-format Support: Upload PDF, TXT, and DOCX files
- Intelligent Chunking: Automatic text segmentation for optimal retrieval
- Metadata Extraction: Document information and chunk metadata storage
- Natural Language Queries: Ask questions in plain English
- Context-Aware Responses: AI provides answers based on document content
- Source Attribution: Responses include relevant document context
- Rate Limiting: Configurable API rate limiting with Redis backing
- Redis Caching: Multi-layer caching for embeddings and query results
- Async Processing: High-performance async operations
- Retry Logic: Automatic retry with exponential backoff
- Health Monitoring: Comprehensive health checks and performance metrics
- CORS Support: Cross-origin resource sharing configuration
- Docker Support: Complete containerization with Docker Compose
- Environment Configuration: Flexible configuration management
- Database Migrations: Automatic table creation and schema management
- Security Middleware: Trusted host validation and security headers
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ React Frontend │◄──►│ FastAPI Backend │◄──►│ PostgreSQL+ │
│ │ │ │ │ pgvector │
│ │ │ │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ ┌────▼────┐ │
┌────▼────┐ │ Redis │ ┌────▼────┐
│ Vite │ │ Cache │ │ Vector │
│ Dev │ │ Layer │ │ Search │
│ Server │ └─────────┘ │ Engine │
└─────────┘ └─────────┘
- Frontend: React 19+ with Vite for fast development
- Backend: FastAPI with async support and comprehensive middleware
- Database: PostgreSQL 16+ with pgvector extension for vector storage
- Cache Layer: Redis 7+ with three-layer caching strategy (query results, embeddings, similarity search)
- Vector Search: pgvector for efficient similarity search and semantic retrieval
- Rate Limiting: Redis-backed distributed rate limiting for API protection └─────────┘ └─────────┘ └─────────┘
### Tech Stack
**Frontend:**
- React 19.1+ with hooks
- Vite for build tooling
- TailwindCSS for styling
- Jest for testing
**Backend:**
- FastAPI with async/await
- SQLAlchemy with async support
- Pydantic for data validation
- LangChain for RAG implementation
- OpenAI/HuggingFace for embeddings
**Database:**
- PostgreSQL 16+ with pgvector extension
- Vector similarity search
- Document metadata storage
**Infrastructure:**
- Docker & Docker Compose
- Redis for rate limiting (optional)
- Nginx for production deployment
## 🚦 Quick Start
### Prerequisites
- Python 3.11+
- Node.js 18+
- Docker & Docker Compose
- Git
### 1. Clone the Repository
```bash
git clone https://github.com/sudarshantanwer/document-rag-app.git
cd document-rag-app
# Copy environment template
cp example.env .env
# Edit .env with your configuration
# Required: OPENAI_API_KEY
# Optional: REDIS_URL for enhanced rate limiting# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose downThe application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Redis: localhost:6379 (for monitoring)
- PostgreSQL: Database with pgvector extension on port 5432
- Redis: Cache layer with 512MB memory limit and LRU eviction on port 6379
- Backend: FastAPI application with Redis caching enabled on port 8000
- Frontend: React development server on port 3000
Redis Configuration in Docker:
# From docker-compose.yml
redis:
image: redis:7-alpine
command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru
# Optimized for performance with persistence and memory managementcd backend-python
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# For production features (Redis, enhanced rate limiting)
pip install -r requirements-prod.txt
# Start the server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000cd frontend-react
# Install dependencies
npm install
# Start development server
npm run dev
# Build for production
npm run build# Start PostgreSQL with pgvector
docker run -d \
--name postgres-rag \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=postgres \
-p 5432:5432 \
pgvector/pgvector:pg16GET /healthReturns service health status and performance metrics.
POST /ingest
Content-Type: multipart/form-data
file: <document.pdf|txt|docx>POST /query
Content-Type: application/json
{
"question": "What is this document about?",
"doc_id": "optional-document-id"
}GET /documentsPOST /select-docs
Content-Type: application/json
{
"doc_ids": ["doc-id-1", "doc-id-2"]
}Successful Query Response:
{
"answer": "This document is about...",
"context": "Relevant document excerpts..."
}Error Response:
{
"detail": "Error message",
"status_code": 400
}cd backend-python
# Run all tests
pytest
# Run with coverage
pytest --cov=app
# Run specific test file
pytest tests/test_health_endpoint.py -vcd frontend-react
# Run tests
npm test
# Run with coverage
npm run test:coverage
# Watch mode
npm run test:watch- ✅ Health endpoint: 100% passing
- ✅ Performance middleware: 100% passing
- ✅ Document selection: 100% passing
- ✅ Async optimization utils: 100% passing
⚠️ Query/Ingest services: Require database setup
| Variable | Description | Default | Required |
|---|---|---|---|
DATABASE_URL |
PostgreSQL connection string | - | Yes |
OPENAI_API_KEY |
OpenAI API key for embeddings | - | Yes |
REDIS_URL |
Redis connection for caching & rate limiting | redis://localhost:6379 |
No |
REDIS_TTL |
Cache TTL in seconds | 3600 |
No |
ENABLE_RATE_LIMITING |
Enable/disable rate limiting | true |
No |
ENABLE_CACHING |
Enable/disable Redis caching | true |
No |
MAX_CONCURRENT_REQUESTS |
Max concurrent requests | 20 |
No |
QUERY_RATE_LIMIT |
Queries per minute | 10 |
No |
INGEST_RATE_LIMIT |
Ingests per 5 minutes | 5 |
No |
The application implements a sophisticated three-layer Redis caching strategy for dramatic performance improvements:
- Query Response Time: 99.86% improvement (14.8s → 0.02s for cached queries)
- Resource Efficiency: 80-90% reduction in compute usage
- Scalability: Support for 10x more concurrent users
Layer 1: Complete Query Results Cache
- What: Full RAG pipeline responses (question + answer + context)
- Hit Rate: 60-80% for exact question matches
- Performance: 99.9% faster (10-50ms vs 8-15 seconds)
Layer 2: Embedding Cache
- What: Vector embeddings for text chunks and queries
- Hit Rate: 80-90% for repeated text chunks
- Performance: 99.8% faster (5-10ms vs 3-5 seconds)
Layer 3: Similarity Search Cache
- What: PostgreSQL vector search results
- Hit Rate: 70-85% for similar questions
- Performance: 98% faster (5-10ms vs 200-500ms)
- Memory Usage: 1.1MB used / 512MB allocated (efficient utilization)
- TTL Strategy: 1 hour for embeddings, 30 minutes for queries
- Eviction Policy:
allkeys-lru(automatically removes least recently used) - Persistence: Data survives container restarts
- Graceful degradation when Redis unavailable
Check Cache Performance:
# View cached keys
docker-compose exec redis redis-cli keys "*"
# Check cache hit/miss statistics
docker-compose exec redis redis-cli info stats
# Monitor memory usage
docker-compose exec redis redis-cli info memory
# View application cache logs
docker-compose logs backend | grep -E "(Cache HIT|Cache MISS|Cached)"Performance Testing:
# Test query performance (first time - cache miss)
time curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"question": "What is AI?", "k": 5}'
# Test same query again (cache hit)
time curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"question": "What is AI?", "k": 5}'Cache Key Strategy:
query:abc123def456- Complete query resultsembedding:xyz789abc123- Text embeddingssimilarity:def456ghi789- Vector search resultsrate_limit:client_ip- Rate limiting counters
Monitoring:
# Check cache statistics
curl http://localhost:8000/admin/cache/stats
# Clear cache (development only)
curl -X POST http://localhost:8000/admin/cache/clearSee CACHING.md for detailed implementation details.
Rate Limits (per IP):
- Query: 10 requests/minute
- Ingest: 5 requests/5 minutes
- Health: 60 requests/minute
Timeouts:
- Query: 25 seconds
- Ingest: 60 seconds
- Default: 30 seconds
Concurrency:
- Max concurrent requests: 20
- Query operations: 5 concurrent
- Ingest operations: 3 concurrent
# Clone the repository
git clone https://github.com/sudarshantanwer/document-rag-app.git
cd document-rag-app
# Create production environment file
cp example.env .env.prod
# Edit production environment
nano .env.prodRequired Environment Variables for Production:
# .env.prod
DATABASE_URL=postgresql+asyncpg://postgres:your_secure_password@db:5432/postgres
PGVECTOR_CONN=postgresql+psycopg2://postgres:your_secure_password@db:5432/postgres
REDIS_URL=redis://redis:6379/0
OPENAI_API_KEY=your_openai_api_key
HUGGINGFACEHUB_API_TOKEN=your_huggingface_token
# Production settings
ENVIRONMENT=production
ENABLE_CACHING=true
ENABLE_RATE_LIMITING=true
REDIS_TTL=3600
MAX_CONCURRENT_REQUESTS=50
QUERY_RATE_LIMIT=20
INGEST_RATE_LIMIT=10# Start production services
docker-compose --env-file .env.prod up -d
# Check service health
docker-compose ps
# View logs
docker-compose logs -f backend
# Scale backend if needed
docker-compose up -d --scale backend=31. Build and Push Images:
# Build production images
docker build -t your-registry/document-rag-backend:latest ./backend-python
docker build -t your-registry/document-rag-frontend:latest ./frontend-react
# Push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin your-account.dkr.ecr.us-east-1.amazonaws.com
docker tag your-registry/document-rag-backend:latest your-account.dkr.ecr.us-east-1.amazonaws.com/document-rag-backend:latest
docker push your-account.dkr.ecr.us-east-1.amazonaws.com/document-rag-backend:latest2. ECS Task Definition:
{
"family": "document-rag-app",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "1024",
"memory": "2048",
"containerDefinitions": [
{
"name": "backend",
"image": "your-account.dkr.ecr.us-east-1.amazonaws.com/document-rag-backend:latest",
"portMappings": [{"containerPort": 8000}],
"environment": [
{"name": "DATABASE_URL", "value": "postgresql://..."},
{"name": "REDIS_URL", "value": "redis://..."},
{"name": "OPENAI_API_KEY", "value": "..."}
]
}
]
}3. Required AWS Resources:
- RDS PostgreSQL with pgvector extension
- ElastiCache Redis for caching
- Application Load Balancer for traffic distribution
- ECS Service with auto-scaling
- CloudWatch for monitoring
1. Deploy to Cloud Run:
# Build and deploy backend
gcloud builds submit --tag gcr.io/your-project/document-rag-backend ./backend-python
gcloud run deploy document-rag-backend \
--image gcr.io/your-project/document-rag-backend \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars DATABASE_URL="postgresql://...",REDIS_URL="redis://...",OPENAI_API_KEY="..."
# Deploy frontend
gcloud builds submit --tag gcr.io/your-project/document-rag-frontend ./frontend-react
gcloud run deploy document-rag-frontend \
--image gcr.io/your-project/document-rag-frontend \
--platform managed \
--region us-central1 \
--allow-unauthenticated2. Required GCP Resources:
- Cloud SQL PostgreSQL with pgvector
- Memorystore Redis for caching
- Cloud Run for containerized apps
- Cloud Load Balancing for traffic
- Cloud Monitoring for observability
# Create resource group
az group create --name document-rag-rg --location eastus
# Deploy container group
az container create \
--resource-group document-rag-rg \
--name document-rag-app \
--image your-registry/document-rag-backend:latest \
--ports 8000 \
--environment-variables \
DATABASE_URL="postgresql://..." \
REDIS_URL="redis://..." \
OPENAI_API_KEY="..." \
--cpu 2 \
--memory 41. Server Preparation:
# Update system
sudo apt update && sudo apt upgrade -y
# Install Docker and Docker Compose
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# Install Nginx
sudo apt install nginx -y2. Application Deployment:
# Clone and setup
git clone https://github.com/sudarshantanwer/document-rag-app.git
cd document-rag-app
# Create production environment
cp example.env .env.prod
# Edit with your production values
# Start services
docker-compose --env-file .env.prod up -d
# Setup Nginx reverse proxy
sudo nano /etc/nginx/sites-available/document-rag3. Nginx Configuration:
server {
listen 80;
server_name your-domain.com;
# Frontend
location / {
proxy_pass http://localhost:3000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
# Backend API
location /api/ {
proxy_pass http://localhost:8000/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Increase timeouts for large file uploads
proxy_read_timeout 300;
proxy_connect_timeout 300;
proxy_send_timeout 300;
}
# WebSocket support (if needed)
location /ws {
proxy_pass http://localhost:8000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}4. SSL Setup with Let's Encrypt:
# Install Certbot
sudo apt install certbot python3-certbot-nginx -y
# Get SSL certificate
sudo certbot --nginx -d your-domain.com
# Enable Nginx
sudo ln -s /etc/nginx/sites-available/document-rag /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx# Use secrets management
# AWS: Systems Manager Parameter Store
# GCP: Secret Manager
# Azure: Key Vault
# Example with Docker secrets
echo "your_openai_api_key" | docker secret create openai_api_key -
echo "your_db_password" | docker secret create db_password -# Setup firewall (UFW on Ubuntu)
sudo ufw allow 22 # SSH
sudo ufw allow 80 # HTTP
sudo ufw allow 443 # HTTPS
sudo ufw deny 8000 # Block direct backend access
sudo ufw deny 5432 # Block direct database access
sudo ufw deny 6379 # Block direct Redis access
sudo ufw enable-- Create dedicated database user
CREATE USER rag_app WITH PASSWORD 'secure_password';
CREATE DATABASE document_rag OWNER rag_app;
GRANT CONNECT ON DATABASE document_rag TO rag_app;
GRANT USAGE, CREATE ON SCHEMA public TO rag_app;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO rag_app;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO rag_app;# Application health
curl -f http://localhost:8000/health || exit 1
# Database connectivity
docker-compose exec backend python -c "from app.db.session import engine; print('DB OK')"
# Redis connectivity
docker-compose exec redis redis-cli ping# docker-compose.monitoring.yml
version: '3.8'
services:
prometheus:
image: prom/prometheus
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
grafana:
image: grafana/grafana
ports:
- "3001:3000"
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
- grafana-storage:/var/lib/grafana
volumes:
grafana-storage:# .github/workflows/deploy.yml
name: Deploy to Production
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Deploy to server
uses: appleboy/ssh-action@v0.1.5
with:
host: ${{ secrets.HOST }}
username: ${{ secrets.USERNAME }}
key: ${{ secrets.SSH_KEY }}
script: |
cd /path/to/document-rag-app
git pull origin main
docker-compose --env-file .env.prod up -d --build
docker system prune -fSecurity:
- Set strong database passwords
- Configure proper CORS origins
- Enable HTTPS/SSL certificates
- Setup firewall rules
- Use environment secrets management
- Enable database connection encryption
- Configure rate limiting
- Setup API authentication (if needed)
Performance:
- Setup Redis for caching and rate limiting
- Configure database connection pooling
- Enable gzip compression
- Setup CDN for static assets
- Configure auto-scaling
- Optimize Docker images
Reliability:
- Setup health checks
- Configure log aggregation (ELK, Grafana)
- Setup monitoring and alerting
- Configure backup strategy
- Test disaster recovery
- Setup multi-region deployment (if needed)
Maintenance:
- Setup automated updates
- Configure log rotation
- Setup database maintenance jobs
- Monitor resource usage
- Setup performance profiling
# Essential Production Settings
DATABASE_URL=postgresql+asyncpg://user:password@host:5432/dbname
REDIS_URL=redis://redis:6379/0
OPENAI_API_KEY=sk-...
HUGGINGFACEHUB_API_TOKEN=hf_...
# Performance Settings
ENABLE_CACHING=true
ENABLE_RATE_LIMITING=true
MAX_CONCURRENT_REQUESTS=50
REDIS_TTL=3600
# Security Settings
CORS_ORIGINS=["https://yourdomain.com"]
TRUSTED_HOSTS=["yourdomain.com"]
ENABLE_HTTPS_REDIRECT=true
# Monitoring
LOG_LEVEL=INFO
ENABLE_METRICS=true
SENTRY_DSN=https://... # For error trackingdocument-rag-app/
├── backend-python/ # FastAPI backend
│ ├── app/
│ │ ├── main.py # Application entry point
│ │ ├── routes/ # API endpoints
│ │ ├── services/ # Business logic
│ │ ├── db/ # Database models & sessions
│ │ ├── middleware/ # Performance & security middleware
│ │ └── utils/ # Utility functions
│ ├── tests/ # Test suite
│ ├── requirements.txt # Basic dependencies
│ ├── requirements-prod.txt # Production dependencies
│ └── Dockerfile
├── frontend-react/ # React frontend
│ ├── src/
│ │ ├── App.jsx # Main component
│ │ ├── api.js # API client
│ │ └── assets/ # Static assets
│ ├── public/ # Public assets
│ └── package.json
├── docker-compose.yml # Multi-service orchestration
└── README.md # This file
-
Backend Endpoint:
# app/routes/new_feature.py from fastapi import APIRouter router = APIRouter(prefix="/new-feature") @router.post("") async def new_endpoint(): return {"message": "Hello World"}
-
Frontend Integration:
// src/api.js export const newFeatureApi = async (data) => { const response = await fetch('/api/new-feature', { method: 'POST', body: JSON.stringify(data) }); return response.json(); };
The application includes several performance optimizations:
- Async Processing: All I/O operations use async/await
- Connection Pooling: Database connections are pooled
- Request Caching: Responses cached with configurable TTL
- Rate Limiting: Prevents API abuse
- GZIP Compression: Response compression for bandwidth optimization
- Middleware Stack: Performance monitoring and timeout handling
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
- Python: Follow PEP 8, use
blackfor formatting - JavaScript: Use Prettier, follow Airbnb style guide
- Commits: Use conventional commit format
This project is licensed under the MIT License - see the LICENSE file for details.
1. ModuleNotFoundError: No module named 'slowapi'
# Install production dependencies
pip install -r requirements-prod.txt
# Or run without enhanced rate limiting
pip install -r requirements.txt2. Database Connection Failed
# Check PostgreSQL is running
docker ps | grep postgres
# Verify connection string
echo $DATABASE_URL3. OpenAI API Errors
# Verify API key is set
echo $OPENAI_API_KEY
# Check API quota and billing4. Rate Limiting Issues
# Disable rate limiting for development
export ENABLE_RATE_LIMITING=false
# Or increase limits in .env fileSlow Queries:
- Check database indexes
- Monitor query performance in logs
- Consider pagination for large datasets
High Memory Usage:
- Adjust chunk size for document processing
- Monitor embeddings cache size
- Consider document size limits
- 📧 Issues: GitHub Issues
- 📖 Documentation: API Docs
- 🔧 Development: Check
TESTING_SUMMARY.mdfor test results
- FastAPI for the amazing web framework
- LangChain for RAG implementation
- pgvector for vector similarity search
- React for the frontend framework
- OpenAI for embeddings and language models
Made with ❤️ by Sudarshan Tanwer