AskDocs AI: AI-Powered PDF Q&A Bot

AskDocs AI is a chatbot that uses Hybrid RAG (Retrieval-Augmented Generation) to answer questions about the content of uploaded PDFs. It combines semantic vector search with traditional keyword-based search, so retrieval can match both paraphrased questions and exact terms.

Key Features

Hybrid Search: Combines ChromaDB (semantic) and BM25 (keyword) retrieval.
LLM Powered: High-performance LLM via Groq Cloud.
Async Processing: PDF ingestion and indexing are offloaded to background threads.
Multimodal Support: Optimized for PDF extraction and processing.

Tech Stack

Backend: FastAPI, LangChain (Classic), ChromaDB, Groq Cloud
Frontend: Streamlit
Search Engines: BM25 (Keyword), Vector (Cosine Similarity)
Embeddings: HuggingFace (Sentence Transformers)
Containerization: Docker & Docker Compose
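To make the keyword side of the stack concrete, here is a minimal, dependency-free sketch of BM25 scoring. The README does not say which BM25 variant or parameters the project uses, so the Okapi form with a smoothed IDF, whitespace tokenization, and the common defaults `k1=1.5` and `b=0.75` are all assumptions; `bm25_scores` is a hypothetical helper, not a function from the repository.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF (always non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = ["the cat sat", "dogs bark loudly", "the cat and the dog"]
    print(bm25_scores("cat", corpus))
```

Documents that do not contain a query term score zero for it, and for equal term counts a shorter document scores higher than a longer one. Exact matching of this kind is what complements the cosine-similarity vector search.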

Configuration

Control the behavior of the Hybrid Search by adjusting weights in your .env file or server/config.py:

Variable	Description	Default
`HYBRID_SEARCH_BM25_WEIGHT`	Weight for keyword search (0.0 to 1.0)	`0.5`
`HYBRID_SEARCH_CHROMA_WEIGHT`	Weight for semantic search (0.0 to 1.0)	`0.5`
`GROQ_API_KEY`	Your Groq Cloud API Key	Required
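One common way to apply weights like these is weighted Reciprocal Rank Fusion, which is the approach LangChain's `EnsembleRetriever` takes. Whether this project fuses results in exactly this way is an assumption; the sketch below only illustrates how the two weights could shift the final ranking. `weighted_rrf` and the document ids are hypothetical, and the constant `c=60` is a conventional default.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns weight / (rank + c) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    if len(rankings) != len(weights):
        raise ValueError("rankings and weights must have the same length")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # keyword ranking (illustrative ids)
    chroma_hits = ["c", "a", "d"]  # semantic ranking (illustrative ids)
    print(weighted_rrf([bm25_hits, chroma_hits], [0.5, 0.5]))
    print(weighted_rrf([bm25_hits, chroma_hits], [1.0, 0.0]))
```

With equal weights, documents that rank well in both lists rise to the top. Setting one weight to 0.0 makes the fused order follow the other retriever alone.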

Optimization Features

Split Dependencies: Client and Server have separate requirement files to minimize image sizes.
CPU-Only Optimization: Server image is optimized for CPU-only environments, reducing size from ~12.8GB to ~2.3GB.
Persistent Storage: Uses Docker volumes to persist the ChromaDB vector store and uploaded files across container restarts.

Setup Instructions

1. Set up environment variables

Create a .env file in the root directory:

GROQ_API_KEY=your_api_key_here
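As an illustration of how these settings might be validated at startup, the sketch below reads the three documented variables from a mapping, applies the documented default of 0.5 to the weights, and fails early when the API key is missing. It assumes the `.env` values have already been loaded into the process environment; `Settings` and `load_settings` are hypothetical names, not the contents of `server/config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    bm25_weight: float
    chroma_weight: float


def _weight(env: Mapping[str, str], name: str, default: float = 0.5) -> float:
    """Read a weight and check that it lies in the documented 0.0 to 1.0 range."""
    value = float(env.get(name, default))
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, failing fast on a missing key."""
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=api_key,
        bm25_weight=_weight(env, "HYBRID_SEARCH_BM25_WEIGHT"),
        chroma_weight=_weight(env, "HYBRID_SEARCH_CHROMA_WEIGHT"),
    )


if __name__ == "__main__":
    print(load_settings({"GROQ_API_KEY": "your_api_key_here"}))
```

Passing the environment in as a mapping keeps the function easy to test without touching real environment variables.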

2. Run with Docker (Recommended)

docker-compose up -d --build

Streamlit UI: http://localhost:8501
FastAPI Docs: http://localhost:8000/docs

Alternative: Local Setup

Create and activate a virtual environment

python -m venv venv
.\venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

Install dependencies

pip install -r requirements.client.txt
pip install -r requirements.server.txt

Run the services

# Backend
python server/main.py

# Frontend
streamlit run client/main.py

Testing & Verification

To verify that the Hybrid Search mechanism and LLM integration are working correctly:

python server/tests/test_hybrid_search.py

This script validates:

Vectorstore connectivity.
BM25 index reconstruction.
Ensemble Retriever initialization.
