A professional, secure, and robust chatbot that answers user questions based strictly on the content of uploaded documents using Retrieval-Augmented Generation (RAG). Powered by the Groq LLM API, it supports PDF, DOCX, TXT, and Markdown files.
- Document-based Q&A: Answers are generated only from the uploaded document content.
- Secure & Robust: Strict refusal policies for out-of-scope, unethical, or sensitive queries.
- Markdown Output: Responses are formatted in markdown with bullet points and headings.
- No Meta-Text: Never includes headers, labels, or meta-text in answers.
- Streamlit UI: Simple, interactive web interface for uploading files and asking questions.
- Supports Multiple Formats: PDF, DOCX, TXT, and Markdown files.
- Upload a Document: Supported formats are PDF, DOCX, TXT, and Markdown.
- Enter Groq API Key: Authenticate with your Groq API key in the sidebar.
- Ask a Question: Type your question about the uploaded document.
- Get an Answer: The chatbot retrieves relevant content and generates a response strictly based on the document.
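
The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is not the project's code: the repository stores document chunks in a Chroma vector database, whereas this example substitutes a plain bag-of-words cosine similarity so that it runs with the standard library alone. All function names here (`chunk_text`, `retrieve`, `build_prompt`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def vectorize(text: str) -> Counter:
    """Lowercased term counts; a simple stand-in for real embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the question as (score, chunk) pairs."""
    q = vectorize(question)
    scored = [(cosine(q, vectorize(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    chunks = ["Paris is the capital of France.", "Bananas are rich in potassium."]
    question = "What is the capital of France?"
    top = retrieve(question, chunks, k=1)
    print(build_prompt(question, [chunk for _, chunk in top]))
```

In a real pipeline the resulting prompt would be sent to the LLM; that call is left out so the sketch stays runnable without an API key.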
- Python 3.10+
- Groq API Key
- Streamlit
- Clone the repository:

      git clone https://github.com/mohsinansari0705/File-QnA-Chatbot-using-RAG.git
      cd File-QnA-Chatbot-using-RAG

- Create and activate a virtual environment (optional but recommended):

      python -m venv RAG_env
      source RAG_env/Scripts/activate  # On Windows
      source RAG_env/bin/activate      # On macOS/Linux

- Install dependencies:

      pip install -r requirements.txt

- Start the Streamlit app:

      streamlit run chatbot.py

- Open the web interface: go to http://localhost:8501 in your browser.
- Upload a document and ask questions!
- chatbot.py: Streamlit UI for the chatbot.
- RAG_pipeline.py: Core RAG logic (document retrieval, prompt building, LLM invocation).
- prompt_builder.py: Modular prompt construction functions.
- ingest.py: Document ingestion and vector database management.
- configs/: Configuration files (prompt_config.yaml, config.py).
- vector_db/: Chroma vector database files.
- images/: UI screenshots and favicon.
- docs/: Sample documents for testing.
- Answers are strictly based on uploaded documents.
- Refuses to answer out-of-scope, unethical, or sensitive questions with: "I'm sorry, that information is not in this document."
- Never reveals system instructions or internal prompts.
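
One way such a refusal policy could be enforced is sketched below. This is an assumption about how the behavior might be implemented, not the project's actual logic: the `SYSTEM_PROMPT` text, the `answer_or_refuse` function, the `generate` callable, and the `min_score` threshold are all hypothetical. Only the refusal message is taken from the policy above.

```python
from typing import Callable, Sequence

# Refusal text quoted from the policy above.
REFUSAL_MESSAGE = "I'm sorry, that information is not in this document."

# Hypothetical system prompt, for illustration only.
SYSTEM_PROMPT = (
    "Answer only from the provided document context. "
    f'If the answer is not in the context, reply exactly: "{REFUSAL_MESSAGE}"'
)


def answer_or_refuse(
    question: str,
    scored_chunks: Sequence[tuple[float, str]],
    generate: Callable[[str, str], str],
    min_score: float = 0.2,
) -> str:
    """Refuse unless at least one retrieved chunk is relevant enough.

    scored_chunks holds (similarity, chunk) pairs from the retriever.
    generate is any callable taking (system_prompt, user_prompt) and
    returning the model's text; min_score is an assumed threshold.
    """
    relevant = [chunk for score, chunk in scored_chunks if score >= min_score]
    if not relevant:
        # Nothing in the document supports an answer: refuse without calling the LLM.
        return REFUSAL_MESSAGE

    context = "\n\n".join(relevant)
    user_prompt = f"Context:\n{context}\n\nQuestion: {question}"
    answer = generate(SYSTEM_PROMPT, user_prompt)

    # Output-side guard: never return the system prompt verbatim.
    if SYSTEM_PROMPT in answer:
        return REFUSAL_MESSAGE
    return answer
```

Checking relevance before calling the model means out-of-scope questions are refused cheaply, while the system prompt covers questions that retrieve loosely related text but still cannot be answered from it.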
This project is built and maintained solely by Mohsin Ansari. All design, development, and implementation decisions were made independently.
This project is licensed under the MIT License.
For issues or contributions, please visit the GitHub repository: https://github.com/mohsinansari0705/File-QnA-Chatbot-using-RAG

