A Streamlit chatbot that combines Google's Gemini API with retrieval-augmented generation (RAG) to answer questions based on your uploaded content. It supports multimodal inputs including text, images, audio, video, and web content.
- Documents: PDF, Word (.docx), text files
- Images: JPG, PNG, GIF, WebP with AI vision analysis
- Audio: MP3, WAV, M4A with automatic transcription
- Video: MP4, AVI, MOV with audio extraction and transcription
- Web Content: Extract content from any webpage
- YouTube: Automatic transcript extraction from YouTube videos
- Multimodal Chat: Text + image queries using Gemini Vision
- RAG System: Intelligent context retrieval from your knowledge base
- Semantic Search: Find relevant information across all uploaded content
- Chat Memory: Persistent conversation history
- Source Attribution: See which documents informed each response
- ChromaDB: Local vector database (default)
- Pinecone: Cloud vector database (optional, requires API key)
# Clone the repository
git clone <repository-url>
cd streamlit-gemini-rag-chatbot
# Install dependencies
pip install -r requirements.txt
# Copy the environment template
cp .env.example .env
# Edit .env and add your API keys
GOOGLE_API_KEY=your_gemini_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here # Optional
PINECONE_ENVIRONMENT=your_pinecone_environment_here # Optional
- Go to Google AI Studio
- Create a new API key
- Add it to your .env file
- Sign up at Pinecone
- Create a new project and get your API key
- Add it to your .env file
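The environment variables above might be read at startup roughly as follows. This is a minimal sketch using only the standard library: `load_settings` and the keys it returns are hypothetical and are not taken from the project's config/settings.py. It assumes that a missing Pinecone key means the default ChromaDB backend should be used.

```python
import os


def load_settings(env=None):
    """Read API keys from the environment and pick a vector store backend.

    Hypothetical helper: the real project may structure this differently.
    """
    env = os.environ if env is None else env

    google_key = env.get("GOOGLE_API_KEY", "").strip()
    if not google_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")

    pinecone_key = env.get("PINECONE_API_KEY", "").strip()
    return {
        "google_api_key": google_key,
        "pinecone_api_key": pinecone_key or None,
        "pinecone_environment": env.get("PINECONE_ENVIRONMENT", "").strip() or None,
        # Fall back to the local default when no Pinecone key is configured.
        "vector_store": "pinecone" if pinecone_key else "chromadb",
    }
```

Note that `os.environ` does not read the .env file on its own. A loader such as python-dotenv's `load_dotenv()` would need to run first, and whether this project uses it is an assumption.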
streamlit run app.py
- Files: Use the sidebar to upload documents, images, audio, or video files
- Web URLs: Enter any webpage URL to extract and index its content
- YouTube: Paste YouTube URLs to get automatic transcripts
- Ask questions about your uploaded content
- Upload images directly in the chat for multimodal queries
- Get responses with source attribution
- View chat history and export conversations
- "Summarize the key points from the uploaded PDF"
- "What does this image show?" (with image upload)
- "Compare the information from different documents"
- "What are the main topics discussed in the video?"
├── app.py # Main Streamlit application
├── config/
│ └── settings.py # Configuration and settings
├── utils/
│ ├── gemini_client.py # Gemini API integration
│ ├── vector_store.py # Vector database operations
│ ├── file_processor.py # File processing and content extraction
│ └── chat_memory.py # Chat history management
├── components/
│ ├── sidebar.py # Sidebar UI components
│ └── chat_interface.py # Chat interface components
└── requirements.txt # Python dependencies
- ChromaDB (Default): Local storage, no API key required
- Pinecone: Cloud storage, requires API key but offers better scalability
Edit config/settings.py to customize:
- Maximum file size limits
- Text chunking parameters
- Supported file types
- Embedding dimensions
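To make these options concrete, here is a sketch of what such a settings module could contain, together with a small check that uses it. Every value is an illustrative placeholder rather than the project's actual default, `classify_upload` is a hypothetical helper, and `.txt` is assumed as the extension for "text files". Only the name `SUPPORTED_FILE_TYPES` and the listed formats come from this README.

```python
from pathlib import Path

# Illustrative placeholder values, not the project's actual defaults.
MAX_FILE_SIZE_MB = 50
CHUNK_SIZE = 1000        # characters per text chunk
CHUNK_OVERLAP = 200      # characters shared between neighbouring chunks
EMBEDDING_DIMENSION = 768

# Extensions based on the supported formats listed in this README.
SUPPORTED_FILE_TYPES = {
    "document": [".pdf", ".docx", ".txt"],
    "image": [".jpg", ".png", ".gif", ".webp"],
    "audio": [".mp3", ".wav", ".m4a"],
    "video": [".mp4", ".avi", ".mov"],
}


def classify_upload(filename, size_bytes):
    """Return the category of an upload, or raise ValueError if it is rejected."""
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        raise ValueError(f"{filename} exceeds the {MAX_FILE_SIZE_MB} MB limit")
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_FILE_TYPES.items():
        if suffix in extensions:
            return category
    raise ValueError(f"Unsupported file type: {suffix or filename}")
```

Keeping limits and type lists in one module means the upload check and the file processors can share a single source of truth.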
- Chunk Size: How text is split for processing
- Top-K Retrieval: Number of relevant chunks to retrieve
- Embedding Model: Sentence transformer model for embeddings
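The first two parameters can be illustrated with a small pure-Python sketch. It is not the project's implementation: in the app the vector store (ChromaDB or Pinecone) performs the similarity search, and the function names, the character-based chunking, and the overlap parameter here are assumptions made for illustration.

```python
import math


def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into character chunks of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the text is fully covered; avoid a redundant tail chunk
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed_chunks, k=3):
    """Return the k chunks whose vectors are most similar to the query.

    `indexed_chunks` is a list of (chunk_text, vector) pairs.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]
```

Larger chunks give the model more context per hit but fewer, coarser matches. A larger k retrieves more context at the cost of a longer prompt.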
- API Key Errors
  - Ensure your Gemini API key is valid and has sufficient quota
  - Check that the key is properly set in the .env file
- File Processing Errors
  - Large files may take time to process
  - Some video formats may require additional codecs
- Memory Issues
  - For large files, consider increasing chunk size
  - Use Pinecone for better scalability with large datasets
- Optimize File Sizes: Compress large video/audio files before upload
- Use Specific Queries: More specific questions yield better results
- Regular Cleanup: Clear chat history periodically for better performance
- Update SUPPORTED_FILE_TYPES in config/settings.py
- Add processing logic in utils/file_processor.py
- Test with sample files
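One way to structure the processing logic so that new types are easy to add is a registry keyed by file extension. The sketch below is an assumption about how this could look, not a description of utils/file_processor.py: `PROCESSORS`, `register`, `process_plain_text`, and `process_file` are all hypothetical names, and `.md` stands in for a newly added type.

```python
from pathlib import Path

# Maps a lowercase file extension to the function that extracts its text.
PROCESSORS = {}


def register(*extensions):
    """Decorator that registers a processor for one or more extensions."""
    def decorator(func):
        for ext in extensions:
            PROCESSORS[ext.lower()] = func
        return func
    return decorator


@register(".txt", ".md")
def process_plain_text(path):
    """Read a plain-text or Markdown file as UTF-8."""
    return Path(path).read_text(encoding="utf-8")


def process_file(path):
    """Dispatch to the processor registered for the file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        processor = PROCESSORS[suffix]
    except KeyError:
        raise ValueError(f"No processor registered for {suffix!r}") from None
    return processor(path)
```

With this layout, supporting another format only requires writing one decorated function.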
- Modify the generate_embeddings method in utils/gemini_client.py
- Update the embedding dimension in settings
- Rebuild your vector database
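The last two steps matter because vectors of different lengths cannot be compared, so an index built with the old model is unusable with the new one. A small guard like the one sketched below can surface a mismatch early. `embed_with_check` is a hypothetical wrapper, and `embed_fn` stands for any callable that maps a list of strings to a list of vectors. It is not the project's generate_embeddings method.

```python
def embed_with_check(texts, embed_fn, expected_dim):
    """Embed `texts` and verify every vector has the configured dimension."""
    vectors = embed_fn(texts)
    if len(vectors) != len(texts):
        raise ValueError(f"expected {len(texts)} vectors, got {len(vectors)}")
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index} has dimension {len(vector)}, expected "
                f"{expected_dim}; update the setting and rebuild the vector database"
            )
    return vectors


def toy_embed(texts):
    """Stand-in embedding function for demonstration: 3 numbers per text."""
    return [[float(len(t)), float(t.count(" ")), 1.0] for t in texts]
```

For example, `embed_with_check(["hello world"], toy_embed, expected_dim=3)` succeeds, while passing `expected_dim=768` raises a `ValueError`.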
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Gemini API for multimodal AI capabilities
- Streamlit for the amazing web framework
- ChromaDB and Pinecone for vector storage solutions
- OpenAI Whisper for audio transcription
- All the open-source libraries that make this possible
If you encounter any issues or have questions:
- Check the troubleshooting section above
- Review the configuration settings
- Open an issue on GitHub with detailed information about your problem
Happy chatting with your AI assistant! 🚀