Multimodal AI Chatbot is a production-grade AI platform that orchestrates multiple LLM providers behind a single, high-availability interface. Its multi-tier failover architecture keeps the service responding even when primary APIs hit rate limits or suffer outages.
Traditional AI apps are fragile because they depend on a single API. Multimodal AI Chatbot solves this with an intelligent orchestration layer that automatically routes between Gemini, Groq, and Mistral, and manages a crowdsourced AI Horde vision engine with automated fallbacks.
- Frontend: React 19, TypeScript, Vite, Tailwind CSS
- Backend: FastAPI (Python 3.11), Uvicorn
- Real-Time: WebSockets (Socket.io) for asynchronous task streaming
- AI Orchestration: Gemini 2.0 Flash (with Function Calling), Groq (Llama 3.3), Mistral AI
- Database & RAG: MongoDB Atlas (Vector Search), Beanie ODM
- Security: Bcrypt Password Hashing, OAuth2.0 (Google/GitHub), SMTP OTP Verification, JWT
The following flowchart represents how Multimodal AI Chatbot handles a user request from intent detection to final fulfillment:
```mermaid
graph TD
    A[User Message] --> B[WebSocket Session Creation]
    B --> C{Intent Detection via Small LLM}
    C -- "IMAGE" --> D[AI Horde Vision Engine]
    D -- "Success < 2min" --> E[Display Image]
    D -- "Timeout/Fail" --> F[Pollinations CDN Fallback]
    F --> E
    C -- "COMPLEX" --> G[Hybrid RAG Pipeline]
    G --> H[MongoDB Vector Search]
    G --> I[Gemini Function Calling]
    I -- "Web Search Needed" --> J[Parallel SerpAPI: Google + DuckDuckGo]
    J --> K[Primary: Gemini 2.0]
    H --> K
    K -- "Error 429/500" --> L[Backup: Groq Llama 3.3]
    L -- "Error" --> M[Safety: Mistral AI]
    K --> N[Stream to Frontend]
    L --> N
    M --> N
    C -- "SIMPLE" --> O[Direct LLM Response]
    O --> N
    B --> P[Parallel: Smart Title Generation]
    P -- "Gemini → Groq → Mistral" --> Q[Update Chat History]
```
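Every branch of the flow ultimately streams its result back to the client over the WebSocket. Below is a minimal sketch of what such a streaming endpoint could look like in FastAPI; the `/ws/chat` path, the message shape, and the `generate_reply()` helper are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch: a FastAPI WebSocket endpoint that streams a reply
# chunk by chunk. The path, message schema, and generate_reply() helper
# are assumptions for illustration only.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def generate_reply(prompt: str):
    # Placeholder for the LLM call; yields text chunks as they arrive.
    for chunk in ("Hello", ", ", "world", "!"):
        yield chunk

@app.websocket("/ws/chat")
async def chat(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            message = await ws.receive_json()  # e.g. {"content": "..."}
            async for chunk in generate_reply(message["content"]):
                await ws.send_json({"type": "chunk", "data": chunk})
            await ws.send_json({"type": "done"})  # end-of-response marker
    except WebSocketDisconnect:
        pass  # client closed the connection
```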
- Multi-LLM Failover System: Three-tier failover chain (Gemini → Groq → Mistral) that keeps responses flowing during individual provider outages
- Intelligent Intent Detection: Employs a high-speed small LLM to classify user requests into `IMAGE`, `SIMPLE`, or `COMPLEX` categories in under 200 ms, ensuring low-latency routing (see the sketch after this feature list)
- Gemini Function Calling: Leverages native function calling to dynamically decide between web search and image generation workflows
- Contextual Search: Combines MongoDB Vector Embeddings with real-time web data to provide context-aware responses from both uploaded documents and live information
- Parallel Web Search: Integrates SerpAPI to fetch results simultaneously from Google and DuckDuckGo, merging consensus data for factually accurate responses
- Document Processing: Supports multimodal inputs including PDFs and images with vector embedding storage for semantic search
- AI Horde Integration: Custom-built crowdsourced image generation with dynamic polling based on queue position
- Circuit Breaker Pattern: Implements a strict 120s timeout with automatic fallback to Pollinations CDN to prevent server hangs
- Smart Polling: Adaptive polling intervals (30s for queue >50, 5s for queue <10) to stay within rate limits while maintaining responsiveness
- Secure Authentication: Bcrypt password hashing with custom SMTP-based OTP verification workflow
- Social OAuth2.0: Seamless integration with Google and GitHub login providers
- JWT Authorization: Token-based session management with configurable expiration
- WebSocket Streaming: Real-time response streaming with parallel task execution
- Concurrent Processing: Simultaneous handling of intent detection, smart-title generation, and multimodal processing
- Session Management: Automatic chat session creation and history tracking
- Modern UI: Tailwind CSS-powered responsive design with sidebar, navbar, and main content areas
- Chat History Management: Sidebar navigation for previous conversations and new chat creation
- User Profile: Integrated profile management and signout functionality in navbar
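As referenced above, here is a minimal sketch of the intent-detection step, assuming a hypothetical `classify_with_small_llm()` coroutine in place of whichever provider client the project actually uses:

```python
# Hypothetical sketch of intent detection: ask a small, fast LLM to label
# the message, and fall back to COMPLEX when the label is unclear.
VALID_INTENTS = {"IMAGE", "SIMPLE", "COMPLEX"}

INTENT_PROMPT = (
    "Classify the user message as exactly one of IMAGE, SIMPLE, or COMPLEX.\n"
    "Reply with the single word only.\n\nMessage: {message}"
)

async def detect_intent(message: str, classify_with_small_llm) -> str:
    raw = await classify_with_small_llm(INTENT_PROMPT.format(message=message))
    label = raw.strip().upper()
    # Anything the classifier cannot label cleanly takes the COMPLEX path,
    # which is the safest (fully featured) route.
    return label if label in VALID_INTENTS else "COMPLEX"
```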
The Challenge: Integrating the AI Horde presented a significant reliability hurdle. Unlike centralized paid APIs, crowdsourced workers can drop jobs, queues often exceed 150 positions, and aggressive polling quickly triggers 429 Too Many Requests errors.
The Solution: I engineered a Dynamic Polling & Safety Lifecycle to manage these variables:
- State-Based Polling: Instead of fixed intervals, the system dynamically checks the `queue_position`. If the position is above 50, it sleeps for 30s; if it drops below 10, it sleeps for 5s. This stays within rate limits while maintaining responsiveness as the job nears completion.
- Strict 120s Circuit Breaker: A hard wall-clock timeout. If the Horde does not deliver within 2 minutes, the system intercepts the request and injects a high-speed Pollinations CDN fallback, ensuring the user receives a visual result without the server ever hanging.
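A minimal sketch of this lifecycle, assuming hypothetical `check_horde_status()` and `fetch_pollinations_fallback()` helpers in place of the real AI Horde and Pollinations clients:

```python
import asyncio
import time

HARD_TIMEOUT_S = 120  # circuit breaker: never wait longer than 2 minutes

async def generate_image(job_id, check_horde_status, fetch_pollinations_fallback):
    deadline = time.monotonic() + HARD_TIMEOUT_S
    while time.monotonic() < deadline:
        # Assumed response shape: {"done": bool, "queue_position": int, "image_url": str}
        status = await check_horde_status(job_id)
        if status["done"]:
            return status["image_url"]
        # State-based polling: back off while deep in the queue,
        # poll faster as the job nears the front.
        position = status["queue_position"]
        if position > 50:
            delay = 30
        elif position < 10:
            delay = 5
        else:
            delay = 15  # assumed middle ground; the README only pins the extremes
        # Never sleep past the hard deadline.
        await asyncio.sleep(min(delay, max(0, deadline - time.monotonic())))
    # Circuit breaker tripped: serve the CDN fallback instead of hanging.
    return await fetch_pollinations_fallback()
```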
Follow these instructions to get a local copy up and running.
```bash
git clone (project_Clone_url)
```

The backend manages AI orchestration, WebSocket connections, and the Vision Engine.
```bash
# Navigate to server directory
cd server

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install all required Python packages
pip install -r requirements.txt
```

The frontend provides the interactive chat interface with real-time WebSocket streaming.
```bash
# Navigate to client directory
cd ../client

# Install Node dependencies
npm install
```

Create a `.env` file in the `/server` folder and populate it with your API keys:
```env
# --- SERVER CONFIG ---
PORT=8000
FRONTEND_URL=http://localhost:5173
ALLOWED_HOSTS=localhost,127.0.0.1,your-app.render.com

# --- AI PROVIDERS (LLMs) ---
# Gemini is primary, Groq is secondary, Mistral is fallback
GOOGLE_API_KEY=your_gemini_api_key_here
GROQ_API_KEY=your_groq_api_key_here
MISTRAL_API_KEY=your_mistral_api_key_here

# --- VISION ENGINE (AI HORDE) ---
# Use '0000000000' for anonymous (slow) or register at stablehorde.net for a free key
AI_HORDE_KEY=0000000000
HF_TOKEN=your_huggingface_token_here

# --- SEARCH & DATA ---
# Get this from serper.dev (free tier available)
SERPER_API_KEY=your_serper_api_key_here
# MongoDB Atlas connection string with Vector Search enabled
MONGO_URI=mongodb+srv://<user>:<password>@cluster.mongodb.net/omnigen?retryWrites=true&w=majority

# --- SECURITY ---
# Generate a secret using: openssl rand -hex 32
JWT_SECRET=your_super_secret_random_string
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=60

# --- SMTP OTP VERIFICATION ---
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_EMAIL=your_email@gmail.com
SMTP_PASSWORD=your_app_specific_password
```

Create a `.env` file in the `/client` folder:
```env
VITE_GOOGLE_CLIENT_ID=your_google_client_ID
VITE_GITHUB_CLIENT_ID=your_github_client_ID
VITE_API_URL="http://127.0.0.1:8000"  # or your deployed backend URL
```

Open two terminal windows to run both services simultaneously:
Terminal 1 (Backend):
```bash
cd server
uvicorn main:app --reload
```

Terminal 2 (Frontend):

```bash
cd client
npm run dev
```

The application will be available at http://localhost:5173.
| Intent Type | Model Used | Avg. Latency | Fallback Logic |
|---|---|---|---|
| Simple | Small LLM (Intent) | ~150ms | Direct Response |
| Complex | Gemini 2.0 Flash | ~800ms | Groq (Llama 3.3) → Mistral |
| Image | AI Horde | 30s - 120s | Pollinations CDN |
| Web Search | SerpAPI (Parallel) | ~500ms | N/A |
| Title Gen | Gemini → Groq → Mistral | ~300ms | Multi-tier Fallback |
This project implements a resilient orchestration system to keep the service available through provider failures:
- Tier 1 (Primary): Google Gemini 2.0 Flash (High reasoning, multimodal, function calling)
- Tier 2 (Latency Fallback): Groq Llama 3.3 (Triggered if Gemini returns 429/500 errors)
- Tier 3 (Safety Fallback): Mistral AI (Final fallback if all primary providers fail)
This architecture applies to both chat responses and smart title generation, ensuring continuous operation even during provider outages.
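A minimal sketch of how such a chain can be expressed, assuming each provider client raises a common error type on 429/500 responses (the `RetryableError` class and the ordered provider list are hypothetical stand-ins):

```python
# Hypothetical sketch of a three-tier failover chain. Each entry in
# `providers` stands in for a real provider client (e.g. Gemini, Groq,
# Mistral); RetryableError groups the 429/500 cases that should trigger
# the next tier.
class RetryableError(Exception):
    """Raised when a provider returns a rate limit or server error."""

async def generate_with_failover(prompt: str, providers) -> str:
    # providers is an ordered list of coroutines, primary first,
    # e.g. [call_gemini, call_groq, call_mistral]
    last_error = None
    for call_provider in providers:
        try:
            return await call_provider(prompt)
        except RetryableError as exc:
            last_error = exc  # fall through to the next tier
    raise RuntimeError("All providers failed") from last_error
```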
- User sends message → WebSocket session created
- Session initialization → Chat history retrieved
- Parallel Execution (see the sketch after this list):
  - Intent detection (small LLM)
  - Smart title generation (Gemini → Groq → Mistral)
  - RAG vector search in MongoDB
- Based on intent:
  - Image: AI Horde (with fallback)
  - Complex: Gemini Function Calling → Web search if needed → LLM response with failover
  - Simple: Direct LLM response
- Stream response to frontend via WebSocket
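A minimal sketch of the parallel stage, with stubs standing in for the project's real services; the point is that the three tasks are awaited together with `asyncio.gather` rather than sequentially:

```python
import asyncio

# Hypothetical stubs for the three concurrent tasks in the lifecycle above.
async def detect_intent(message: str) -> str:
    return "SIMPLE"      # stub: small-LLM classifier

async def generate_title(message: str) -> str:
    return message[:40]  # stub: Gemini -> Groq -> Mistral title chain

async def vector_search(message: str) -> list:
    return []            # stub: MongoDB Atlas vector search

async def prepare_request(message: str):
    # All three tasks start immediately; total latency is the slowest
    # task, not the sum of all three.
    return await asyncio.gather(
        detect_intent(message),
        generate_title(message),
        vector_search(message),
    )

if __name__ == "__main__":
    intent, title, docs = asyncio.run(prepare_request("What is RAG?"))
```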
```text
multimodal-ai-chatbot/
├── client/               # React + TypeScript + Vite frontend
│   ├── src/
│   │   ├── components/   # UI components (Sidebar, Navbar, Chat)
│   │   ├── contexts/     # WebSocket and Auth contexts
│   │   └── pages/        # Authentication and Chat pages
│   └── .env
├── server/               # FastAPI backend
│   ├── routes/           # API endpoints
│   ├── services/         # AI orchestration logic
│   ├── models/           # MongoDB models
│   └── .env
└── README.md
```
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.
- Live Demo: ai-chatbot-frontend-yq89.onrender.com
- Report Issues: GitHub Issues
Built with ❤️ using React, TypeScript, FastAPI, and MongoDB




