# PlugLLM

PlugLLM is a unified Python package that provides a consistent interface for 13+ Large Language Model (LLM) providers. Stop dealing with different SDKs and API formats: use one simple API for all your LLM needs.
## Features

- **Unified API** - Same interface for all 13+ providers
- **Context Memory** - Built-in conversation memory backed by a `deque`, with a configurable history length
- **Multiple Methods** - `generate()`, `chat()`, `ask()`, and `stream()` cover every use case
- **Async Support** - Full async/await functionality
- **Streaming** - Real-time response streaming
- **Factory Pattern** - Easy provider instantiation with `LLMFactory`
- **Environment Variables** - Automatic API key detection
- **Type Hints** - Full type annotation support
- **Production Ready** - Comprehensive error handling and timeouts
- **No Vendor Lock-in** - Switch providers without code changes
## Installation

```bash
pip install plugllm
```

Or install from source:

```bash
git clone https://github.com/firoziya/plugllm.git
cd plugllm
pip install -e .
```

With development dependencies:

```bash
pip install "plugllm[dev]"
```

## Quick Start

```python
from plugllm import ChatOpenAI, Message

# Initialize your LLM
llm = ChatOpenAI(api_key="your-key", model="gpt-4")

# Simple generation
response = llm.generate("What is Python?")
print(response)

# With message history
messages = [
    Message.system("You are a helpful assistant"),
    Message.user("What is machine learning?")
]
response = llm.generate(messages)
print(response)
```

## Factory Pattern

```python
from plugllm import LLMFactory

# Create any provider with one line
llm = LLMFactory.create("groq", api_key="your-key", model="openai/gpt-oss-20b")
response = llm.generate("Explain AI")
print(response)
```

## Simple Q&A with ask()

```python
from plugllm import ChatGroq

llm = ChatGroq(api_key="your-key", model="openai/gpt-oss-20b")

# Simple ask
response = llm.ask("What is Python?")

# With a system prompt
response = llm.ask(
    "What is Python?",
    system_prompt="You are a beginner-friendly teacher. Explain simply."
)
print(response)
```

## Conversation Memory

PlugLLM includes built-in conversation memory, backed by a `deque`, for natural multi-turn conversations.
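Under the hood, a bounded `deque` drops the oldest turns once the history limit is reached. A minimal, provider-free sketch of that trimming behavior (illustrative only, not PlugLLM's actual internals):

```python
from collections import deque

# A deque with maxlen=N silently evicts the oldest entry when full,
# which is how a bounded conversation history stays within budget.
history = deque(maxlen=3)
for turn in ["msg1", "msg2", "msg3", "msg4"]:
    history.append(turn)

print(list(history))  # ['msg2', 'msg3', 'msg4'] - msg1 was evicted
```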
```python
from plugllm import ChatOpenAI

llm = ChatOpenAI(api_key="your-key", model="gpt-4", max_history=10)

# Have a conversation - it remembers context!
response1 = llm.chat("My name is Alice")
print(f"Assistant: {response1}")

response2 = llm.chat("What's my name?")  # Remembers "Alice"
print(f"Assistant: {response2}")

# Multiple independent sessions
llm.chat("I like Python", session_id="user1")
llm.chat("I like Java", session_id="user2")

# Get conversation history
history = llm.get_conversation_history("user1")
print(f"Session history: {history}")
```

## Streaming

```python
from plugllm import ChatGroq

llm = ChatGroq(api_key="your-key", model="openai/gpt-oss-20b")

# Synchronous streaming
for chunk in llm.stream("Tell me a story"):
    print(chunk, end="", flush=True)

# Streaming with the ask method
for chunk in llm.ask_stream("Count from 1 to 5"):
    print(chunk, end="")

# Async streaming (inside an async function)
async for chunk in llm.astream("Tell me a joke"):
    print(chunk, end="")
```

## Async Support

```python
import asyncio
from plugllm import ChatGemini

async def main():
    llm = ChatGemini(api_key="your-key", model="gemini-2.5-flash")

    # Async generate
    response = await llm.agenerate("What is async programming?")
    print(response)

    # Async chat with memory
    response = await llm.achat("Remember this: 42", session_id="test")

    # Async streaming
    async for chunk in llm.astream("Tell me a secret"):
        print(chunk)

asyncio.run(main())
```

## Method Chaining

```python
from plugllm import ChatOpenAI

llm = ChatOpenAI(api_key="your-key", model="gpt-4")

# Method chaining for clean code
response = (llm
    .with_system("You are a helpful math tutor")
    .with_user("What is the square root of 144?")
    .with_temperature(0.5)
    .with_max_tokens(100)
    .call())
print(response)
```

## Multiple Sessions

```python
from plugllm import ChatGroq

llm = ChatGroq(api_key="your-key", model="openai/gpt-oss-20b")

# Session 1: Technical discussion
llm.chat("What is Python?", session_id="tech")
llm.chat("What are its main features?", session_id="tech")

# Session 2: Casual chat
llm.chat("I like pizza", session_id="casual")
llm.chat("What do I like?", session_id="casual")

# Manage sessions
history = llm.get_conversation_history("tech")
llm.clear_conversation("casual")
llm.reset_conversation("tech")  # Complete reset
```

## System Messages

```python
from plugllm import ChatGemini

llm = ChatGemini(api_key="your-key", model="gemini-2.5-flash")

# Set a system message for a session
llm.set_system_message(
    "You are a pirate. Always respond like a pirate.",
    session_id="pirate"
)

response = llm.chat("What is your favorite food?", session_id="pirate")
print(response)  # Will respond in pirate style
```

## Supported Providers

| Provider | Class | Default Model | API Key Env Var |
|---|---|---|---|
| OpenAI | `ChatOpenAI` | `gpt-5.4` | `OPENAI_API_KEY` |
| Google Gemini | `ChatGemini` | `gemini-3-flash-preview` | `GEMINI_API_KEY` |
| Groq | `ChatGroq` | `llama-3.3-70b-versatile` | `GROQ_API_KEY` |
| Anthropic Claude | `ChatClaude` | `claude-sonnet-4-6` | `ANTHROPIC_API_KEY` |
| xAI Grok | `ChatGrok` | `grok-4.20-reasoning` | `XAI_API_KEY` |
| Mistral AI | `ChatMistral` | `mistral-large-latest` | `MISTRAL_API_KEY` |
| Meta Llama | `ChatLlama` | `Llama-4-Maverick-17B-128E-Instruct-FP8` | `LLAMA_API_KEY` |
| DeepSeek | `ChatDeepSeek` | `deepseek-chat` | `DEEPSEEK_API_KEY` |
| Alibaba Qwen | `ChatQwen` | `qwen3.5-plus` | `DASHSCOPE_API_KEY` |
| Moonshot Kimi | `ChatKimi` | `kimi-k2.5` | `MOONSHOT_API_KEY` |
| Cohere | `ChatCohere` | `command-a-03-2025` | `CO_API_KEY` |
| SarvamAI | `ChatSarvamAI` | `sarvam-105b` | `SARVAM_API_KEY` |
| Ollama (Local) | `ChatOllama` | `gemma3` | No API key needed |
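Automatic key detection (the last column above) typically means the client falls back to the named environment variable when no `api_key` is passed. An illustrative stand-in for that lookup, not PlugLLM's actual code:

```python
import os

def resolve_api_key(explicit_key, env_var):
    # Prefer an explicitly passed key, then fall back to the environment
    key = explicit_key or os.getenv(env_var)
    if key is None:
        raise ValueError(f"No API key: pass api_key or set {env_var}")
    return key

os.environ["GROQ_API_KEY"] = "gsk_demo"
print(resolve_api_key(None, "GROQ_API_KEY"))       # gsk_demo
print(resolve_api_key("override", "GROQ_API_KEY"))  # override
```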
## Configuration

### Custom Parameters

```python
from plugllm import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4",
    api_key="your-key",
    temperature=0.7,
    max_tokens=1000,
    top_p=0.9
)
```

### Environment Variables

```bash
# .env file
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
GROQ_API_KEY=gsk_...
ANTHROPIC_API_KEY=sk-ant-...
```

```python
from plugllm import ChatGroq

# Automatically reads the key from the environment
llm = ChatGroq(model="openai/gpt-oss-20b")
```

### Factory with Options

```python
from plugllm import LLMFactory

llm = LLMFactory.create(
    provider="claude",
    model="claude-sonnet-4-6",
    api_key="your-key",
    temperature=0.5
)
```

## Examples

### Chatbot with Memory

```python
from plugllm import ChatOpenAI

class ChatBot:
    def __init__(self, api_key):
        self.llm = ChatOpenAI(api_key=api_key, model="gpt-5.4", max_history=20)
        self.session_id = "chatbot_session"
        self.llm.set_system_message(
            "You are a friendly AI assistant. Be helpful and concise.",
            session_id=self.session_id
        )

    def chat(self, user_message):
        response = self.llm.chat(user_message, session_id=self.session_id)
        return response.content

    def get_history(self):
        return self.llm.get_conversation_history(self.session_id)

# Use the chatbot
bot = ChatBot("your-key")
print(bot.chat("Hello!"))
print(bot.chat("What's my name? I'm John"))
print(bot.chat("What's my name?"))  # Remembers "John"
```

### Compare Providers

```python
from plugllm import ChatOpenAI, ChatGemini, ChatGroq

providers = {
    "OpenAI": ChatOpenAI(api_key="key1", model="gpt-5.4"),
    "Gemini": ChatGemini(api_key="key2", model="gemini-3-flash-preview"),
    "Groq": ChatGroq(api_key="key3", model="llama-3.3-70b-versatile")
}

prompt = "Explain quantum computing in one paragraph"
for name, llm in providers.items():
    response = llm.ask(prompt, max_tokens=150)
    print(f"\n{name}:\n{response.content[:200]}...")
```

### Streaming Summarization

```python
from plugllm import ChatMistral

llm = ChatMistral(api_key="your-key", model="mistral-large-latest")

def summarize_streaming(text):
    prompt = f"Summarize this text in 3 bullet points:\n\n{text}"
    print("Summary:", end=" ")
    for chunk in llm.ask_stream(prompt, temperature=0.3):
        print(chunk, end="", flush=True)
    print()

long_text = "Your long article text here..."
summarize_streaming(long_text)
```

### Timeouts and Retries

```python
from plugllm import ChatOpenAI

llm = ChatOpenAI(
    api_key="your-key",
    model="gpt-5.4",
    timeout=120,    # 2-minute timeout
    max_retries=3   # Retry failed requests
)
```

### Limiting Conversation History

```python
from plugllm import ChatGroq

# Limit conversation history to prevent token overflow
llm = ChatGroq(
    api_key="your-key",
    model="llama-3.3-70b-versatile",
    max_history=5  # Keep only the last 5 messages
)
```

## Error Handling

```python
from plugllm import ChatOpenAI
from plugllm.types import AuthenticationError, RateLimitError
from httpx import HTTPStatusError

llm = ChatOpenAI(api_key="your-key", model="gpt-5.4")

try:
    response = llm.ask("Hello")
    print(response)
except AuthenticationError:
    print("Invalid API key. Please check your credentials.")
except RateLimitError:
    print("Rate limit exceeded. Please wait and try again.")
except HTTPStatusError as e:
    print(f"HTTP Error: {e.response.status_code}")
    print(f"Details: {e.response.text}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

## Best Practices

- Reuse LLM instances instead of creating new ones for each request
- Use appropriate models for your use case (smaller models for simple tasks)
- Limit conversation history with the `max_history` parameter
- Use streaming for long responses to improve perceived performance
- Implement caching for frequently asked questions
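The caching tip can be as simple as memoizing `ask()` calls. A sketch using `functools.lru_cache` around a hypothetical wrapper; `FakeLLM` is a stub standing in for any real PlugLLM provider:

```python
from functools import lru_cache

class CachedLLM:
    """Wrap a PlugLLM-style client so repeated prompts hit a cache."""
    def __init__(self, llm, maxsize=256):
        self.llm = llm
        self.calls = 0  # counts real provider calls (cache misses)
        self._cached_ask = lru_cache(maxsize=maxsize)(self._ask)

    def _ask(self, prompt):
        self.calls += 1
        return self.llm.ask(prompt)

    def ask(self, prompt):
        return self._cached_ask(prompt)

# Stub provider, for demonstration only
class FakeLLM:
    def ask(self, prompt):
        return f"answer to: {prompt}"

cached = CachedLLM(FakeLLM())
a = cached.ask("What is Python?")
b = cached.ask("What is Python?")  # served from cache, no second call
print(cached.calls)  # 1
```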
```python
# Good: Reuse one instance
llm = ChatOpenAI(api_key="key", model="gpt-5.4")
for prompt in prompts:
    response = llm.ask(prompt)

# Bad: Create a new instance for each request
for prompt in prompts:
    llm = ChatOpenAI(api_key="key", model="gpt-5.4")
    response = llm.ask(prompt)
```

## Testing

Run the test suite:

```bash
# Run all tests
pytest tests/

# Run specific provider tests
pytest tests/test_gemini_groq.py -v

# Run with coverage
pytest --cov=plugllm tests/
```

## API Reference

### Base Provider

Abstract base class for all providers.

### Response Object

Unified response object with properties:

- `content`: The generated text
- `model`: Model used
- `usage`: Token usage statistics
- `raw_response`: Original API response

### Message

Message structure for conversations:

- `Message.user(content)`: Create a user message
- `Message.assistant(content)`: Create an assistant message
- `Message.system(content)`: Create a system message
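These helpers map onto the familiar role/content pairs most provider APIs expect. An illustrative stand-in (a minimal dataclass, not PlugLLM's actual `Message` class) showing the shape of that mapping:

```python
from dataclasses import dataclass

@dataclass
class Msg:
    role: str
    content: str

    @classmethod
    def system(cls, content):
        return cls("system", content)

    @classmethod
    def user(cls, content):
        return cls("user", content)

    @classmethod
    def assistant(cls, content):
        return cls("assistant", content)

messages = [Msg.system("You are terse."), Msg.user("Hi")]
# Serialize to the role/content dicts a provider payload expects
payload = [{"role": m.role, "content": m.content} for m in messages]
print(payload)
```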
### Methods

| Method | Description | Async Version |
|---|---|---|
| `generate()` | Basic text generation | `agenerate()` |
| `stream()` | Stream responses | `astream()` |
| `chat()` | Context-aware conversation | `achat()` |
| `ask()` | Simple Q&A with optional system prompt | `aask()` |
| `ask_stream()` | Streaming Q&A | `aask_stream()` |
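As the table shows, each async variant simply prefixes `a` to the sync name. A runnable sketch of that pairing with a stub client (`StubLLM` is hypothetical, not part of PlugLLM):

```python
import asyncio

class StubLLM:
    """Hypothetical stand-in mirroring PlugLLM's sync/async naming."""
    def generate(self, prompt):
        return f"echo: {prompt}"

    async def agenerate(self, prompt):
        await asyncio.sleep(0)  # simulate non-blocking I/O
        return self.generate(prompt)

llm = StubLLM()
print(llm.generate("hi"))                # echo: hi
print(asyncio.run(llm.agenerate("hi")))  # echo: hi
```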
## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch
- Add your changes
- Run the tests (`pytest tests/`)
- Submit a pull request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
MIT License
Copyright (c) 2026 Yash Kumar Firoziya
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions...
## Author

Yash Kumar Firoziya
- GitHub: @firoziya
- Email: ykfiroziya@gmail.com
## Acknowledgments

- All LLM providers for their amazing APIs
- Open source community for inspiration
- Contributors and users for their support
If you find PlugLLM useful, please give it a star on GitHub! It helps others discover the project.
Built with ❤️ for the Python AI community