- OpenAI-Compatible API: Drop-in replacement for the OpenAI `/v1/chat/completions` and `/v1/models` endpoints
- Nonebot2-Style Plugin System: Decorator-driven provider registration (`@provider.discover()`, `@provider.handle()`)
- Multi-Provider Routing: Automatic model discovery and routing across multiple backend providers
- Load Balancing: Built-in Round-Robin and Priority balancers, pluggable balancer interface
- Hook System: Pre/Post hooks for authentication, logging, and custom middleware
- SSE Streaming: Full support for streaming (SSE) and non-streaming chat completions
- Credential Management: Per-plugin credential directory with auto-refresh
┌─────────────────────────────────────────────────┐
│                 FastAPI Server                  │
│        /v1/chat/completions  /v1/models         │
└────────────────────┬────────────────────────────┘
                     │
              ┌──────▼──────┐
              │   Router    │  Model → Provider mapping
              └──────┬──────┘
                     │
           ┌─────────▼─────────┐
           │     Balancer      │  RoundRobin / Priority
           └─────────┬─────────┘
                     │
              ┌──────▼──────┐
              │  Registry   │  Provider & model registry
              └──────┬──────┘
                     │
     ┌───────────────┼───────────────┐
     ▼               ▼               ▼
┌─────────┐     ┌─────────┐     ┌─────────┐
│  Qwen   │     │  Echo   │     │ Custom  │  ...
│ Plugin  │     │ Plugin  │     │ Plugin  │
└─────────┘     └─────────┘     └─────────┘
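To make the balancer step in the diagram concrete, here is a minimal sketch of the two strategies it names. The `Candidate` type, the class names, and the `pick` method are assumptions made for illustration, not Porta's actual balancer interface. Which end of the priority scale wins is also a guess: this sketch lets the lower number win.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Candidate:
    """A provider able to serve the requested model (hypothetical type)."""
    name: str
    priority: int = 50


class RoundRobinBalancer:
    """Cycle through the candidates of each model in turn."""

    def __init__(self) -> None:
        # One independent counter per model id.
        self._counters = {}

    def pick(self, model: str, candidates: list) -> Candidate:
        if not candidates:
            raise LookupError(f"no provider serves {model!r}")
        counter = self._counters.setdefault(model, count())
        return candidates[next(counter) % len(candidates)]


class PriorityBalancer:
    """Always pick the candidate with the lowest priority number (assumed ordering)."""

    def pick(self, model: str, candidates: list) -> Candidate:
        if not candidates:
            raise LookupError(f"no provider serves {model!r}")
        return min(candidates, key=lambda c: c.priority)


providers = [Candidate("qwen", priority=10), Candidate("echo", priority=50)]

rr = RoundRobinBalancer()
rr_picks = [rr.pick("qwen-max", providers).name for _ in range(3)]
priority_pick = PriorityBalancer().pick("qwen-max", providers).name
```

Both classes expose the same `pick` signature, which is one plausible shape for a pluggable balancer: the router would only need to know that method.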
# Clone the repository
git clone https://github.com/your-repo/Porta.git
cd Porta
# Install with uv (recommended)
uv sync
# Or install with pip
pip install -e .

Create a `porta.yaml` in the project root:
server:
  host: 0.0.0.0
  port: 8000
auth:
  enabled: false
  api_keys: []
balancer:
  type: round_robin  # round_robin | priority
credentials_dir: ./credentials
plugin_dirs:
  - ./plugins
plugins:
  qwen:
    base_url: https://chat.qwen.ai
    timeout: 120
    output_thinking: true

Start the server:

porta run
# With options
porta run --host 127.0.0.1 --port 8080 --config ./porta.yaml --log-level DEBUG

Once running, the server exposes OpenAI-compatible endpoints:
# List available models
curl http://localhost:8000/v1/models
# Chat completion (non-streaming)
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-max",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Chat completion (streaming)
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-max",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'

Porta works with any OpenAI-compatible client (e.g. the Python `openai` library):
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="any")
response = client.chat.completions.create(
model="qwen-max",
messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

The `echo` plugin is a simple plugin for testing. It returns the user's message with a configurable prefix.
# porta.yaml
plugins:
  echo:
    prefix: "Echo: "

The `qwen` plugin is a reverse-engineered provider for the Qwen (通义千问) chat service.
Setup:
- Copy the example credential file:
cp credentials/qwen/accounts.json.example credentials/qwen/accounts.json
- Edit `credentials/qwen/accounts.json` with your Qwen account:
{
  "accounts": [
    {
      "email": "your-email@mail.com",
      "password": "your-password",
      "token": null,
      "expires": null,
      "proxy": null
    }
  ]
}

Supported model suffixes:
| Suffix | Description |
|---|---|
| `-thinking` | Enable deep thinking mode |
| `-search` | Enable web search |
| `-thinking-search` | Deep thinking + web search |
| `-image` | Text-to-image |
| `-image-edit` | Image editing |
| `-video` | Text-to-video |

For example, use `qwen-max-thinking` to access the thinking mode of `qwen-max`.
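The suffix convention can be pictured as splitting the requested model id into a base model and an optional mode. The sketch below is an illustration only: `split_model_suffix` is a hypothetical helper, and the source does not show how the Qwen plugin actually parses model names. The suffix list itself comes from the table above.

```python
from typing import Optional

# Suffixes taken from the table above.
SUFFIXES = ("-thinking", "-search", "-thinking-search", "-image", "-image-edit", "-video")


def split_model_suffix(model_id: str) -> tuple[str, Optional[str]]:
    """Return (base_model, suffix), or (model_id, None) when no suffix matches."""
    # Try the longest suffixes first so "-thinking-search" wins over "-search".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require a non-empty base model in front of the suffix.
        if model_id.endswith(suffix) and len(model_id) > len(suffix):
            return model_id[: -len(suffix)], suffix
    return model_id, None


plain = split_model_suffix("qwen-max")
thinking = split_model_suffix("qwen-max-thinking")
combined = split_model_suffix("qwen-max-thinking-search")
```

Checking the longest suffix first matters because `-thinking-search` also ends in `-search`; a naive scan in table order would report the base model as `qwen-max-thinking`.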
Configuration:
plugins:
  qwen:
    base_url: https://chat.qwen.ai
    timeout: 120
    output_thinking: true  # Whether to output <think> tags

Porta uses a Nonebot2-style decorator system for plugin development. Create a directory under `plugins/` with an `__init__.py`:
from porta import provider, on_startup, on_shutdown, get_plugin_config, get_plugin_credential, ModelInfo
from porta.models.openai import ChatCompletionChunk, ChatCompletionRequest, ChunkChoice, ChunkDelta
from pydantic import BaseModel
from typing import AsyncGenerator

# Plugin configuration
class MyConfig(BaseModel):
    greeting: str = "Hello"

config = get_plugin_config(MyConfig)
cred_dir = get_plugin_credential()

# Register as a provider
my = provider("my-provider", priority=50)

@on_startup
async def startup():
    print("Plugin starting up...")

@on_shutdown
async def shutdown():
    print("Plugin shutting down...")

# Declare available models
@my.discover()
async def discover() -> list[ModelInfo]:
    return [ModelInfo(id="my-model", owned_by="my-provider")]

# Handle chat completion requests
@my.handle()
async def handle(request: ChatCompletionRequest) -> AsyncGenerator[ChatCompletionChunk, None]:
    # Your implementation here
    yield ChatCompletionChunk(...)

| API | Description |
|---|---|
| `provider(name, priority, block)` | Register a model provider |
| `on_startup` | Decorator for startup hook |
| `on_shutdown` | Decorator for shutdown hook |
| `get_plugin_config(model)` | Get plugin-specific config from `porta.yaml` |
| `get_plugin_credential()` | Get `Path` to the plugin's credential directory |
| `ModelInfo` | Model metadata (`id`, `owned_by`, `created`, `extra`) |

| Decorator | Description |
|---|---|
| `@prov.discover()` | Register model discovery function |
| `@prov.handle()` | Register chat completion handler |
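For readers new to this registration style, the following self-contained sketch shows one way `discover()` and `handle()` decorators could record callbacks on a provider object. It is not Porta's implementation: the `Provider` class and `run_once` are stand-ins invented here, and plain strings and dicts replace Porta's `ModelInfo`, `ChatCompletionRequest`, and `ChatCompletionChunk` so that the example runs without Porta installed.

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable, Optional

DiscoverFn = Callable[[], Awaitable[list[str]]]
HandleFn = Callable[[dict], AsyncGenerator[str, None]]


class Provider:
    """Stand-in for the object returned by provider(); not Porta's class."""

    def __init__(self, name: str, priority: int = 50) -> None:
        self.name = name
        self.priority = priority
        self.discover_fn: Optional[DiscoverFn] = None
        self.handle_fn: Optional[HandleFn] = None

    def discover(self) -> Callable[[DiscoverFn], DiscoverFn]:
        def decorator(fn: DiscoverFn) -> DiscoverFn:
            self.discover_fn = fn  # remember the model-discovery callback
            return fn
        return decorator

    def handle(self) -> Callable[[HandleFn], HandleFn]:
        def decorator(fn: HandleFn) -> HandleFn:
            self.handle_fn = fn  # remember the chat-completion callback
            return fn
        return decorator


demo = Provider("demo-provider")


@demo.discover()
async def discover() -> list[str]:
    return ["demo-model"]


@demo.handle()
async def handle(request: dict) -> AsyncGenerator[str, None]:
    # Stream the last message back one word at a time.
    for word in request["messages"][-1]["content"].split():
        yield word


async def run_once(prov: Provider, request: dict) -> tuple[list[str], list[str]]:
    """What a router might do: list the models, then drain the handler's stream."""
    if prov.discover_fn is None or prov.handle_fn is None:
        raise RuntimeError(f"provider {prov.name!r} is not fully registered")
    models = await prov.discover_fn()
    chunks = [chunk async for chunk in prov.handle_fn(request)]
    return models, chunks


models, chunks = asyncio.run(
    run_once(demo, {"model": "demo-model",
                    "messages": [{"role": "user", "content": "Hello there"}]})
)
```

Because each decorator returns the original function unchanged, the plugin module stays importable and testable on its own; the provider object simply holds references to the callbacks.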
porta/
├── api/                # FastAPI route handlers
│   ├── chat.py         # /v1/chat/completions
│   └── models.py       # /v1/models
├── core/               # Core routing & balancing
│   ├── registry.py     # Provider & model registry
│   ├── router.py       # Model → Provider routing
│   └── balancer.py     # Load balancing (RoundRobin / Priority)
├── hook/               # Pre/Post hook system
│   └── base.py         # AuthHook, LogHook, HookManager
├── models/             # Data models
│   ├── internal.py     # ModelInfo, RouteResult, PluginMeta
│   └── openai.py       # OpenAI-compatible request/response models
├── plugin/             # Plugin system
│   ├── base.py         # ModelProvider base class
│   ├── loader.py       # Plugin discovery & loading
│   └── manager.py      # Plugin lifecycle management
├── utils/
│   └── sse.py          # SSE stream utilities
├── app.py              # FastAPI app factory
├── config.py           # Configuration (pydantic-settings)
└── cli.py              # CLI entry point
plugins/
├── echo/               # Echo plugin (example)
│   └── __init__.py
└── qwen/               # Qwen provider plugin
    └── __init__.py
credentials/
└── qwen/
    ├── accounts.json.example
    └── accounts.json   # (gitignored)
# Install dev dependencies
uv sync --group dev
# Run tests
pytest
# Run with debug logging
porta run --log-level DEBUG

This project is licensed under the GNU Affero General Public License v3.0.
Powered By CuzTeam.