██╗ ██╗███████╗███████╗████████╗ ████████╗███████╗██████╗ ███╗ ███╗
██║ ██║██╔════╝██╔════╝╚══██╔══╝ ╚══██╔══╝██╔════╝██╔══██╗████╗ ████║
███████║█████╗ █████╗ ██║ ██║ █████╗ ██████╔╝██╔████╔██║
██╔══██║██╔══╝ ██╔══╝ ██║ ██║ ██╔══╝ ██╔══██╗██║╚██╔╝██║
██║ ██║███████╗███████╗ ██║ ██║ ███████╗██║ ██║██║ ╚═╝ ██║
╚═╝ ╚═╝╚══════╝╚══════╝ ╚═╝ ╚═╝ ╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝
Self-hosted AI gateway platform. Chat with your 122B model, call tools, manage sessions, and build persistent memory — entirely on your own hardware.
2026-04-19 — Session Manager Fixes:
- Session manager now always gets a fresh SQLite session per operation, eliminating "prepared state" errors from transaction nesting
- WebSocket chat now auto-creates sessions if they don't exist instead of returning an error
HinksBot Gateway is a self-hosted AI platform that turns a raw llama-server endpoint into a full-featured conversational workspace — streaming responses, tool calling, conversation memory, session management, and a polished web UI. Everything runs locally on your hardware.
┌──────────────────────────────────────────────────────────────┐
│ Frontend (React + Vite + Tailwind, port 5173) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ ChatView │ │ Sidebar │ │ SettingsPanel │ │
│ │ (streaming) │ │ (sessions) │ │ (model/config) │ │
│ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │
└─────────┼─────────────────┼────────────────────┼────────────┘
│ WebSocket + REST API │
▼ ▼
┌──────────────────────────────────────────────────────────────┐
│ Backend (FastAPI + Uvicorn, port 8000) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ /ws/chat/* │ │ /sessions/* │ │ /tools/* │ │
│ │ Agent loop │ │ CRUD ops │ │ Registry │ │
│ └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ │
│ │ │ │ │
│ ┌──────▼─────────────────▼─────────────────────▼─────────┐ │
│ │ SessionManager │ ToolRegistry │ LlamaServerClient │ │
│ │ (SQLite) │ (hot-reload) │ (HTTP to :8080) │ │
│ └────────────────────────┬───────────────────────────────┘ │
└───────────────────────────┼──────────────────────────────────┘
│ HTTP (port 8080)
▼
┌──────────────────────┐
│ llama-server │
│ (Qwen 122B Q4) │
│ Context: 262K │
└──────────────────────┘
- Streaming Chat: Real-time token-by-token responses via WebSocket
- Tool Calling: Model-driven tool calls, executed server-side with results injected back into the conversation
- Session Management: Create, resume, search, rename, and delete conversation sessions
- Persistent Memory: SQLite-backed message history; full conversation context on reconnect
- Dynamic Tool Registry: Hot-reloadable tools registered via the @register_tool decorator
- Session Search: Full-text search across all sessions and messages
- Session Export: Export conversations as JSON or Markdown
- Usage Stats: Track messages, tool calls, and session activity over time
- Dark Theme UI: Terminal-aesthetic web interface (Linear meets WezTerm)
- Configurable: All settings via config.yaml, no hardcoded values
- Context Management: Automatic truncation to fit within context window
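One way the automatic truncation could work is to keep the system prompt and drop the oldest messages until the rest fits. The sketch below assumes that strategy and a crude 4-characters-per-token estimate; `estimate_tokens`, `truncate_history`, and the `reserve` parameter are hypothetical names, and the gateway may well use a real tokenizer and a different policy.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def truncate_history(
    messages: list[dict], context_window: int, reserve: int = 1024
) -> list[dict]:
    """Keep system messages plus the newest messages that fit the token budget.

    `reserve` leaves room for the model's reply (hypothetical parameter).
    """
    budget = context_window - reserve
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)

    kept: list[dict] = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + kept[::-1]  # restore chronological order
```

With the configured `context_window` of 262144 tokens, truncation would only begin on very long sessions.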
- Python 3.11+
- Node.js 18+ and npm
- A running llama-server instance at http://localhost:8080
- ~122B Q4_K_XL model or compatible GGUF model file
cd /home/alexanderh/projects/hinksbot-gateway
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt

cd /home/alexanderh/projects/hinksbot-gateway/frontend
npm install

Edit config.yaml in the project root:
server:
  host: "0.0.0.0"
  port: 8000
  debug: false
llama_server:
  base_url: "http://localhost:8080"
  api_key: ""  # empty for local llama-server
database:
  path: "~/.hermes/hinksbot-gateway.db"
defaults:
  model: "Qwen3.5-122B-A10B-Opus-Reasoning-Q4_K_M.gguf"
  system_prompt: "You are HinksBot, a helpful AI assistant..."
  max_turns: 20
  context_window: 262144
  tool_groups:
    - file_tools
    - wiki_tools
    - system_tools
frontend:
  port: 5173
  title: "HinksBot Gateway"

source .venv/bin/activate
cd /home/alexanderh/projects/hinksbot-gateway
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

Backend starts at http://localhost:8000. API docs at http://localhost:8000/docs.
cd /home/alexanderh/projects/hinksbot-gateway/frontend
npm run dev

Frontend available at http://localhost:5173.
All configuration lives in config.yaml at the project root.
| Key | Type | Default | Description |
|---|---|---|---|
| server.host | string | "0.0.0.0" | Backend bind address |
| server.port | int | 8000 | Backend port |
| server.debug | bool | false | Enable debug mode (auto-reload) |
| llama_server.base_url | string | "http://localhost:8080" | llama-server HTTP endpoint |
| llama_server.api_key | string | "" | Auth token (empty for local) |
| database.path | string | "~/.hermes/hinksbot-gateway.db" | SQLite database path |
| defaults.model | string | (GGUF filename) | Model to use |
| defaults.system_prompt | string | "...HinksBot..." | System prompt injected into every session |
| defaults.max_turns | int | 20 | Max agent loop iterations per message |
| defaults.context_window | int | 262144 | Context window size (tokens) |
| defaults.tool_groups | list[str] | [] | Default enabled tool groups |
| frontend.port | int | 5173 | Vite dev server port |
| frontend.title | string | "HinksBot Gateway" | Browser tab title |
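To illustrate how the defaults in the table might combine with a partial config.yaml, here is a stdlib-only sketch. The project's loader uses Pydantic models (see backend/config.py in the layout), so `DEFAULTS` and `merge_config` are hypothetical stand-ins, and the `user` dictionary stands in for the parsed YAML file.

```python
import copy

# Defaults copied from the table above; model and system_prompt are omitted
# because the table does not give literal default values for them.
DEFAULTS = {
    "server": {"host": "0.0.0.0", "port": 8000, "debug": False},
    "llama_server": {"base_url": "http://localhost:8080", "api_key": ""},
    "database": {"path": "~/.hermes/hinksbot-gateway.db"},
    "defaults": {"max_turns": 20, "context_window": 262144, "tool_groups": []},
    "frontend": {"port": 5173, "title": "HinksBot Gateway"},
}


def merge_config(user: dict, defaults: dict = DEFAULTS) -> dict:
    """Recursively overlay user settings on the defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # A config file that only overrides the backend port.
    print(merge_config({"server": {"port": 9000}})["server"])
```

Merging section by section means a config file only needs to list the keys it changes.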
hinksbot-gateway/
├── config.yaml                 # All configuration (no hardcoding)
├── requirements.txt            # Python dependencies
├── backend/
│   ├── main.py                 # FastAPI app + lifespan + CORS
│   ├── config.py               # YAML config loader (Pydantic models)
│   ├── api/
│   │   ├── sessions.py         # Session CRUD + export
│   │   ├── chat.py             # WebSocket handler + ConnectionManager
│   │   ├── tools.py            # Tool registry REST API
│   │   ├── models.py           # Model switching endpoints
│   │   ├── uploads.py          # File upload handling
│   │   └── stats.py            # Usage statistics
│   ├── core/
│   │   ├── session_manager.py  # Agent loop (tool calls, history)
│   │   └── model_client.py     # llama-server HTTP client
│   ├── db/
│   │   ├── database.py         # SQLAlchemy async engine + session
│   │   ├── models.py           # ORM: Session, Message, ToolExecution
│   │   └── init_db.py          # Schema creation
│   └── tools/
│       ├── registry.py         # ToolRegistry + @register_tool decorator
│       ├── file_tools.py       # file_read, file_write, search_files
│       ├── terminal_tools.py   # run_shell
│       ├── web_tools.py        # web_search, web_extract
│       ├── wiki_tools.py       # wiki_search, session_history_search
│       └── system_tools.py     # sys_metrics, list_processes
└── frontend/
    ├── package.json
    ├── vite.config.ts
    ├── tailwind.config.js
    └── src/
        ├── App.tsx
        ├── types/index.ts
        ├── hooks/
        │   ├── useChat.ts
        │   ├── useSessions.ts
        │   └── useTools.ts
        └── components/
            ├── ChatView.tsx
            ├── ChatInput.tsx
            ├── MessageBubble.tsx
            ├── ToolCallCard.tsx
            ├── SessionList.tsx
            ├── Sidebar.tsx
            ├── SettingsPanel.tsx
            └── StatsDashboard.tsx
Base URL: http://localhost:8000/api/v1
GET /api/v1/health
Response: { "status": "ok", "model": "Qwen3.5-122B..." }
| Method | Path | Description |
|---|---|---|
| GET | /sessions | List all sessions (paginated: ?page=1&limit=50) |
| POST | /sessions | Create new session ({title?, model?}) |
| GET | /sessions/{id} | Get session metadata |
| PATCH | /sessions/{id} | Update session ({title?, model?, tool_options?}) |
| DELETE | /sessions/{id} | Delete session |
| GET | /sessions/{id}/messages | Get messages for session (?page=1&limit=50) |
| GET | /sessions/{id}/export | Export session (?format=json\|md) |
| GET | /sessions/search?q=query | Full-text session/message search |
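As a small illustration of the query parameters in the table, the helpers below build request URLs for listing, searching, and exporting sessions. The function names are hypothetical; the paths, parameter names, and base URL are taken from this README, and the helpers only build strings, so any HTTP client can be used to send the requests.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000/api/v1"


def sessions_url(page: int = 1, limit: int = 50) -> str:
    """URL for one page of the session list."""
    return f"{BASE_URL}/sessions?{urlencode({'page': page, 'limit': limit})}"


def search_url(query: str) -> str:
    """URL for a full-text search; the query is percent-encoded."""
    return f"{BASE_URL}/sessions/search?{urlencode({'q': query})}"


def export_url(session_id: str, fmt: str = "json") -> str:
    """URL for exporting one session as JSON or Markdown."""
    if fmt not in ("json", "md"):
        raise ValueError("format must be 'json' or 'md'")
    safe_id = quote(session_id, safe="")
    return f"{BASE_URL}/sessions/{safe_id}/export?{urlencode({'format': fmt})}"


if __name__ == "__main__":
    print(sessions_url())
    print(search_url("last commit"))
    print(export_url("abc123", "md"))
```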
| Method | Path | Description |
|---|---|---|
| GET | /tools | List all registered tools (?group=file_tools) |
| GET | /tools/groups | List all tool groups |
| GET | /tools/{name} | Get specific tool details |
| GET | /tools/{name}/schema | Get OpenAI-compatible tool schema |
| POST | /tools/{name}/execute | Execute tool directly ({arguments}) |
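For readers unfamiliar with the "OpenAI-compatible tool schema", the sketch below wraps a tool's name, description, and JSON Schema parameters in the function-tool envelope used by OpenAI-style chat completion APIs. `to_openai_tool` is a hypothetical helper, and the exact body returned by /tools/{name}/schema is an assumption, not something this README specifies.

```python
def to_openai_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a tool definition in the OpenAI-style function-tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # a JSON Schema object
        },
    }


if __name__ == "__main__":
    schema = to_openai_tool(
        "file_read",
        "Read file contents with line numbers and pagination",
        {
            "type": "object",
            # "path" is an illustrative argument name, not confirmed by the source.
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    )
    print(schema["function"]["name"])
```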
| Method | Path | Description |
|---|---|---|
| GET | /stats | Usage statistics (sessions, messages, tool calls) |
Endpoint: /ws/chat/{session_id}
Connect, then send JSON messages. All messages are JSON objects.
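As a rough sketch of how a client might handle this protocol, the code below builds a `user_message` frame and folds the server events documented in this section into a single reply summary. `fold_events` and `user_message` are hypothetical helpers; the frames are plain JSON strings, so they could come from any WebSocket client library.

```python
import json
from typing import Iterable


def user_message(content: str, tool_options: list[str]) -> str:
    """Serialize a user_message frame to send over the socket."""
    return json.dumps(
        {"type": "user_message", "content": content, "tool_options": tool_options}
    )


def fold_events(frames: Iterable[str]) -> dict:
    """Fold raw JSON frames from /ws/chat/{session_id} into one reply summary."""
    reply = {"text": "", "tool_calls": [], "turns": None, "error": None}
    for frame in frames:
        event = json.loads(frame)
        kind = event.get("type")
        if kind == "content_block_delta" and event.get("block_type") == "text":
            reply["text"] += event["delta"]
        elif kind == "tool_call":
            reply["tool_calls"].append(event["tool"])
        elif kind == "stream_end":
            reply["turns"] = event.get("usage", {}).get("turns")
            break
        elif kind == "error":
            reply["error"] = event["message"]
            break
        # stream_start and tool_result need no bookkeeping in this sketch.
    return reply
```

A UI would typically render each delta as it arrives rather than waiting for `stream_end`; the fold above only shows which fields matter for each event type.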
Send a message:
{
  "type": "user_message",
  "content": "What files changed in the last commit?",
  "tool_options": ["file_tools", "git_tools"]
}

Send tool result (after receiving a tool_call):
{
  "type": "tool_result",
  "tool_call_id": "call_abc123",
  "result": "23 files found...",
  "error": null
}

Stream start:
{
  "type": "stream_start",
  "message_id": "msg_xyz789"
}

Token delta:
{
  "type": "content_block_delta",
  "delta": "Looking",
  "block_type": "text"
}

Tool call:
{
  "type": "tool_call",
  "tool": "search_files",
  "tool_call_id": "call_abc123",
  "args": {"pattern": "*.py", "path": "/home/alexanderh"}
}

Tool result:
{
  "type": "tool_result",
  "tool_call_id": "call_abc123",
  "result": "23 files found...",
  "success": true
}

Stream end:
{
  "type": "stream_end",
  "message_id": "msg_xyz789",
  "usage": {"turns": 3}
}

Error:
{
  "type": "error",
  "error": "model_unavailable",
  "message": "LLM endpoint not responding"
}

All tools are registered via the @register_tool decorator. Each tool has a name, description, JSON Schema for arguments, handler function, and group memberships.
| Tool | Groups | Description |
|---|---|---|
| file_read | file, read | Read file contents with line numbers and pagination |
| file_write | file, write | Write content to a file (overwrites) |
| search_files | search, file | Regex search inside files, or glob by filename |
| run_shell | terminal, shell | Execute a shell command in a specified working directory |
| web_search | web, search | Search the web, returns titles/URLs/descriptions |
| web_extract | web, extract | Extract full content from web pages as markdown |
| wiki_search | wiki, memory, search | Search the mempalace knowledge graph |
| session_history_search | search, history, memory | Full-text search over conversation history |
| sys_metrics | system, metrics, monitoring | CPU, RAM, and disk usage |
| list_processes | system, processes, monitoring | List running processes (like ps aux) |
Tools are organized into groups for session-level toggling:
- file/read/write/search: File operations
- terminal/shell: Shell execution
- web/search/extract: Web access
- wiki/memory: Knowledge graph / memory
- system/metrics/processes/monitoring: System info
from backend.tools.registry import register_tool

@register_tool(
    name="my_tool",
    description="Does something useful",
    parameters={
        "type": "object",
        "properties": {
            "arg1": {"type": "string", "description": "An argument"},
        },
        "required": ["arg1"],
    },
    groups=["my_group"],
)
async def my_tool(arg1: str) -> str:
    return f"did {arg1}"

The tool is automatically registered on server startup when backend/tools/__init__.py imports all tool modules.
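To show why importing a module is enough to register its tools, here is a minimal stand-in for a decorator-based registry. It is not the code in backend/tools/registry.py: the `Tool` dataclass, `_REGISTRY`, `tools_in_group`, and `execute` are hypothetical, and this local `register_tool` only mimics the keyword arguments used in the example above.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[str]]


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict[str, Any]
    groups: list[str]
    handler: Handler


_REGISTRY: dict[str, Tool] = {}


def register_tool(*, name: str, description: str, parameters: dict, groups: list[str]):
    """Decorator that records the handler at import time and returns it unchanged."""
    def decorator(func: Handler) -> Handler:
        _REGISTRY[name] = Tool(name, description, parameters, list(groups), func)
        return func
    return decorator


def tools_in_group(group: str) -> list[str]:
    """Names of all registered tools that belong to the given group."""
    return [tool.name for tool in _REGISTRY.values() if group in tool.groups]


async def execute(name: str, arguments: dict[str, Any]) -> str:
    """Check required arguments against the schema, then await the handler."""
    tool = _REGISTRY[name]
    missing = [k for k in tool.parameters.get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return await tool.handler(**arguments)


@register_tool(
    name="my_tool",
    description="Does something useful",
    parameters={
        "type": "object",
        "properties": {"arg1": {"type": "string"}},
        "required": ["arg1"],
    },
    groups=["my_group"],
)
async def my_tool(arg1: str) -> str:
    return f"did {arg1}"


if __name__ == "__main__":
    print(asyncio.run(execute("my_tool", {"arg1": "the thing"})))
```

Because the decorator runs when the module body executes, a single import of each tool module is enough to populate the registry.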
# Backend with auto-reload
source .venv/bin/activate
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
# Frontend with HMR
cd frontend && npm run dev

The SQLite database is at ~/.hermes/hinksbot-gateway.db. It is created automatically on first startup.
Schema tables:
- sessions: Session metadata (id, title, model, created_at, updated_at, tool_options)
- messages: All messages (id, session_id, role, content, tool_call_id, tool_name, token_count)
- tool_executions: Tool call records (id, message_id, tool_name, args, result, duration_ms, success, error)
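The three tables can be approximated with the standard-library sqlite3 module, as in the sketch below. The gateway itself defines them through SQLAlchemy ORM models, so the column types, key choices, and foreign keys here are assumptions; only the table and column names come from the list above.

```python
import sqlite3

# Column names follow the schema list; types and constraints are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    model TEXT,
    created_at TEXT,
    updated_at TEXT,
    tool_options TEXT          -- assumed: JSON-encoded list of groups
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    role TEXT NOT NULL,
    content TEXT,
    tool_call_id TEXT,
    tool_name TEXT,
    token_count INTEGER
);
CREATE TABLE IF NOT EXISTS tool_executions (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES messages(id),
    tool_name TEXT,
    args TEXT,
    result TEXT,
    duration_ms INTEGER,
    success INTEGER,           -- 0 or 1
    error TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the three tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def message_count(conn: sqlite3.Connection, session_id: str) -> int:
    """Number of stored messages for one session."""
    row = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0]
```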
New tools are picked up when the backend restarts. In dev mode, uvicorn's --reload flag restarts it automatically whenever a Python file changes. Add a new tool by creating a function decorated with @register_tool in the appropriate backend/tools/*.py file.
Backend: Add to requirements.txt then pip install -r requirements.txt
Frontend: Add to frontend/package.json then cd frontend && npm install
| Variable | Description |
|---|---|
| HINKSBOT_GATEWAY_CONFIG | Path to alternate config.yaml |
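The override presumably works along these lines: use the path in HINKSBOT_GATEWAY_CONFIG when it is set, otherwise fall back to config.yaml in the project root. `resolve_config_path` and its parameters are hypothetical, and the real lookup in backend/config.py may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    env: Optional[Mapping[str, str]] = None,
    project_root: Path = Path("."),
) -> Path:
    """Return the config file path, honoring HINKSBOT_GATEWAY_CONFIG if set."""
    env = os.environ if env is None else env
    override = env.get("HINKSBOT_GATEWAY_CONFIG")
    if override:
        return Path(override).expanduser()
    return Path(project_root) / "config.yaml"


if __name__ == "__main__":
    print(resolve_config_path())
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.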
MIT License