# Releases: mostlydev/cllama
## v0.2.5

### Highlights
- Enforce declared per-agent model policy in cllama
- Normalize runner model requests against the compiled allowlist
- Restrict provider failover to the pod-declared fallback chain
- Add xAI routing/policy fixes needed for direct `xai/...` model refs
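The allowlist normalization above can be sketched roughly as follows. This is an illustrative example, not cllama's actual code: the function name `normalizeModel` and the fallback behavior are assumptions about what "normalize runner model requests against the compiled allowlist" might look like.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeModel is a hypothetical sketch: it canonicalizes a runner's
// requested model and checks it against the pod's compiled allowlist,
// falling back to the pod default when the request is not permitted.
func normalizeModel(requested string, allow map[string]bool, fallback string) string {
	m := strings.ToLower(strings.TrimSpace(requested))
	if allow[m] {
		return m
	}
	return fallback
}

func main() {
	allow := map[string]bool{
		"xai/grok-2":               true,
		"anthropic/claude-3-haiku": true,
	}
	// Allowed model: passes through in canonical form.
	fmt.Println(normalizeModel("XAI/Grok-2", allow, "anthropic/claude-3-haiku"))
	// Disallowed model: coerced to the pod default.
	fmt.Println(normalizeModel("openai/gpt-4o", allow, "anthropic/claude-3-haiku"))
}
```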
### Artifacts

- Container image: `ghcr.io/mostlydev/cllama:v0.2.5`
- Rolling tag: `ghcr.io/mostlydev/cllama:latest`
### Validation

`go test ./...`
## v0.2.3

### Changes
- Unpriced request tracking: requests where the upstream provider returns no cost data are now counted separately as `unpriced_requests` in the cost API response and surfaced in the dashboard UI
- Reported cost passthrough: `CostInfo.CostUSD` is now `*float64` (nil = unpriced, not zero); provider-reported cost fields are propagated through the proxy
- Timezone context: `time_context.go` injects timezone-aware current time for agents that declare a `TZ` environment variable
- Dashboard: `total_requests` and `unpriced_requests` are exposed in the costs API endpoint
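The nil-vs-zero distinction behind the `*float64` change can be sketched as below. The `tally` helper is illustrative, not cllama's actual aggregation code; only the `CostInfo.CostUSD` field name comes from the release notes.

```go
package main

import "fmt"

// CostInfo mirrors the release note: CostUSD is *float64 so that
// "no cost reported" (nil) stays distinct from "zero cost" (0.0).
type CostInfo struct {
	CostUSD *float64
}

// tally is an assumed aggregation sketch: priced requests add to the
// total, while nil costs are counted separately as unpriced.
func tally(infos []CostInfo) (total float64, unpriced int) {
	for _, ci := range infos {
		if ci.CostUSD == nil {
			unpriced++ // surfaced as unpriced_requests in the cost API
			continue
		}
		total += *ci.CostUSD
	}
	return total, unpriced
}

func main() {
	c := 0.0125
	total, unpriced := tally([]CostInfo{{CostUSD: &c}, {CostUSD: nil}})
	fmt.Printf("total=%.4f unpriced=%d\n", total, unpriced) // total=0.0125 unpriced=1
}
```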
## v0.2.2 — provider token pool + runtime provider add

### What's new
- Provider token pool: multi-key pool per provider with states `ready`/`cooldown`/`dead`/`disabled`. The proxy retries across keys on 401/429/5xx with failure classification and `Retry-After` support.
- Runtime provider add: new `POST /providers/add` UI route adds a provider (name, base URL, auth type, API key) at runtime with no restart. Persists to `.claw-auth/providers.json` with `source: runtime`; the new `ProviderState.Source` field (`seed`/`runtime`) survives JSON round-trips.
- UI bearer auth: all routes are gated by `CLLAMA_UI_TOKEN` when configured.
- Key management routes: `POST /keys/add` and `POST /keys/delete`.
- Webhook alerts: `CLLAMA_ALERT_WEBHOOKS` and `CLLAMA_ALERT_MENTIONS` for pool events.
## v0.2.1 — Feed Auth

### What's new
- Feed authentication: `FeedEntry` now supports an `auth` field. When present, the feed fetcher sets an `Authorization: Bearer` header on the fetch request. This enables authenticated feeds from services like `claw-api` that require bearer token auth.
- Backward compatible: existing `feeds.json` files without `auth` fields work unchanged.
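The bearer-header behavior can be sketched as below. The `FeedEntry` struct layout and the `newFeedRequest` helper are assumptions for illustration; only the field name `auth` and the `Authorization: Bearer` header come from the release notes.

```go
package main

import (
	"fmt"
	"net/http"
)

// FeedEntry sketches the feeds.json entry shape; the real struct in
// cllama may carry more fields.
type FeedEntry struct {
	URL  string `json:"url"`
	Auth string `json:"auth,omitempty"`
}

// newFeedRequest builds the fetch request, adding a bearer header only
// when the entry declares auth — entries without it work unchanged.
func newFeedRequest(e FeedEntry) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, e.URL, nil)
	if err != nil {
		return nil, err
	}
	if e.Auth != "" {
		req.Header.Set("Authorization", "Bearer "+e.Auth)
	}
	return req, nil
}

func main() {
	req, _ := newFeedRequest(FeedEntry{URL: "https://example.com/feed.json", Auth: "tok123"})
	fmt.Println(req.Header.Get("Authorization")) // Bearer tok123
}
```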
## v0.2.0 — Feed Injection (ADR-013 Milestone 2)

### What's Changed

#### Features
- Feed injection — the proxy now supports runtime feed injection into LLM requests. Feeds defined in agent context manifests are fetched with TTL-based caching and injected as system message content before forwarding to the upstream provider. Both OpenAI (`messages[]`) and Anthropic (top-level `system`) formats are supported.
  - `internal/feeds/manifest.go` — feed manifest parsing from agent context
  - `internal/feeds/fetcher.go` — HTTP fetcher with TTL-based caching
  - `internal/feeds/inject.go` — system message injection for OpenAI and Anthropic formats
- Agent context extensions — `AgentContext` now exposes `ContextDir` for feed manifest discovery and service auth loading.
- Proxy handler — new `WithFeeds` option wires feed injection into the proxy pipeline, gated by pod name.
- Cost logging improvements — better tracking in the logging layer.
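The two injection shapes can be sketched as follows. This is a simplified illustration of the idea, not the code in `internal/feeds/inject.go`: OpenAI bodies get a system message prepended to `messages[]`, while Anthropic bodies get the feed text prepended to the top-level `system` string. The `injectFeed` name and string-valued `system` handling are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// injectFeed prepends feed text as system content in the shape each
// request format expects.
func injectFeed(body map[string]any, format, feedText string) {
	switch format {
	case "openai":
		// Prepend a system message to messages[].
		msgs, _ := body["messages"].([]any)
		sys := map[string]any{"role": "system", "content": feedText}
		body["messages"] = append([]any{sys}, msgs...)
	case "anthropic":
		// Prepend to the top-level system field, preserving any existing text.
		if prev, ok := body["system"].(string); ok && prev != "" {
			feedText = feedText + "\n\n" + prev
		}
		body["system"] = feedText
	}
}

func main() {
	openai := map[string]any{
		"messages": []any{map[string]any{"role": "user", "content": "hi"}},
	}
	injectFeed(openai, "openai", "feed: status green")
	out, _ := json.Marshal(openai["messages"])
	fmt.Println(string(out))
}
```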
### Docker Image

- `ghcr.io/mostlydev/cllama:v0.3.4` — multi-arch (linux/amd64 + linux/arm64)
- `ghcr.io/mostlydev/cllama:latest`
### Test Coverage
- Feed fetcher, injection, and manifest parsing
- Proxy handler tests for both OpenAI and Anthropic feed injection paths
- Agent context service auth loading
## v0.1.0 — First Release
OpenAI-compatible governance proxy for AI agent pods. Zero external dependencies, ~15 MB distroless image.
### Features
- OpenAI-compatible proxy on `:8080` — `POST /v1/chat/completions` with streaming
- Anthropic Messages bridge — `POST /v1/messages` with native format translation
- Multi-provider registry — OpenAI, Anthropic, OpenRouter, Ollama with automatic routing
- Per-agent bearer token auth — agents never see real provider API keys
- Real-time operator dashboard on `:8081` — SSE-powered live view of all LLM calls with agent ID, model, tokens, cost, and latency
- Cost tracking — per-agent, per-model, per-provider usage extraction from upstream responses
- Vendor-prefixed model fallback — routes `anthropic/claude-*` etc. through OpenRouter when a direct provider key is unavailable
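From an agent's side, calling the proxy looks like a normal OpenAI chat-completions call with the agent's own token. A hedged sketch: the helper name, the `localhost:8080` address, and the token value are illustrative; only the endpoint path and the OpenAI-compatible, bearer-authenticated shape come from the feature list above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// buildChatRequest builds an OpenAI-shaped request to the proxy. The
// agent authenticates with its own bearer token; it never holds a real
// provider API key.
func buildChatRequest(agentToken, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{
		"model": model,
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		"http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+agentToken)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := buildChatRequest("agent-token-1", "anthropic/claude-3-haiku", "hello")
	fmt.Println(req.Method, req.URL.Path) // POST /v1/chat/completions
}
```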
### Container Image

`docker pull ghcr.io/mostlydev/cllama:latest`

Published publicly at `ghcr.io/mostlydev/cllama:latest`.