
Releases: mostlydev/cllama

v0.2.5

28 Mar 00:20
2ff644e


Highlights

  • enforce declared per-agent model policy in cllama
  • normalize runner model requests against the compiled allowlist
  • restrict provider failover to the pod-declared fallback chain
  • add xAI routing/policy fixes needed for direct xai/... model refs
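The policy highlights above can be sketched as a small resolver. This is a hypothetical illustration, not the actual cllama implementation: the `Policy`, `Normalize`, and `Resolve` names are invented here to show the idea of normalizing a runner's model request against a compiled allowlist and restricting failover to the pod-declared fallback chain.

```go
package main

import (
	"fmt"
	"strings"
)

// Policy is a hypothetical compiled per-agent model policy.
type Policy struct {
	Allowed  map[string]bool // compiled allowlist, keyed by canonical model ref
	Fallback []string        // pod-declared fallback chain, in order
}

// Normalize canonicalizes a requested model ref so variants like
// "XAI/Grok-2 " and "xai/grok-2" hit the same allowlist entry.
func Normalize(ref string) string {
	return strings.ToLower(strings.TrimSpace(ref))
}

// Resolve returns the requested model if the allowlist permits it,
// otherwise the first allowed entry of the declared fallback chain;
// anything outside both is rejected rather than silently rerouted.
func (p Policy) Resolve(requested string) (string, error) {
	ref := Normalize(requested)
	if p.Allowed[ref] {
		return ref, nil
	}
	for _, fb := range p.Fallback {
		if p.Allowed[Normalize(fb)] {
			return Normalize(fb), nil
		}
	}
	return "", fmt.Errorf("model %q not permitted by agent policy", requested)
}

func main() {
	p := Policy{
		Allowed:  map[string]bool{"xai/grok-2": true, "openai/gpt-4o": true},
		Fallback: []string{"openai/gpt-4o"},
	}
	m, _ := p.Resolve("XAI/Grok-2")
	fmt.Println(m) // xai/grok-2
}
```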

Artifacts

  • container image: ghcr.io/mostlydev/cllama:v0.2.5
  • rolling tag: ghcr.io/mostlydev/cllama:latest

Validation

  • go test ./...

v0.2.3

26 Mar 02:33


Changes

  • Unpriced request tracking: requests where the upstream provider returns no cost data are now counted separately as unpriced_requests in the cost API response and surfaced in the dashboard UI
  • Reported cost passthrough: CostInfo.CostUSD is now *float64 (nil = unpriced, not zero); provider-reported cost fields are propagated through the proxy
  • Timezone context: time_context.go injects timezone-aware current time for agents that declare a TZ environment variable
  • Dashboard: total_requests and unpriced_requests exposed in the costs API endpoint

v0.2.2 — provider token pool + runtime provider add

25 Mar 02:15
b20e7e1


What's new

  • Provider token pool: Multi-key pool per provider with states ready/cooldown/dead/disabled. Proxy retries across keys on 401/429/5xx with failure classification and Retry-After support.
  • Runtime provider add: POST /providers/add UI route — add a new provider (name, base URL, auth type, API key) at runtime with no restart. Persists to .claw-auth/providers.json with source: runtime.
  • ProviderState.Source: New field (seed/runtime) survives JSON round-trips.
  • UI bearer auth: All routes gated by CLLAMA_UI_TOKEN when configured.
  • Key management routes: POST /keys/add and POST /keys/delete.
  • Webhook alerts: CLLAMA_ALERT_WEBHOOKS and CLLAMA_ALERT_MENTIONS for pool events.

v0.2.1 — Feed Auth

22 Mar 16:39


What's new

  • Feed authentication: FeedEntry now supports an auth field. When present, the feed fetcher sets an Authorization: Bearer header on the fetch request. This enables authenticated feeds from services like claw-api that require bearer token auth.

Backward compatible — existing feeds.json without auth fields work unchanged.
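The optional-auth behavior can be sketched as a request builder. The `buildFeedRequest` helper and the string `Auth` field are assumptions for illustration: a token yields a bearer `Authorization` header, and an empty token leaves the request unauthenticated, which is why older `feeds.json` files keep working.

```go
package main

import (
	"fmt"
	"net/http"
)

// FeedEntry mirrors the release note's shape: Auth is an optional bearer
// token; empty means the feed is fetched without authentication.
type FeedEntry struct {
	URL  string
	Auth string
}

// buildFeedRequest is a hypothetical helper that sets the Authorization
// header only when an auth token is present.
func buildFeedRequest(e FeedEntry) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, e.URL, nil)
	if err != nil {
		return nil, err
	}
	if e.Auth != "" {
		req.Header.Set("Authorization", "Bearer "+e.Auth)
	}
	return req, nil
}

func main() {
	req, _ := buildFeedRequest(FeedEntry{URL: "https://example.com/feed", Auth: "tok123"})
	fmt.Println(req.Header.Get("Authorization")) // Bearer tok123
}
```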

v0.2.0 — Feed Injection (ADR-013 Milestone 2)

22 Mar 15:00


What's Changed

Features

  • Feed injection — The proxy now supports runtime feed injection into LLM requests. Feeds defined in agent context manifests are fetched with TTL-based caching and injected as system message content before forwarding to the upstream provider. Both OpenAI (messages[]) and Anthropic (top-level system) formats are supported.

    • internal/feeds/manifest.go — feed manifest parsing from agent context
    • internal/feeds/fetcher.go — HTTP fetcher with TTL-based caching
    • internal/feeds/inject.go — system message injection for OpenAI and Anthropic formats
  • Agent context extensions — AgentContext now exposes ContextDir for feed manifest discovery and service auth loading.

  • Proxy handler — New WithFeeds option wires feed injection into the proxy pipeline, gated by pod name.

  • Cost logging improvements — Better tracking in the logging layer.
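The format-aware injection described above can be sketched on plain JSON-like request bodies. This is an illustrative assumption, not the code in internal/feeds/inject.go: for an OpenAI-shaped body the feed content is prepended as a system message in messages[], while for an Anthropic-shaped body it is appended to the top-level system string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// injectFeed is a hypothetical sketch of system-message injection for the
// two supported request formats ("openai" or "anthropic").
func injectFeed(body map[string]any, format, feed string) {
	if format == "anthropic" {
		// Anthropic shape: top-level "system" string.
		if prev, _ := body["system"].(string); prev != "" {
			body["system"] = prev + "\n\n" + feed
		} else {
			body["system"] = feed
		}
		return
	}
	// OpenAI shape: prepend a system message to messages[].
	msgs, _ := body["messages"].([]any)
	sys := map[string]any{"role": "system", "content": feed}
	body["messages"] = append([]any{sys}, msgs...)
}

func main() {
	openai := map[string]any{
		"messages": []any{map[string]any{"role": "user", "content": "hi"}},
	}
	injectFeed(openai, "openai", "feed data")
	first := openai["messages"].([]any)[0].(map[string]any)
	fmt.Println(first["role"]) // system

	anthropic := map[string]any{"system": "base prompt"}
	injectFeed(anthropic, "anthropic", "feed data")
	out, _ := json.Marshal(anthropic)
	fmt.Println(string(out))
}
```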

Docker Image

  • ghcr.io/mostlydev/cllama:v0.3.4 — multi-arch (linux/amd64 + linux/arm64)
  • ghcr.io/mostlydev/cllama:latest

Test Coverage

  • Feed fetcher, injection, and manifest parsing
  • Proxy handler tests for both OpenAI and Anthropic feed injection paths
  • Agent context service auth loading

v0.1.0

19 Mar 16:22
395a3a1


cllama v0.1.0 — First Release

OpenAI-compatible governance proxy for AI agent pods. Zero external dependencies, ~15 MB distroless image.

Features

  • OpenAI-compatible proxy on :8080 — POST /v1/chat/completions with streaming
  • Anthropic Messages bridge — POST /v1/messages with native format translation
  • Multi-provider registry — OpenAI, Anthropic, OpenRouter, Ollama with automatic routing
  • Per-agent bearer token auth — agents never see real provider API keys
  • Real-time operator dashboard on :8081 — SSE-powered live view of all LLM calls with agent ID, model, tokens, cost, latency
  • Cost tracking — per-agent, per-model, per-provider usage extraction from upstream responses
  • Vendor-prefixed model fallback — routes anthropic/claude-* etc. through OpenRouter when direct provider key is unavailable
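The vendor-prefixed fallback can be sketched as a routing decision. The `routeModel` helper and its return convention are assumptions for illustration: a ref like anthropic/claude-3-opus goes to the direct provider (prefix stripped) when a key exists, otherwise to OpenRouter with the full prefixed id, which OpenRouter accepts natively.

```go
package main

import (
	"fmt"
	"strings"
)

// routeModel is a hypothetical router: it sends a vendor-prefixed model to
// its direct provider when a key is configured, else falls back to
// OpenRouter with the prefix intact.
func routeModel(model string, directKeys map[string]bool) (provider, upstreamModel string) {
	vendor, rest, found := strings.Cut(model, "/")
	if found && directKeys[vendor] {
		return vendor, rest // direct provider: strip the vendor prefix
	}
	return "openrouter", model // OpenRouter keeps the full prefixed id
}

func main() {
	p, m := routeModel("anthropic/claude-3-opus", map[string]bool{"openai": true})
	fmt.Println(p, m) // openrouter anthropic/claude-3-opus
}
```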

Container Image

docker pull ghcr.io/mostlydev/cllama:latest

Published publicly at ghcr.io/mostlydev/cllama:latest.