
Releases: mostlydev/cllama

v0.2.5

28 Mar 00:20
2ff644e


Highlights

  • enforce declared per-agent model policy in cllama
  • normalize runner model requests against the compiled allowlist
  • restrict provider failover to the pod-declared fallback chain
  • add xAI routing/policy fixes needed for direct xai/... model refs
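The policy highlights above can be sketched as a small resolver. This is a hypothetical illustration, not the actual cllama implementation: the `Policy`, `Normalize`, and `Resolve` names are invented here to show the idea of normalizing a runner's model request against a compiled allowlist and restricting failover to the pod-declared fallback chain.

```go
package main

import (
	"fmt"
	"strings"
)

// Policy is a hypothetical compiled per-agent model policy.
type Policy struct {
	Allowed  map[string]bool // compiled allowlist, keyed by canonical model ref
	Fallback []string        // pod-declared fallback chain, in order
}

// Normalize canonicalizes a requested model ref so variants like
// "XAI/Grok-2 " and "xai/grok-2" hit the same allowlist entry.
func Normalize(ref string) string {
	return strings.ToLower(strings.TrimSpace(ref))
}

// Resolve returns the requested model if the allowlist permits it,
// otherwise the first allowed entry of the declared fallback chain;
// anything outside both is rejected rather than silently rerouted.
func (p Policy) Resolve(requested string) (string, error) {
	ref := Normalize(requested)
	if p.Allowed[ref] {
		return ref, nil
	}
	for _, fb := range p.Fallback {
		if p.Allowed[Normalize(fb)] {
			return Normalize(fb), nil
		}
	}
	return "", fmt.Errorf("model %q not permitted by agent policy", requested)
}

func main() {
	p := Policy{
		Allowed:  map[string]bool{"xai/grok-2": true, "openai/gpt-4o": true},
		Fallback: []string{"openai/gpt-4o"},
	}
	m, _ := p.Resolve("XAI/Grok-2")
	fmt.Println(m) // xai/grok-2
}
```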

Artifacts

  • container image: ghcr.io/mostlydev/cllama:v0.2.5
  • rolling tag: ghcr.io/mostlydev/cllama:latest

Validation

  • go test ./...

v0.2.3

26 Mar 02:33


Changes

  • Unpriced request tracking: requests where the upstream provider returns no cost data are now counted separately as unpriced_requests in the cost API response and surfaced in the dashboard UI
  • Reported cost passthrough: CostInfo.CostUSD is now *float64 (nil = unpriced, not zero); provider-reported cost fields are propagated through the proxy
  • Timezone context: time_context.go injects timezone-aware current time for agents that declare a TZ environment variable
  • Dashboard: total_requests and unpriced_requests exposed in the costs API endpoint

v0.2.2 — provider token pool + runtime provider add

25 Mar 02:15
b20e7e1


What's new

  • Provider token pool: Multi-key pool per provider with states ready/cooldown/dead/disabled. Proxy retries across keys on 401/429/5xx with failure classification and Retry-After support.
  • Runtime provider add: POST /providers/add UI route — add a new provider (name, base URL, auth type, API key) at runtime with no restart. Persists to .claw-auth/providers.json with source: runtime.
  • ProviderState.Source: New field (seed/runtime) survives JSON round-trips.
  • UI bearer auth: All routes gated by CLLAMA_UI_TOKEN when configured.
  • Key management routes: POST /keys/add and POST /keys/delete.
  • Webhook alerts: CLLAMA_ALERT_WEBHOOKS and CLLAMA_ALERT_MENTIONS for pool events.

v0.2.1 — Feed Auth

22 Mar 16:39


What's new

  • Feed authentication: FeedEntry now supports an auth field. When present, the feed fetcher sets an Authorization: Bearer header on the fetch request. This enables authenticated feeds from services like claw-api that require bearer token auth.

Backward compatible — existing feeds.json without auth fields work unchanged.
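The optional-auth behavior can be sketched as a request builder. The `buildFeedRequest` helper and the string `Auth` field are assumptions for illustration: a token yields a bearer `Authorization` header, and an empty token leaves the request unauthenticated, which is why older `feeds.json` files keep working.

```go
package main

import (
	"fmt"
	"net/http"
)

// FeedEntry mirrors the release note's shape: Auth is an optional bearer
// token; empty means the feed is fetched without authentication.
type FeedEntry struct {
	URL  string
	Auth string
}

// buildFeedRequest is a hypothetical helper that sets the Authorization
// header only when an auth token is present.
func buildFeedRequest(e FeedEntry) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, e.URL, nil)
	if err != nil {
		return nil, err
	}
	if e.Auth != "" {
		req.Header.Set("Authorization", "Bearer "+e.Auth)
	}
	return req, nil
}

func main() {
	req, _ := buildFeedRequest(FeedEntry{URL: "https://example.com/feed", Auth: "tok123"})
	fmt.Println(req.Header.Get("Authorization")) // Bearer tok123
}
```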

v0.2.0 — Feed Injection (ADR-013 Milestone 2)

22 Mar 15:00


What's Changed

Features

  • Feed injection — The proxy now supports runtime feed injection into LLM requests. Feeds defined in agent context manifests are fetched with TTL-based caching and injected as system message content before forwarding to the upstream provider. Both OpenAI (messages[]) and Anthropic (top-level system) formats are supported.

    • internal/feeds/manifest.go — feed manifest parsing from agent context
    • internal/feeds/fetcher.go — HTTP fetcher with TTL-based caching
    • internal/feeds/inject.go — system message injection for OpenAI and Anthropic formats
  • Agent context extensions — AgentContext now exposes ContextDir for feed manifest discovery and service auth loading.

  • Proxy handler — New WithFeeds option wires feed injection into the proxy pipeline, gated by pod name.

  • Cost logging improvements — Better tracking in the logging layer.
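The format-aware injection described above can be sketched on plain JSON-like request bodies. This is an illustrative assumption, not the code in internal/feeds/inject.go: for an OpenAI-shaped body the feed content is prepended as a system message in messages[], while for an Anthropic-shaped body it is appended to the top-level system string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// injectFeed is a hypothetical sketch of system-message injection for the
// two supported request formats ("openai" or "anthropic").
func injectFeed(body map[string]any, format, feed string) {
	if format == "anthropic" {
		// Anthropic shape: top-level "system" string.
		if prev, _ := body["system"].(string); prev != "" {
			body["system"] = prev + "\n\n" + feed
		} else {
			body["system"] = feed
		}
		return
	}
	// OpenAI shape: prepend a system message to messages[].
	msgs, _ := body["messages"].([]any)
	sys := map[string]any{"role": "system", "content": feed}
	body["messages"] = append([]any{sys}, msgs...)
}

func main() {
	openai := map[string]any{
		"messages": []any{map[string]any{"role": "user", "content": "hi"}},
	}
	injectFeed(openai, "openai", "feed data")
	first := openai["messages"].([]any)[0].(map[string]any)
	fmt.Println(first["role"]) // system

	anthropic := map[string]any{"system": "base prompt"}
	injectFeed(anthropic, "anthropic", "feed data")
	out, _ := json.Marshal(anthropic)
	fmt.Println(string(out))
}
```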

Docker Image

  • ghcr.io/mostlydev/cllama:v0.3.4 — multi-arch (linux/amd64 + linux/arm64)
  • ghcr.io/mostlydev/cllama:latest

Test Coverage

  • Feed fetcher, injection, and manifest parsing
  • Proxy handler tests for both OpenAI and Anthropic feed injection paths
  • Agent context service auth loading

v0.1.0

19 Mar 16:22
395a3a1


cllama v0.1.0 — First Release

OpenAI-compatible governance proxy for AI agent pods. Zero external dependencies, ~15 MB distroless image.

Features

  • OpenAI-compatible proxy on :8080 — POST /v1/chat/completions with streaming
  • Anthropic Messages bridge — POST /v1/messages with native format translation
  • Multi-provider registry — OpenAI, Anthropic, OpenRouter, Ollama with automatic routing
  • Per-agent bearer token auth — agents never see real provider API keys
  • Real-time operator dashboard on :8081 — SSE-powered live view of all LLM calls with agent ID, model, tokens, cost, latency
  • Cost tracking — per-agent, per-model, per-provider usage extraction from upstream responses
  • Vendor-prefixed model fallback — routes anthropic/claude-* etc. through OpenRouter when direct provider key is unavailable
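The vendor-prefixed fallback can be sketched as a routing decision. The `routeModel` helper and its return convention are assumptions for illustration: a ref like anthropic/claude-3-opus goes to the direct provider (prefix stripped) when a key exists, otherwise to OpenRouter with the full prefixed id, which OpenRouter accepts natively.

```go
package main

import (
	"fmt"
	"strings"
)

// routeModel is a hypothetical router: it sends a vendor-prefixed model to
// its direct provider when a key is configured, else falls back to
// OpenRouter with the prefix intact.
func routeModel(model string, directKeys map[string]bool) (provider, upstreamModel string) {
	vendor, rest, found := strings.Cut(model, "/")
	if found && directKeys[vendor] {
		return vendor, rest // direct provider: strip the vendor prefix
	}
	return "openrouter", model // OpenRouter keeps the full prefixed id
}

func main() {
	p, m := routeModel("anthropic/claude-3-opus", map[string]bool{"openai": true})
	fmt.Println(p, m) // openrouter anthropic/claude-3-opus
}
```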

Container Image

docker pull ghcr.io/mostlydev/cllama:latest

Published publicly at ghcr.io/mostlydev/cllama:latest.