LangWatch is an open-source LLM observability, evaluation, and AI agent testing platform. It provides OpenTelemetry-native tracing across LangChain, LangGraph, DSPy, OpenAI Agents, LiteLLM, Pydantic AI, CrewAI, AWS Bedrock, and more, paired with real-time and batch evaluators, prompt versioning, multi-turn agent simulations (an open-source scenario framework with User Simulator and Judge Agents), datasets, an OpenAI/Anthropic-compatible AI Gateway with virtual keys and budgets, and a published REST API. The core is licensed under Apache-2.0 (with ee/ enterprise modules under a commercial license) and the SDKs are MIT-licensed. LangWatch can be used as LangWatch Cloud or self-hosted via Docker Compose, Helm, Kind, or full Kubernetes.
- AI, Artificial Intelligence, LLM, LLM Observability, LLM Evaluation, Agent Testing, Agent Simulation, Prompt Management, Datasets, Tracing, OpenTelemetry, AI Gateway, DSPy, LangChain, Open Source, MCP, FinOps
- Created: 2026-05-25
- Modified: 2026-05-25
| Plan | Currency | Price | Included Events / Month | Retention | Notes |
|---|---|---|---|---|---|
| Developer | EUR / USD | Free | 50,000 | 14 days | 2 users, 3 scenarios, community support |
| Growth | EUR | 59 / core seat / month | 200,000 (+ EUR 0.0005 / event) | 30 days (+ EUR 3 / GB) | Unlimited lite-users, private support |
| Enterprise / Regulated | USD | Custom | Contract | Custom | On-prem, hybrid, SSO, SOC 2 / ISO 27001 |
| Self-Hosted OSS | USD | Free (Apache-2.0) | Unlimited | Unlimited | Docker Compose / Helm / Kind |
See plans/langwatch-plans-pricing.yml, rate-limits/langwatch-rate-limits.yml, and finops/langwatch-finops.yml.
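As a rough illustration of how the Growth row adds up, the sketch below estimates a monthly bill from the figures in the table (EUR 59 per core seat, 200,000 included events, EUR 0.0005 per additional event). It assumes the included events are pooled per account rather than per seat and that overage is billed linearly. Neither assumption is confirmed by the table, and the retention surcharge (EUR 3 / GB) is not modeled. The function and constant names are illustrative only.

```python
from decimal import Decimal

# Figures taken from the Growth row of the pricing table.
GROWTH_SEAT_EUR = Decimal("59")
GROWTH_INCLUDED_EVENTS = 200_000
GROWTH_OVERAGE_EUR_PER_EVENT = Decimal("0.0005")


def growth_monthly_cost_eur(core_seats: int, events: int) -> Decimal:
    """Estimate a Growth-plan monthly cost in EUR.

    Assumptions (not confirmed by the source): included events are pooled
    across the account, overage is linear, and storage/retention
    surcharges are excluded.
    """
    if core_seats < 0 or events < 0:
        raise ValueError("core_seats and events must be non-negative")
    # Only events beyond the included quota are billed.
    overage_events = max(0, events - GROWTH_INCLUDED_EVENTS)
    seat_cost = core_seats * GROWTH_SEAT_EUR
    overage_cost = overage_events * GROWTH_OVERAGE_EUR_PER_EVENT
    return seat_cost + overage_cost


# 3 core seats and 500,000 events: 3 * 59 + 300,000 * 0.0005 = 327 EUR
estimate = growth_monthly_cost_eur(core_seats=3, events=500_000)
print(f"Estimated Growth cost: EUR {estimate:.2f}")
```

Using `Decimal` rather than floats keeps the per-event rate exact, which matters once event counts reach the millions.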
The full LangWatch REST surface lives in a single OpenAPI 3.1 document maintained inside the langwatch/langwatch monorepo. A mirrored copy is stored here at openapi/langwatch-openapi.json and is referenced by every API entry below.
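Because every API entry points at the same OpenAPI document, one way to see which operations belong to which API is to group them by tag. The sketch below relies only on the standard OpenAPI structure (`paths`, HTTP method keys, and per-operation `tags`). The paths and tags in `SAMPLE_SPEC` are hypothetical placeholders, not taken from the LangWatch spec.

```python
from collections import defaultdict

# HTTP method keys allowed on an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations_by_tag(spec: dict) -> dict[str, list[str]]:
    """Group 'METHOD /path' strings by the tags declared on each operation."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            # Path items may also hold non-operation keys such as "parameters".
            if method not in HTTP_METHODS:
                continue
            for tag in operation.get("tags") or ["untagged"]:
                grouped[tag].append(f"{method.upper()} {path}")
    return {tag: sorted(ops) for tag, ops in sorted(grouped.items())}


# Hypothetical stand-in for the real document; paths and tags are placeholders.
SAMPLE_SPEC = {
    "openapi": "3.1.0",
    "paths": {
        "/example/traces": {"get": {"tags": ["Traces"]}, "parameters": []},
        "/example/datasets": {"post": {"tags": ["Datasets"]}, "get": {}},
    },
}

for tag, operations in operations_by_tag(SAMPLE_SPEC).items():
    print(tag, operations)
```

To run this against the mirrored copy, load it with `json.load(open("openapi/langwatch-openapi.json"))` and pass the resulting dict to `operations_by_tag`.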
Search, retrieve, and share LLM application traces ingested via OpenTelemetry.
Human URL: https://langwatch.ai/docs/api-reference/traces
Configure and manage scorer evaluators — RAGAS, safety, PII, semantic similarity, LLM-as-Judge variants.
Online monitors that automatically score incoming production traces.
Manage evaluation, regression, and fine-tuning datasets and their records.
Version, tag, sync, and restore prompts across projects with feature-flag-style deployment.
Define multi-turn agent test scenarios used by the open-source scenario framework.
Query and retrieve completed agent simulation runs and batches.
Compose and execute batch test suites combining scenarios, datasets, and evaluators.
Trigger and inspect batch experiment runs (including DSPy-driven optimization runs).
Collaborative annotation and labeling workflows over traces.
Time-series analytics over traces, tokens, cost, latency, and evaluator scores.
Create, reorder, and manage dashboards and their composed graphs.
Provision and manage projects (workspaces) — the top-level isolation boundary.
Create, list, and revoke project API keys used by SDKs and automation.
Encrypted credential storage for evaluator and integration secrets.
Configure model-provider credentials and per-project model defaults.
OpenAI/Anthropic-compatible governance proxy — virtual keys, provider bindings, budgets, semantic cache rules.
Compose, version, and run optimization-studio workflows.
Define and update agent records used by simulations and scenarios.
Event-driven triggers that fire on trace conditions and monitor scores.
- Python SDK — `pip install langwatch` (instruments OpenAI, Azure, LiteLLM, DSPy, LangChain, plus any OTel client)
- TypeScript SDK — `npm i langwatch`
- MCP Server — `@langwatch/mcp-server` exposes Observability, Prompts, Datasets, Scenarios, and Evaluator tools to Claude / Cursor / other MCP clients
- scenario — github.com/langwatch/scenario — open-source multi-turn agent testing with User Simulator and Judge Agents
- better-agents — github.com/langwatch/better-agents — standards for building agents
- langevals — github.com/langwatch/langevals — evaluator aggregation
- cookbooks — github.com/langwatch/cookbooks — Jupyter example notebooks
- Docker Compose, Helm chart, Kind, or full Kubernetes
- Data layer: PostgreSQL + Redis + ClickHouse + OpenSearch
- Apache-2.0 core; `ee/` modules require a commercial license
See apis.yml for the full common block — Portal, Documentation, API Reference, Self-Hosting docs, Pricing, OpenAPI, Application (Cloud Dashboard), SignUp, ChangeLog, Discord, LinkedIn, Twitter, YouTube, and the consolidated Features list.
- Kin Lane — kin@apievangelist.com — @apievangelist