LangDrift

Locale-aware evals for AI agent behavior.

LangDrift checks whether an AI agent preserves behavior across languages: tool selection, tool arguments, response language, and failure modes. It is built for teams who already test their agent in English and want to know what changes when the same intent arrives in French, Arabic, Chinese, Basque, Swahili, or any other locale.

The core question:

Does your agent still choose the right tool when the same user intent arrives in another language?

See It In 30 Seconds

No API key required. Clone the repo, start the fake agent, and run a scenario:

git clone https://github.com/RubenGlez/langdrift.git
cd langdrift
pnpm install
pnpm fake-agent

In another terminal:

node ./src/cli.ts run ./examples/scenarios/support-routing.yaml --target http://127.0.0.1:3011/api/agent

Testing your own agent? Install globally and point it at your endpoint:

npm install -g langdrift
langdrift run ./my-scenario.yaml --target http://localhost:3010/api/agent

The fake agent intentionally drops tool calls for Swahili (sw) and Chinese (zh), so the demo shows the core failure mode without using an API key:

Locale  Passed  Failure       Detail
en      1/1     -             create_refund_ticket
sw      0/1     no_tool_call  expected create_refund_ticket, got no tool calls
zh      0/1     no_tool_call  expected create_refund_ticket, got no tool calls

Result: failed, 2 of 12 locales failed

Why This Exists

AI localization is moving from translated strings to localized behavior. For an agent, a localized experience is only correct if the agent preserves intent, tool calls, structured output, and policy behavior across languages.

In the included benchmark, English often passed while equivalent prompts in Basque, Yoruba, Swahili, Chinese, Welsh, and Mongolian triggered missing tool calls, wrong tool arguments, or different tool-use behavior depending on the model. This is not a universal language ranking. It is a reproducible demonstration that agent behavior can drift across locales.

Read the full methodology, results, limitations, and supporting papers in RESEARCH.md.

What It Does

  • Runs YAML scenarios with per-locale user inputs.
  • Sends each locale to any HTTP agent target.
  • Checks tool calls, shallow arguments, forbidden tools, ordered tool-call sequences, and response language.
  • Reports pass/fail by locale with failure mode classification.
  • Emits terminal, JSON, or markdown reports.
  • Exits non-zero on failure, so it works in CI.
  • Supports repeated runs with --iterations N.
  • Supports directory-level scenario runs for locale × scenario matrices.
  • Provides lint and LLM-assisted translate commands for scenario maintenance.

Failure modes include no_tool_call, wrong_tool, wrong_argument, missing_argument, forbidden_tool, wrong_sequence, and wrong_language.
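
As a rough illustration of how deterministic checks can map onto these failure modes, here is a minimal sketch. It is not LangDrift's actual implementation: the type names, the `classify` function, and the order in which checks run are all assumptions, and it leaves out wrong_sequence and wrong_language.

```typescript
// Hypothetical types for illustration; LangDrift's internal names may differ.
type ToolCall = { name: string; arguments?: Record<string, unknown> };
type Expectation = {
  toolCall?: ToolCall; // tool that must be called
  noToolCall?: { name: string }; // tool that must not be called
};
type FailureMode =
  | "no_tool_call"
  | "wrong_tool"
  | "wrong_argument"
  | "missing_argument"
  | "forbidden_tool";

// Returns the first failure found, or null when the locale passes.
function classify(expect: Expectation, actual: ToolCall[]): FailureMode | null {
  // Assumption: a forbidden tool fails the locale whatever else was called.
  const forbidden = expect.noToolCall;
  if (forbidden && actual.some((c) => c.name === forbidden.name)) {
    return "forbidden_tool";
  }

  const wanted = expect.toolCall;
  if (!wanted) return null;
  if (actual.length === 0) return "no_tool_call";

  const match = actual.find((c) => c.name === wanted.name);
  if (!match) return "wrong_tool";

  // Shallow comparison: only top-level keys named in the expectation are checked.
  const got = match.arguments ?? {};
  for (const [key, want] of Object.entries(wanted.arguments ?? {})) {
    if (!(key in got)) return "missing_argument";
    if (got[key] !== want) return "wrong_argument";
  }
  return null;
}
```

Because every branch is a plain comparison, the same agent response always produces the same verdict, which is what makes failures of this kind easy to explain in CI.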

Install

npm install -g langdrift

Requires Node >= 24. LangDrift runs TypeScript directly via Node's native type stripping, so there is no build step. Node 22.6+ also works if you pass --experimental-strip-types when invoking the CLI directly, but the global install expects Node 24.

Quick Start

Create a starter scenario:

langdrift init ./my-scenario.yaml --template support

Edit the generated YAML:

id: refund_request
agent: support

locales:
  en:
    input: "I was charged twice for my subscription. Can you refund one charge?"
    expect:
      toolCall:
        name: create_refund_ticket
        arguments:
          reason: duplicate_charge
      noToolCall:
        name: escalate_to_human

  fr:
    input: "J'ai été facturé deux fois. Pouvez-vous me rembourser un paiement?"
    expect:
      toolCall:
        name: create_refund_ticket
        arguments:
          reason: duplicate_charge
      responseLanguage: fr

Run it against your agent:

langdrift run ./my-scenario.yaml --target http://127.0.0.1:3010/api/agent

Emit JSON for tooling:

langdrift run ./my-scenario.yaml --target http://127.0.0.1:3010/api/agent --format json

Run a directory of scenarios:

langdrift run ./scenarios --target http://127.0.0.1:3010/api/agent --iterations 3 --format markdown

HTTP Target Contract

LangDrift makes a POST request to your agent for each locale.

Request:

{
  "locale": "fr",
  "input": "J'ai été facturé deux fois. Pouvez-vous me rembourser un paiement?",
  "scenarioId": "refund_request"
}

Response:

{
  "text": "Je peux vous aider avec ce remboursement.",
  "toolCalls": [
    {
      "name": "create_refund_ticket",
      "arguments": {
        "reason": "duplicate_charge"
      }
    }
  ],
  "structured": null
}

Response fields:

Field       Type    Description
text        string  Agent text reply. Missing defaults to "".
toolCalls   array   Tool calls made by the agent. Each item must have name; arguments is optional. Missing defaults to [].
structured  any     Optional structured output. Missing defaults to null.

Extra response fields are ignored. Non-2xx responses fail the locale.
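
The defaulting rules above can be sketched as a small normalizer. This is an illustration only: `normalizeResponse` and its types are hypothetical names, and throwing on a tool call without a name is one assumed way to enforce "must have name", not necessarily what LangDrift does.

```typescript
// Hypothetical types for illustration.
type ToolCall = { name: string; arguments?: Record<string, unknown> };
type AgentResponse = { text: string; toolCalls: ToolCall[]; structured: unknown };

// Applies the documented defaults and drops any extra fields.
function normalizeResponse(body: unknown): AgentResponse {
  const raw = (typeof body === "object" && body !== null ? body : {}) as Record<string, unknown>;

  const toolCalls = Array.isArray(raw.toolCalls) ? raw.toolCalls : [];
  for (const call of toolCalls) {
    // Assumption: an item without a string name is rejected outright.
    if (typeof call?.name !== "string") {
      throw new Error("each toolCalls item must have a string name");
    }
  }

  return {
    text: typeof raw.text === "string" ? raw.text : "", // missing -> ""
    toolCalls: toolCalls as ToolCall[], // missing -> []
    structured: raw.structured ?? null, // missing -> null
  };
}
```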

See docs/integrations.md for OpenAI SDK, Vercel AI SDK, LangChain, Anthropic, and Fastify examples.
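
For a target without any framework, a minimal sketch using only Node's built-in `node:http` module might look like the following. `runAgent` and `startAgent` are hypothetical names, and the keyword match inside `runAgent` is a placeholder for a real model call, not a recommended routing strategy.

```typescript
import { createServer } from "node:http";

type AgentRequest = { locale: string; input: string; scenarioId: string };
type AgentResponse = {
  text: string;
  toolCalls: { name: string; arguments?: Record<string, unknown> }[];
  structured: unknown;
};

// Hypothetical stand-in for a real agent: replace with a call to your model.
// A real agent should also reply in the language of the request locale.
async function runAgent(req: AgentRequest): Promise<AgentResponse> {
  const wantsRefund = /refund|rembourser/i.test(req.input);
  return {
    text: wantsRefund ? "I can help with that refund." : "How can I help?",
    toolCalls: wantsRefund
      ? [{ name: "create_refund_ticket", arguments: { reason: "duplicate_charge" } }]
      : [],
    structured: null,
  };
}

// Serves the contract at POST /api/agent on the given port.
function startAgent(port: number) {
  const server = createServer(async (req, res) => {
    if (req.method !== "POST" || req.url !== "/api/agent") {
      res.writeHead(404).end();
      return;
    }
    try {
      // Collect the request body, then parse it as the JSON request shape.
      const chunks: Buffer[] = [];
      for await (const chunk of req) chunks.push(chunk as Buffer);
      const body = JSON.parse(Buffer.concat(chunks).toString("utf8")) as AgentRequest;

      const reply = await runAgent(body);
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify(reply));
    } catch {
      // Any non-2xx status fails the locale on the LangDrift side.
      res.writeHead(500).end();
    }
  });
  return server.listen(port, "127.0.0.1");
}
```

Calling `startAgent(3010)` would expose the endpoint used in the examples above at http://127.0.0.1:3010/api/agent.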

CLI

langdrift init [scenario.yaml] [--template support|ecommerce|scheduling|generic]
langdrift run <scenario.yaml|dir> --target <url> [--iterations N] [--format text|json|markdown] [--min-pass-rate N] [--allow-fail <locale>]
langdrift lint <scenario.yaml|dir>
langdrift translate <scenario.yaml> [--locales fr,ar,zh,...] [--write]

Useful CI flags:

  • --min-pass-rate N: fail only if the overall pass rate is below N.
  • --allow-fail <locale>: keep reporting a known weak locale without letting it fail the build.
  • --format markdown: write a table suitable for GitHub Actions summaries or PR comments.

See docs/ci.md for GitHub Actions examples.
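
To make the interaction between the two gating flags concrete, here is a sketch of one plausible exit-code decision. It rests on assumptions the text above does not confirm: that N is a fraction between 0 and 1, that allowed locales are left out of the pass rate, and that without --min-pass-rate any failing locale fails the build. `LocaleResult`, `GateOptions`, and `shouldFailBuild` are hypothetical names.

```typescript
// Hypothetical shapes for illustration; not LangDrift's report format.
type LocaleResult = { locale: string; passed: number; total: number };
type GateOptions = { minPassRate?: number; allowFail?: string[] };

// Returns true when the process should exit non-zero.
function shouldFailBuild(results: LocaleResult[], opts: GateOptions = {}): boolean {
  // Assumption: allowed locales are still reported but never gate the build.
  const allowed = new Set(opts.allowFail ?? []);
  const gated = results.filter((r) => !allowed.has(r.locale));

  if (opts.minPassRate !== undefined) {
    // Assumption: the threshold is a fraction (0..1) of all gated checks.
    const passed = gated.reduce((n, r) => n + r.passed, 0);
    const total = gated.reduce((n, r) => n + r.total, 0);
    return total > 0 && passed / total < opts.minPassRate;
  }

  // Default: any failing gated locale fails the build.
  return gated.some((r) => r.passed < r.total);
}
```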

Example Agents

The repo includes two local agents:

  • pnpm fake-agent: deterministic demo agent, no API key required.
  • pnpm agent: model-backed agent for OpenAI, Anthropic, DeepSeek, or any OpenAI-compatible API.

# OpenAI
OPENAI_API_KEY=... pnpm agent

# Anthropic
ANTHROPIC_API_KEY=... MODEL_PROVIDER=anthropic MODEL_NAME=claude-haiku-4-5-20251001 pnpm agent

# DeepSeek
DEEPSEEK_API_KEY=... MODEL_PROVIDER=deepseek MODEL_NAME=deepseek-chat pnpm agent

# Choose domain: support (default), ecommerce, scheduling
DOMAIN=ecommerce OPENAI_API_KEY=... pnpm agent

Then run a scenario:

langdrift run ./examples/scenarios/support-routing.yaml --target http://127.0.0.1:3010/api/agent

Design Choices

  • Behavior over text. LangDrift checks tool calls and structured behavior, not whether a reply sounds fluent.
  • Deterministic assertions first. No LLM-as-judge in the core loop; failures are explainable and CI-friendly.
  • HTTP contract over framework lock-in. Any agent that can accept one POST request can be tested.
  • Small, inspectable core. Zero runtime dependencies, TypeScript source, Node >= 24.
  • Demo without API keys. The fake agent makes the failure mode visible locally before connecting a real model.
