diff --git a/app/_ai_integrations/langchain.md b/app/_ai_integrations/langchain.md new file mode 100644 index 0000000000..9fb0e70b88 --- /dev/null +++ b/app/_ai_integrations/langchain.md @@ -0,0 +1,366 @@ +--- +title: LangChain +description: Use LangChain with {{site.ai_gateway_name}} to centralize model routing, provider credentials, authentication, and AI traffic controls. +url: "/ai-integrations/langchain/" +content_type: ai_integration +layout: ai_integration +products: + - ai-gateway +tools: + - deck +canonical: true +works_on: + - konnect +min_version: + gateway: '3.14' +categories: + - libraries + - frameworks +featured: true + +overview: | + [LangChain](https://www.langchain.com/) is a framework for building applications on top of LLMs: + chains, retrieval, tools, and agents. Its OpenAI chat model (`ChatOpenAI`) can call any + OpenAI-compatible endpoint, so you can point it at a {{site.ai_gateway_name}} Route instead of + calling a provider directly. + + Your LangChain code keeps using `invoke`, `stream`, LCEL chains, `bind_tools`, and LangGraph agents, + while the gateway owns the parts you do not want in the client: provider credentials, model selection, + authentication, observability, guardrails, rate limiting, and semantic caching. You add or change + those controls at the gateway without touching application code. + + The examples on this page use the Python SDK. LangChain.js works the same way: set `configuration.baseURL` + on `ChatOpenAI` to your Route. +--- + +## Quick start + +Point LangChain's `ChatOpenAI` model at a {{site.ai_gateway_name}} Route running on Kong Konnect, then +use LangChain exactly as you normally would. + +### Prerequisites + +- Python 3.9+. +- A [Kong Konnect](https://konnect.konghq.com) account with a Gateway control plane and a running data + plane. New to AI Gateway? Start with [Get started with AI Gateway](/ai-gateway/get-started/). +- A Route on that control plane with the [AI Proxy](/plugins/ai-proxy/) or + [AI Proxy Advanced](/plugins/ai-proxy-advanced/) Plugin, plus an upstream provider key held by the + Plugin. If you do not have one yet, see [Set up the Kong AI Gateway Route](#set-up-the-kong-ai-gateway-route). +- A Kong Konnect Personal Access Token (`kpat_...`) to configure the gateway with decK. + +### Install + +```bash +pip install -U langchain-openai +``` + +### Configure the model + +Create a shared module that builds `ChatOpenAI` with `base_url` set to your {{site.ai_gateway_name}} +Route instead of the OpenAI API: + +```python +# kong_gateway.py +import os +from langchain_openai import ChatOpenAI + +llm = ChatOpenAI( + # Your Kong Konnect AI Gateway proxy URL plus the Route path, not the OpenAI API. + base_url=f"{os.environ['KONNECT_AI_GATEWAY_URL']}/langchain", + model="gpt-4o", + # The upstream provider key lives in the gateway, so this value is not used. + api_key="kong", +) +``` + +Set `KONNECT_AI_GATEWAY_URL` to your Konnect Gateway's proxy URL, the data plane endpoint that serves +your Routes: + +```bash +export KONNECT_AI_GATEWAY_URL='https://your-gateway-host' +``` + +### Invoke the model + +```python +from kong_gateway import llm + +response = llm.invoke('Write a concise release note for a new AI Gateway model routing policy.') +print(response.content) +``` + +Kong receives the request, injects the real provider credential, selects the upstream model, and +returns an OpenAI-compatible response. Every other LangChain feature works the same way, because the +model is still speaking OpenAI's chat-completion protocol to Kong. + +## Stream responses + +Use `stream` to consume tokens as they are generated: + +```python +from kong_gateway import llm + +for chunk in llm.stream('Stream a short checklist for safely launching an AI feature.'): + print(chunk.content, end='', flush=True) +``` + +## Build a chain (LCEL) + +LangChain Expression Language pipes a prompt into the model and an output parser. Kong stays on the +request path for every call in the chain: + +```python +from langchain_core.prompts import ChatPromptTemplate +from langchain_core.output_parsers import StrOutputParser +from kong_gateway import llm + +prompt = ChatPromptTemplate.from_template( + 'Write a concise release note for: {feature}' +) +chain = prompt | llm | StrOutputParser() + +print(chain.invoke({'feature': 'a new AI Gateway model routing policy'})) +``` + +## Use tools + +Bind tools to the model with `bind_tools`. Tool calling works through Kong whenever the upstream model +supports it: + +```python +from langchain_core.tools import tool +from kong_gateway import llm + +@tool +def get_gateway_policy(route_name: str) -> dict: + """Return the policy status for an AI Gateway route.""" + return {'route_name': route_name, 'auth': 'enabled', 'semantic_cache': 'enabled'} + +llm_with_tools = llm.bind_tools([get_gateway_policy]) +response = llm_with_tools.invoke('Check the AI Gateway policy for the production chat route.') + +print(response.tool_calls) +``` + +## Build an agent + +A LangGraph agent runs the model, calls your tools, and loops until it has an answer. All model and +tool traffic flows through Kong: + +```bash +pip install -U langgraph +``` + +```python +from langgraph.prebuilt import create_react_agent +from kong_gateway import llm + +def get_route_metrics(route_name: str) -> str: + """Return request and error counts for an AI Gateway route.""" + return f'{route_name}: 14820 requests, 0.4% error rate' + +agent = create_react_agent(llm, [get_route_metrics]) +result = agent.invoke({ + 'messages': [{'role': 'user', 'content': 'Is the production chat route healthy?'}], +}) + +print(result['messages'][-1].content) +``` + +## Route to multiple models + +Instead of hard-coding provider model names in your app, configure client-facing model aliases with +[AI Proxy Advanced](/plugins/ai-proxy-advanced/). The application sends an alias such as `fast` or +`smart`, and Kong maps it to a real upstream model. You can change the upstream model, swap providers, +or add load balancing at the gateway without redeploying the app. + +Add a target per alias in your Kong configuration: + +{%- raw %} +```yaml +plugins: +- name: ai-proxy-advanced + config: + targets: + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o-mini + model_alias: fast + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o + model_alias: smart +``` +{% endraw -%} + +Then select a model by alias with the same Route: + +```python +import os +from langchain_openai import ChatOpenAI + +base_url = f"{os.environ['KONNECT_AI_GATEWAY_URL']}/langchain" + +# Fast, low-cost model for routine work. +quick = ChatOpenAI(base_url=base_url, model='fast', api_key='kong') + +# Higher-capability model for complex work. Only the alias changes. +detailed = ChatOpenAI(base_url=base_url, model='smart', api_key='kong') +``` + +{:.info} +> Model aliases require {{site.ai_gateway_name}} 3.14 or later. On earlier versions, send the upstream +> model name directly, for example `model='gpt-4o'`. + +## Generate embeddings + +To embed text, point `OpenAIEmbeddings` at a Route configured with the `llm/v1/embeddings` route type: + +```python +import os +from langchain_openai import OpenAIEmbeddings + +embeddings = OpenAIEmbeddings( + base_url=f"{os.environ['KONNECT_AI_GATEWAY_URL']}/langchain-embeddings", + model='text-embedding-3-small', + api_key='kong', +) + +vector = embeddings.embed_query('Kong AI Gateway centralizes routing, auth, and observability.') +print(len(vector)) +``` + +## Set up the Kong AI Gateway Route + +If you do not already have a Route for LangChain traffic, configure one with +[AI Proxy Advanced](/plugins/ai-proxy-advanced/) on your Kong Konnect Gateway control plane. The Plugin +owns the upstream provider credential, so the key never reaches the client. + +Export the provider key for decK to inject: + +```bash +export DECK_OPENAI_API_KEY='sk-YOUR-OPENAI-KEY' +``` + +Define a minimal chat-completions configuration in `kong.yaml`: + +{%- raw %} +```yaml +_format_version: "3.0" + +services: +- name: langchain + # Placeholder upstream; AI Proxy Advanced overrides this and calls the provider. + url: https://api.openai.com + routes: + - name: langchain + paths: + - /langchain + methods: + - POST + - OPTIONS + strip_path: true + plugins: + - name: ai-proxy-advanced + config: + response_streaming: allow + targets: + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o +``` +{% endraw -%} + +Sync it to your Konnect control plane: + +```bash +deck gateway sync kong.yaml \ + --konnect-addr https://us.api.konghq.com \ + --konnect-token 'kpat_YOUR-KONNECT-PAT' \ + --konnect-control-plane-name langchain +``` + +This syncs into the `langchain` Gateway control plane on Konnect. Change `--konnect-control-plane-name` +to target an existing control plane, and use `eu.api.konghq.com` or `au.api.konghq.com` if your Konnect +org is in the EU or AU region. + +The model's `base_url` is your gateway proxy URL plus this Route path, for example +`https://your-gateway-host/langchain`. LangChain appends `/chat/completions` to that base URL, which +matches the `llm/v1/chat` Route. To support [embeddings](#generate-embeddings), add a Route with the +`llm/v1/embeddings` route type. + +## Add gateway controls without changing app code + +Once the app points at Kong, platform teams can attach controls to the same Route without rewriting any +LangChain code: + +- [Key Authentication](/plugins/key-auth/) to identify the calling application. +- [Rate Limiting](/plugins/rate-limiting/) to enforce per-app request budgets. +- [AI Prompt Guard](/plugins/ai-prompt-guard/) or [AI Semantic Prompt Guard](/plugins/ai-semantic-prompt-guard/) to block unsafe prompts before they reach the provider. +- [AI Semantic Cache](/plugins/ai-semantic-cache/) to serve repeated prompts without another upstream call. +- [OpenTelemetry](/plugins/opentelemetry/) and logging Plugins to capture AI traffic data. + +LangChain sends its `api_key` as an `Authorization: Bearer` header. To use [Key Authentication](/plugins/key-auth/), +configure the plugin to read that header and store the Consumer credential with the `Bearer ` prefix: + +{%- raw %} +```yaml +plugins: +- name: key-auth + config: + key_names: + - Authorization +consumers: +- username: langchain-app + keyauth_credentials: + - key: Bearer my-api-key +``` +{% endraw -%} + +Then pass the key (without the prefix, which LangChain adds) from the client: + +```python +import os +from langchain_openai import ChatOpenAI + +llm = ChatOpenAI( + base_url=f"{os.environ['KONNECT_AI_GATEWAY_URL']}/langchain", + model='gpt-4o', + api_key=os.environ['KONNECT_AI_GATEWAY_KEY'], # 'my-api-key' +) +``` + +For a full walkthrough, see [Use LangChain with AI Proxy](/how-to/use-langchain-with-ai-proxy/). + +## Troubleshooting + +**The request returns 401 from Kong.** If the Route uses Key Authentication, confirm that LangChain's +`api_key` matches the Consumer credential and that `key_names` includes `Authorization`. + +**The upstream provider returns 401.** Confirm that `DECK_OPENAI_API_KEY` holds a valid provider key and +that the AI Proxy Advanced target injects it as the `Authorization` header with the `Bearer ` prefix. + +**The request does not match a target.** Confirm that the model in `ChatOpenAI`, such as `model='fast'`, +matches a `model.model_alias` (or `model.name`) in the AI Proxy Advanced configuration. + +**Streaming buffers instead of returning tokens progressively.** Confirm that the Plugin uses +`response_streaming: allow` and that any infrastructure in front of Kong supports streaming responses. + +## Next steps + +- Follow the [Use LangChain with AI Proxy](/how-to/use-langchain-with-ai-proxy/) how-to for an end-to-end setup. +- Use the [Basic LLM Routing cookbook](/cookbooks/basic-llm-routing/) for a deeper walkthrough of model aliases. +- Add [AI Semantic Cache](/plugins/ai-semantic-cache/) to reduce repeated LLM calls. +- Review the [AI Proxy Advanced reference](/plugins/ai-proxy-advanced/) for providers, route types, and load-balancing options. diff --git a/app/_ai_integrations/llamaindex.md b/app/_ai_integrations/llamaindex.md new file mode 100644 index 0000000000..c3a91459d8 --- /dev/null +++ b/app/_ai_integrations/llamaindex.md @@ -0,0 +1,361 @@ +--- +title: LlamaIndex +description: Use LlamaIndex (Python) with {{site.ai_gateway_name}} to centralize model routing, provider credentials, authentication, and AI traffic controls. +url: "/ai-integrations/llamaindex/" +content_type: ai_integration +layout: ai_integration +products: + - ai-gateway +tools: + - deck +canonical: true +works_on: + - konnect +min_version: + gateway: '3.14' +categories: + - libraries + - frameworks +featured: true + +overview: | + [LlamaIndex](https://www.llamaindex.ai/) is a Python framework for building LLM applications over your + data: indexing, retrieval, query engines, and agents. Its OpenAI LLM can call any OpenAI-compatible + endpoint, so you can point it at a {{site.ai_gateway_name}} Route instead of calling a provider directly. + + Your LlamaIndex code keeps using `complete`, `chat`, query engines, and agents, while the gateway owns + the parts you do not want in the client: provider credentials, model selection, authentication, + observability, guardrails, rate limiting, and semantic caching. You add or change those controls at + the gateway without touching application code. +--- + +## Quick start + +Point LlamaIndex's OpenAI LLM at a {{site.ai_gateway_name}} Route running on Kong Konnect, then use +LlamaIndex exactly as you normally would. + +### Prerequisites + +- Python 3.9+. +- A [Kong Konnect](https://konnect.konghq.com) account with a Gateway control plane and a running data + plane. New to AI Gateway? Start with [Get started with AI Gateway](/ai-gateway/get-started/). +- A Route on that control plane with the [AI Proxy](/plugins/ai-proxy/) or + [AI Proxy Advanced](/plugins/ai-proxy-advanced/) Plugin, plus an upstream provider key held by the + Plugin. If you do not have one yet, see [Set up the Kong AI Gateway Route](#set-up-the-kong-ai-gateway-route). +- A Kong Konnect Personal Access Token (`kpat_...`) to configure the gateway with decK. + +### Install + +```bash +pip install -U llama-index-llms-openai +``` + +### Configure the LLM + +Create a shared module that builds the OpenAI LLM with `api_base` set to your {{site.ai_gateway_name}} +Route instead of the OpenAI API: + +```python +# kong_gateway.py +import os +from llama_index.llms.openai import OpenAI + +llm = OpenAI( + model="gpt-4o", + # Your Kong Konnect AI Gateway proxy URL plus the Route path, not the OpenAI API. + api_base=f"{os.environ['KONNECT_AI_GATEWAY_URL']}/llamaindex", + # The upstream provider key lives in the gateway, so this value is not used. + api_key="kong", +) +``` + +Set `KONNECT_AI_GATEWAY_URL` to your Konnect Gateway's proxy URL, the data plane endpoint that serves +your Routes: + +```bash +export KONNECT_AI_GATEWAY_URL='https://your-gateway-host' +``` + +### Call the LLM + +```python +from kong_gateway import llm + +response = llm.complete('Write a concise release note for a new AI Gateway model routing policy.') +print(response.text) +``` + +Kong receives the request, injects the real provider credential, selects the upstream model, and +returns an OpenAI-compatible response. Every other LlamaIndex feature works the same way, because the +LLM is still speaking OpenAI's chat-completion protocol to Kong. + +## Chat with messages + +```python +from llama_index.core.llms import ChatMessage +from kong_gateway import llm + +messages = [ + ChatMessage(role='system', content='You are an AI Gateway operations assistant.'), + ChatMessage(role='user', content="Summarize today's model routing changes in two lines."), +] + +print(llm.chat(messages).message.content) +``` + +## Stream responses + +Use `stream_complete` to consume tokens as they are generated: + +```python +from kong_gateway import llm + +for chunk in llm.stream_complete('Stream a short checklist for safely launching an AI feature.'): + print(chunk.delta, end='', flush=True) +``` + +## Query over your documents (RAG) + +A query engine retrieves from an index and asks the model. Both the chat and embedding calls go through +Kong, each to a Route with the matching route type: + +```bash +pip install -U llama-index llama-index-embeddings-openai +``` + +```python +import os +from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings +from llama_index.llms.openai import OpenAI +from llama_index.embeddings.openai import OpenAIEmbedding + +gateway = os.environ['KONNECT_AI_GATEWAY_URL'] + +# Chat model on the llm/v1/chat Route; embeddings on the llm/v1/embeddings Route. +Settings.llm = OpenAI(model='gpt-4o', api_base=f'{gateway}/llamaindex', api_key='kong') +Settings.embed_model = OpenAIEmbedding( + model='text-embedding-3-small', + api_base=f'{gateway}/llamaindex-embeddings', + api_key='kong', +) + +documents = SimpleDirectoryReader('docs').load_data() +index = VectorStoreIndex.from_documents(documents) +query_engine = index.as_query_engine() + +print(query_engine.query('What controls does the gateway apply to LLM traffic?')) +``` + +## Build an agent + +A `FunctionAgent` runs the model, calls your tools, and loops until it has an answer. All model and tool +traffic flows through Kong: + +```bash +pip install -U llama-index +``` + +```python +import asyncio +from llama_index.core.agent.workflow import FunctionAgent +from kong_gateway import llm + +def get_route_metrics(route_name: str) -> str: + """Return request and error counts for an AI Gateway route.""" + return f'{route_name}: 14820 requests, 0.4% error rate' + +agent = FunctionAgent( + tools=[get_route_metrics], + llm=llm, + system_prompt='You are an AI Gateway operations assistant.', +) + +async def main(): + response = await agent.run('Is the production chat route healthy?') + print(response) + +asyncio.run(main()) +``` + +## Route to multiple models + +Instead of hard-coding provider model names in your app, configure client-facing model aliases with +[AI Proxy Advanced](/plugins/ai-proxy-advanced/). The application sends an alias such as `fast` or +`smart`, and Kong maps it to a real upstream model. You can change the upstream model, swap providers, +or add load balancing at the gateway without redeploying the app. + +Add a target per alias in your Kong configuration: + +{%- raw %} +```yaml +plugins: +- name: ai-proxy-advanced + config: + targets: + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o-mini + model_alias: fast + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o + model_alias: smart +``` +{% endraw -%} + +LlamaIndex's `OpenAI` class validates model names against known OpenAI models, so use `OpenAILike` to +send a gateway alias: + +```bash +pip install -U llama-index-llms-openai-like +``` + +```python +import os +from llama_index.llms.openai_like import OpenAILike + +base = f"{os.environ['KONNECT_AI_GATEWAY_URL']}/llamaindex" + +# Fast, low-cost model for routine work. +quick = OpenAILike(model='fast', api_base=base, api_key='kong', is_chat_model=True) + +# Higher-capability model for complex work. Only the alias changes. +detailed = OpenAILike(model='smart', api_base=base, api_key='kong', is_chat_model=True) +``` + +{:.info} +> Model aliases require {{site.ai_gateway_name}} 3.14 or later. On earlier versions, send the upstream +> model name directly, for example `OpenAI(model='gpt-4o', ...)`. + +## Set up the Kong AI Gateway Route + +If you do not already have a Route for LlamaIndex traffic, configure one with +[AI Proxy Advanced](/plugins/ai-proxy-advanced/) on your Kong Konnect Gateway control plane. The Plugin +owns the upstream provider credential, so the key never reaches the client. + +Export the provider key for decK to inject: + +```bash +export DECK_OPENAI_API_KEY='sk-YOUR-OPENAI-KEY' +``` + +Define a minimal chat-completions configuration in `kong.yaml`: + +{%- raw %} +```yaml +_format_version: "3.0" + +services: +- name: llamaindex + # Placeholder upstream; AI Proxy Advanced overrides this and calls the provider. + url: https://api.openai.com + routes: + - name: llamaindex + paths: + - /llamaindex + methods: + - POST + - OPTIONS + strip_path: true + plugins: + - name: ai-proxy-advanced + config: + response_streaming: allow + targets: + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o +``` +{% endraw -%} + +Sync it to your Konnect control plane: + +```bash +deck gateway sync kong.yaml \ + --konnect-addr https://us.api.konghq.com \ + --konnect-token 'kpat_YOUR-KONNECT-PAT' \ + --konnect-control-plane-name llamaindex +``` + +This syncs into the `llamaindex` Gateway control plane on Konnect. Change `--konnect-control-plane-name` +to target an existing control plane, and use `eu.api.konghq.com` or `au.api.konghq.com` if your Konnect +org is in the EU or AU region. + +The LLM's `api_base` is your gateway proxy URL plus this Route path, for example +`https://your-gateway-host/llamaindex`. LlamaIndex appends `/chat/completions` to that base URL, which +matches the `llm/v1/chat` Route. For [RAG](#query-over-your-documents-rag), add a Route with the +`llm/v1/embeddings` route type and point `OpenAIEmbedding` at it. + +## Add gateway controls without changing app code + +Once the app points at Kong, platform teams can attach controls to the same Route without rewriting any +LlamaIndex code: + +- [Key Authentication](/plugins/key-auth/) to identify the calling application. +- [Rate Limiting](/plugins/rate-limiting/) to enforce per-app request budgets. +- [AI Prompt Guard](/plugins/ai-prompt-guard/) or [AI Semantic Prompt Guard](/plugins/ai-semantic-prompt-guard/) to block unsafe prompts before they reach the provider. +- [AI Semantic Cache](/plugins/ai-semantic-cache/) to serve repeated prompts without another upstream call. +- [OpenTelemetry](/plugins/opentelemetry/) and logging Plugins to capture AI traffic data. + +LlamaIndex sends its `api_key` as an `Authorization: Bearer` header. To use [Key Authentication](/plugins/key-auth/), +configure the plugin to read that header and store the Consumer credential with the `Bearer ` prefix: + +{%- raw %} +```yaml +plugins: +- name: key-auth + config: + key_names: + - Authorization +consumers: +- username: llamaindex-app + keyauth_credentials: + - key: Bearer my-api-key +``` +{% endraw -%} + +Then pass the key (without the prefix, which LlamaIndex adds) from the client: + +```python +import os +from llama_index.llms.openai import OpenAI + +llm = OpenAI( + model='gpt-4o', + api_base=f"{os.environ['KONNECT_AI_GATEWAY_URL']}/llamaindex", + api_key=os.environ['KONNECT_AI_GATEWAY_KEY'], # 'my-api-key' +) +``` + +## Troubleshooting + +**The request returns 401 from Kong.** If the Route uses Key Authentication, confirm that LlamaIndex's +`api_key` matches the Consumer credential and that `key_names` includes `Authorization`. + +**The upstream provider returns 401.** Confirm that `DECK_OPENAI_API_KEY` holds a valid provider key and +that the AI Proxy Advanced target injects it as the `Authorization` header with the `Bearer ` prefix. + +**LlamaIndex raises an "unknown model" error for an alias.** The `OpenAI` class only accepts known +OpenAI model names. Use `OpenAILike` for gateway aliases such as `fast` or `smart`. + +**Streaming buffers instead of returning tokens progressively.** Confirm that the Plugin uses +`response_streaming: allow` and that any infrastructure in front of Kong supports streaming responses. + +## Next steps + +- Use the [Basic LLM Routing cookbook](/cookbooks/basic-llm-routing/) for a deeper walkthrough of model aliases. +- Add [AI Semantic Cache](/plugins/ai-semantic-cache/) to reduce repeated LLM calls. +- Add [AI Prompt Guard](/plugins/ai-prompt-guard/) to enforce prompt policies. +- Review the [AI Proxy Advanced reference](/plugins/ai-proxy-advanced/) for providers, route types, and load-balancing options. diff --git a/app/_ai_integrations/vercel-ai-sdk.md b/app/_ai_integrations/vercel-ai-sdk.md new file mode 100644 index 0000000000..d7977e2117 --- /dev/null +++ b/app/_ai_integrations/vercel-ai-sdk.md @@ -0,0 +1,461 @@ +--- +title: Vercel AI SDK +description: Use Vercel AI SDK with {{site.ai_gateway_name}} to centralize model routing, provider credentials, authentication, and AI traffic controls. +url: "/ai-integrations/vercel-ai-sdk/" +content_type: ai_integration +layout: ai_integration +products: + - ai-gateway +tools: + - deck +canonical: true +works_on: + - konnect +min_version: + gateway: '3.14' +categories: + - libraries + - frameworks +featured: true + +overview: | + [Vercel AI SDK](https://ai-sdk.dev/) is a TypeScript toolkit for building AI applications with a + single, provider-agnostic API for text generation, streaming, structured outputs, tools, and agents. + Because its OpenAI provider can call any OpenAI-compatible endpoint, you can point it at a + {{site.ai_gateway_name}} Route instead of calling a provider directly. + + Your application code stays focused on `generateText`, `streamText`, `generateObject`, tools, and + agents, while the gateway owns the parts you do not want in the client: provider credentials, model + selection, authentication, observability, guardrails, rate limiting, and semantic caching. You add or + change those controls at the gateway without touching application code. +--- + +## Quick start + +Point Vercel AI SDK's OpenAI provider at a {{site.ai_gateway_name}} Route running on Kong Konnect, then +use the SDK exactly as you normally would. + +### Prerequisites + +- Node.js and a project with Vercel AI SDK installed. +- A [Kong Konnect](https://konnect.konghq.com) account with a Gateway control plane and a running data + plane. New to AI Gateway? Start with [Get started with AI Gateway](/ai-gateway/get-started/). +- A Route on that control plane with the [AI Proxy](/plugins/ai-proxy/) or + [AI Proxy Advanced](/plugins/ai-proxy-advanced/) Plugin, plus an upstream provider key held by the + Plugin. If you do not have one yet, see [Set up the Kong AI Gateway Route](#set-up-the-kong-ai-gateway-route). +- A Kong Konnect Personal Access Token (`kpat_...`) to configure the gateway with decK. + +### Install + +```bash +npm install ai @ai-sdk/openai zod +``` + +### Configure the provider + +Create the OpenAI provider with `createOpenAI`, but set `baseURL` to your {{site.ai_gateway_name}} +Route instead of the OpenAI API: + +```ts +import { createOpenAI } from '@ai-sdk/openai'; + +export const kong = createOpenAI({ + // Your Kong Konnect AI Gateway proxy URL plus the Route path, not the OpenAI API. + baseURL: `${process.env.KONNECT_AI_GATEWAY_URL}/vercel-ai-sdk`, + // The upstream provider key lives in the gateway, so this value is not used. + apiKey: 'kong', +}); +``` + +Set `KONNECT_AI_GATEWAY_URL` to your Konnect Gateway's proxy URL, the data plane endpoint that serves +your Routes: + +```bash +KONNECT_AI_GATEWAY_URL='https://your-gateway-host' +``` + +Kong receives the request, injects the real provider credential, selects the upstream model, and +returns an OpenAI-compatible response to the SDK. + +{:.info} +> Vercel AI SDK 5 and later call the OpenAI Responses API by default when you use `kong('model')`. +> The examples on this page use `kong.chat('model')` so the SDK sends chat-completion requests that +> match the `llm/v1/chat` route type in Kong. To use the Responses API instead, see +> [Use the Responses API](#use-the-responses-api). + +### Generate text + +```ts +import { generateText } from 'ai'; +import { kong } from './kong-ai-gateway'; + +const { text, usage } = await generateText({ + model: kong.chat('gpt-4o'), + prompt: 'Write a concise release note for a new AI Gateway model routing policy.', +}); + +console.log(text); +console.log(usage); +``` + +That is the whole integration. Every other Vercel AI SDK feature works the same way, because the SDK is +still speaking OpenAI's chat-completion protocol to Kong. + +## Stream text + +Stream tokens as they are generated with `streamText`: + +```ts +import { streamText } from 'ai'; +import { kong } from './kong-ai-gateway'; + +const result = streamText({ + model: kong.chat('gpt-4o'), + prompt: 'Stream a short checklist for safely launching an AI feature.', +}); + +for await (const delta of result.textStream) { + process.stdout.write(delta); +} +``` + +In a Next.js Route Handler, return the stream directly to the browser: + +```ts +import { streamText } from 'ai'; +import { kong } from '@/lib/kong-ai-gateway'; + +export async function POST(req: Request) { + const { messages } = await req.json(); + + const result = streamText({ + model: kong.chat('gpt-4o'), + messages, + }); + + return result.toTextStreamResponse(); +} +``` + +## Generate structured data + +Use `generateObject` with a Zod schema to get validated, typed output. Kong stays on the request path +while the SDK enforces the schema: + +```ts +import { generateObject } from 'ai'; +import { z } from 'zod'; +import { kong } from './kong-ai-gateway'; + +const { object } = await generateObject({ + model: kong.chat('gpt-4o'), + schema: z.object({ + title: z.string(), + risks: z.array(z.string()), + rolloutSteps: z.array(z.string()), + }), + prompt: 'Create a launch plan for adding semantic caching to an AI product.', +}); + +console.log(object); +``` + +## Use tools + +Tool calling works through Kong whenever the upstream model supports it: + +```ts +import { generateText, tool } from 'ai'; +import { z } from 'zod'; +import { kong } from './kong-ai-gateway'; + +const result = await generateText({ + model: kong.chat('gpt-4o'), + tools: { + getGatewayPolicy: tool({ + description: 'Return the policy status for an AI Gateway route.', + inputSchema: z.object({ + routeName: z.string().describe('The name of the AI Gateway route'), + }), + execute: async ({ routeName }) => ({ + routeName, + auth: 'enabled', + semanticCache: 'enabled', + guardrails: 'enabled', + }), + }), + }, + prompt: 'Check the AI Gateway policy for the production chat route.', +}); + +console.log(result.text); +console.log('Tool calls:', result.toolCalls); +``` + +## Build an agent + +The SDK's agent loop runs the model, calls your tools, and feeds the results back until it reaches a +stopping condition. The model and tool traffic all flows through Kong: + +```ts +import { Experimental_Agent as Agent, tool, stepCountIs } from 'ai'; +import { z } from 'zod'; +import { kong } from './kong-ai-gateway'; + +const gatewayAgent = new Agent({ + model: kong.chat('gpt-4o'), + tools: { + getRouteMetrics: tool({ + description: 'Return request and error counts for an AI Gateway route.', + inputSchema: z.object({ + routeName: z.string(), + }), + execute: async ({ routeName }) => ({ + routeName, + requests: 14820, + errorRate: 0.004, + }), + }), + }, + stopWhen: stepCountIs(10), +}); + +const result = await gatewayAgent.generate({ + prompt: 'Is the production chat route healthy? Summarize its request volume and error rate.', +}); + +console.log(result.text); +console.log('Steps taken:', result.steps.length); +``` + +## Route to multiple models + +Instead of hard-coding provider model names in your app, configure client-facing model aliases with +[AI Proxy Advanced](/plugins/ai-proxy-advanced/). The application sends an alias such as `fast` or +`smart`, and Kong maps it to a real upstream model. You can change the upstream model, swap providers, +or add load balancing at the gateway without redeploying the app. + +Add a target per alias in your Kong configuration: + +{%- raw %} +```yaml +plugins: +- name: ai-proxy-advanced + config: + targets: + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o-mini + model_alias: fast + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o + model_alias: smart +``` +{% endraw -%} + +Then select a model by alias in the SDK: + +```ts +import { generateText } from 'ai'; +import { kong } from './kong-ai-gateway'; + +// Fast, low-cost model for routine work. +const quick = await generateText({ + model: kong.chat('fast'), + prompt: 'Write a one-line summary of this changelog entry.', +}); + +// Higher-capability model for complex work. Only the alias changes. +const detailed = await generateText({ + model: kong.chat('smart'), + prompt: 'Compare three rollout strategies for a production AI application.', +}); +``` + +{:.info} +> Model aliases require {{site.ai_gateway_name}} 3.14 or later. On earlier versions, send the upstream +> model name directly, for example `kong.chat('gpt-4o')`. + +## Pass custom parameters + +Standard generation parameters pass straight through to the upstream model: + +```ts +import { generateText } from 'ai'; +import { kong } from './kong-ai-gateway'; + +const { text } = await generateText({ + model: kong.chat('gpt-4o'), + temperature: 0.3, + maxOutputTokens: 512, + maxRetries: 5, + prompt: 'Draft a short incident summary for an AI Gateway latency spike.', +}); + +console.log(text); +``` + +## Generate images + +To generate images, point the SDK's image model at a Route configured with the +`image/v1/images/generations` route type, then call `experimental_generateImage`: + +```ts +import { experimental_generateImage as generateImage } from 'ai'; +import { kong } from './kong-ai-gateway'; + +const { image } = await generateImage({ + model: kong.image('dall-e-3'), + prompt: 'A clean isometric diagram of an API gateway routing traffic to several AI models', + size: '1024x1024', +}); + +console.log(image.base64); +``` + +## Generate embeddings + +To create embeddings, point the SDK at a Route configured with the `llm/v1/embeddings` route type, then +use `embed` or `embedMany`: + +```ts +import { embed } from 'ai'; +import { kong } from './kong-ai-gateway'; + +const { embedding } = await embed({ + model: kong.embedding('text-embedding-3-small'), + value: 'Kong AI Gateway centralizes routing, auth, and observability for LLM traffic.', +}); + +console.log(embedding.length); +``` + +## Set up the Kong AI Gateway Route + +If you do not already have a Route for Vercel AI SDK traffic, configure one with +[AI Proxy Advanced](/plugins/ai-proxy-advanced/) on your Kong Konnect Gateway control plane. The Plugin +owns the upstream provider credential, so the key never reaches the client. + +Export the provider key for decK to inject: + +```bash +export DECK_OPENAI_API_KEY='sk-YOUR-OPENAI-KEY' +``` + +Define a minimal chat-completions configuration in `kong.yaml`: + +{%- raw %} +```yaml +_format_version: "3.0" + +services: +- name: vercel-ai-sdk + # Placeholder upstream; AI Proxy Advanced overrides this and calls the provider. + url: https://api.openai.com + routes: + - name: vercel-ai-sdk + paths: + - /vercel-ai-sdk + methods: + - POST + - OPTIONS + strip_path: true + plugins: + - name: ai-proxy-advanced + config: + response_streaming: allow + targets: + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: 'Bearer ${{ env "DECK_OPENAI_API_KEY" }}' + model: + provider: openai + name: gpt-4o +``` +{% endraw -%} + +Sync it to your Konnect control plane: + +```bash +deck gateway sync kong.yaml \ + --konnect-addr https://us.api.konghq.com \ + --konnect-token 'kpat_YOUR-KONNECT-PAT' \ + --konnect-control-plane-name vercel-ai-sdk +``` + +This syncs into the `vercel-ai-sdk` Gateway control plane on Konnect. Change `--konnect-control-plane-name` +to target an existing control plane, and use `eu.api.konghq.com` or `au.api.konghq.com` if your Konnect +org is in the EU or AU region. + +The SDK's `baseURL` is your gateway proxy URL plus this Route path, for example +`https://your-gateway-host/vercel-ai-sdk`. The OpenAI provider appends `/chat/completions` to that base +URL, which matches the `llm/v1/chat` Route. To support [images](#generate-images) or +[embeddings](#generate-embeddings), add Routes with the `image/v1/images/generations` and +`llm/v1/embeddings` route types. + +## Add gateway controls without changing app code + +Once the app points at Kong, platform teams can attach controls to the same Route without rewriting any +Vercel AI SDK calls: + +- [Key Authentication](/plugins/key-auth/) to identify the calling application with an `apikey` header. +- [Rate Limiting](/plugins/rate-limiting/) to enforce per-app request budgets. +- [AI Prompt Guard](/plugins/ai-prompt-guard/) or [AI Semantic Prompt Guard](/plugins/ai-semantic-prompt-guard/) to block unsafe prompts before they reach the provider. +- [AI Semantic Cache](/plugins/ai-semantic-cache/) to serve repeated prompts without another upstream call. +- [OpenTelemetry](/plugins/opentelemetry/) and logging Plugins to capture AI traffic data. + +If you add Key Authentication, send the Consumer key from the SDK with the `apikey` header. The upstream +provider key still stays in Kong: + +```ts +import { createOpenAI } from '@ai-sdk/openai'; + +export const kong = createOpenAI({ + baseURL: `${process.env.KONNECT_AI_GATEWAY_URL}/vercel-ai-sdk`, + apiKey: 'kong', // Not used by Kong; the provider key lives in the gateway. + headers: { + apikey: process.env.KONNECT_AI_GATEWAY_KEY ?? '', // Kong Consumer key for Key Authentication. + }, +}); +``` + +The client keeps calling `kong.chat('fast')` or `kong.chat('smart')`. Kong applies the production +controls at the gateway layer. For a full walkthrough, see +[Authenticate OpenAI SDK clients with Key Auth](/how-to/authenticate-openai-sdk-clients-with-key-auth/). + +## Use the Responses API + +If your app uses the OpenAI Responses API by calling `kong('model')` instead of `kong.chat('model')`, +configure an AI Proxy Advanced target with the `llm/v1/responses` route type and point the SDK at that +Route. Use `kong.chat('model')` for the chat-completions setup shown throughout this guide. + +## Troubleshooting + +**The SDK returns 401 from Kong.** If the Route uses Key Authentication, confirm that the `apikey` header +carries a valid Kong Consumer key. + +**The upstream provider returns 401.** Confirm that `DECK_OPENAI_API_KEY` holds a valid provider key and +that the AI Proxy Advanced target injects it as the `Authorization` header with the `Bearer ` prefix. + +**The request does not match a target.** Confirm that the model in the SDK, such as `kong.chat('fast')`, +matches a `model.model_alias` (or `model.name`) in the AI Proxy Advanced configuration. + +**Streaming buffers instead of returning tokens progressively.** Confirm that the Plugin uses +`response_streaming: allow` and that any infrastructure in front of Kong supports streaming responses. + +## Next steps + +- Use the [Basic LLM Routing cookbook](/cookbooks/basic-llm-routing/) for a deeper walkthrough of model aliases. +- Add [AI Semantic Cache](/plugins/ai-semantic-cache/) to reduce repeated LLM calls. +- Add [AI Prompt Guard](/plugins/ai-prompt-guard/) to enforce prompt policies. +- Review the [AI Proxy Advanced reference](/plugins/ai-proxy-advanced/) for providers, route types, and load-balancing options. diff --git a/app/_assets/entrypoints/ai-integrations.js b/app/_assets/entrypoints/ai-integrations.js new file mode 100644 index 0000000000..2e0998e8ed --- /dev/null +++ b/app/_assets/entrypoints/ai-integrations.js @@ -0,0 +1,136 @@ +class AIIntegrationsIndex { + constructor() { + this.heroSearch = document.getElementById("ai-integrations-search-hero"); + this.listSearch = document.getElementById("ai-integrations-search-list"); + this.heroPills = document.querySelectorAll("#hero-category-pills button"); + this.categoryTabs = document.querySelectorAll("#category-tabs button"); + this.allIntegrationsSection = document.getElementById("all-integrations-section"); + this.integrationList = document.getElementById("ai-integration-list"); + this.emptyState = document.getElementById("ai-integration-empty"); + this.rows = this.integrationList.querySelectorAll('[data-card="ai-integration-row"]'); + + this.activeCategory = "all"; + this.searchQuery = ""; + + this.addEventListeners(); + this.readURL(); + this.filterList(); + } + + addEventListeners() { + [this.heroSearch, this.listSearch].forEach((input) => { + input.addEventListener("input", () => { + this.searchQuery = input.value; + if (input === this.heroSearch) this.listSearch.value = input.value; + else this.heroSearch.value = input.value; + this.filterList(); + this.updateURL(); + }); + }); + + this.heroSearch.addEventListener("keydown", (event) => { + if (event.key === "Enter") { + event.preventDefault(); + this.allIntegrationsSection.scrollIntoView({ behavior: "smooth" }); + } + }); + + this.heroPills.forEach((pill) => { + pill.addEventListener("click", () => { + this.setCategory(pill.dataset.category); + this.allIntegrationsSection.scrollIntoView({ behavior: "smooth" }); + }); + }); + + this.categoryTabs.forEach((tab) => { + tab.addEventListener("click", () => { + this.setCategory(tab.dataset.category); + }); + }); + } + + setCategory(slug) { + this.activeCategory = slug; + this.updateTabsUI(); + this.filterList(); + this.updateURL(); + } + + updateTabsUI() { + this.categoryTabs.forEach((tab) => { + tab.classList.toggle( + "tab-button__horizontal--active", + tab.dataset.category === this.activeCategory + ); + }); + + this.heroPills.forEach((pill) => { + const active = pill.dataset.category === this.activeCategory; + pill.classList.toggle("bg-brand-saturated/20", active); + pill.classList.toggle("border-brand-saturated", active); + }); + } + + filterList() { + const query = this.searchQuery.toLowerCase().trim(); + let count = 0; + + this.rows.forEach((row) => { + const categories = (row.dataset.categories || "") + .split(",") + .map((value) => value.trim()) + .filter(Boolean); + const matchesCat = + this.activeCategory === "all" || + categories.includes(this.activeCategory) || + row.dataset.category === this.activeCategory; + const matchesSearch = + !query || + row.dataset.title.includes(query) || + row.dataset.description.includes(query); + const show = matchesCat && matchesSearch; + row.classList.toggle("hidden", !show); + if (show) count++; + }); + + this.emptyState.classList.toggle("hidden", count > 0); + } + + updateURL() { + const params = new URLSearchParams(); + if (this.activeCategory !== "all") { + params.set("category", this.activeCategory); + } + if (this.searchQuery) { + params.set("q", this.searchQuery); + } + const newUrl = + window.location.pathname + + (params.toString() ? "?" + params.toString() : ""); + window.history.replaceState({}, "", newUrl); + } + + readURL() { + const params = new URLSearchParams(window.location.search); + const cat = params.get("category"); + if (cat) { + const validCategories = new Set( + Array.from(this.categoryTabs).map((tab) => tab.dataset.category) + ); + this.activeCategory = validCategories.has(cat) ? cat : "all"; + } + const q = params.get("q"); + if (q) { + this.searchQuery = q; + this.heroSearch.value = q; + this.listSearch.value = q; + } + this.updateTabsUI(); + + if (cat || q) { + this.allIntegrationsSection.scrollIntoView({ behavior: "smooth" }); + } + } +} + +document.addEventListener("DOMContentLoaded", () => new AIIntegrationsIndex()); diff --git a/app/_data/ai_integration_categories.yaml b/app/_data/ai_integration_categories.yaml new file mode 100644 index 0000000000..5e3f156849 --- /dev/null +++ b/app/_data/ai_integration_categories.yaml @@ -0,0 +1,9 @@ +- text: Libraries + slug: libraries + icon: "/assets/icons/code.svg" +- text: Frameworks + slug: frameworks + icon: "/assets/icons/brain.svg" +- text: Agents + slug: agents + icon: "/assets/icons/sparkle.svg" diff --git a/app/_data/schemas/frontmatter/base.json b/app/_data/schemas/frontmatter/base.json index 754f9261f7..39d31b1eb2 100644 --- a/app/_data/schemas/frontmatter/base.json +++ b/app/_data/schemas/frontmatter/base.json @@ -18,7 +18,7 @@ }, "content_type": { "type": "string", - "enum": ["landing_page", "how_to", "reference", "concept", "plugin", "plugin_example", "api", "policy", "support", "cookbook", "skill"] + "enum": ["landing_page", "how_to", "reference", "concept", "plugin", "plugin_example", "api", "policy", "support", "cookbook", "skill", "ai_integration"] }, "description": { "type": "string" diff --git a/app/_data/top_navigation.yml b/app/_data/top_navigation.yml index b3adf4080e..3df4bb0fd2 100644 --- a/app/_data/top_navigation.yml +++ b/app/_data/top_navigation.yml @@ -130,6 +130,9 @@ ai_tools: - text: AI Cookbooks url: /cookbooks/ icon: /assets/icons/book.svg + - text: AI Integrations + url: /ai-integrations/ + icon: /assets/icons/code.svg - text: Skills Hub url: /skills/ icon: /assets/icons/brain.svg diff --git a/app/_includes/ai_integrations/overview.html b/app/_includes/ai_integrations/overview.html new file mode 100644 index 0000000000..6a0262a0d6 --- /dev/null +++ b/app/_includes/ai_integrations/overview.html @@ -0,0 +1,6 @@ +
+ Guides for using popular AI libraries, frameworks, and SDKs with Kong AI Gateway + for centralized LLM routing, authentication, observability, guardrails, and cost controls. +
+