diff --git a/app/_cookbooks/basic-llm-routing.md b/app/_cookbooks/basic-llm-routing.md index 90ac5af9b8..cea4f54b84 100644 --- a/app/_cookbooks/basic-llm-routing.md +++ b/app/_cookbooks/basic-llm-routing.md @@ -1,6 +1,6 @@ --- title: Basic LLM Routing -description: Route requests to any supported LLM provider through {{site.ai_gateway_name}} with Consumer authentication and per-request model selection. +description: Route requests to any supported LLM provider through Kong AI Gateway with Consumer authentication and per-request model selection. url: "/cookbooks/basic-llm-routing/" content_type: cookbook layout: cookbook @@ -15,7 +15,6 @@ min_version: gateway: '3.14' categories: - llm - - llm-routing featured: false popular: false @@ -37,7 +36,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. You will provision a recipe-scoped Control Plane and local Data Plane via the [quickstart script](https://get.konghq.com/quickstart). @@ -60,7 +59,7 @@ prereqs: content: | This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration. - 1. Install **kongctl** from [developer.konghq.com/kongctl](/kongctl/). + 1. Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). 1. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). 1. Verify both are installed: @@ -205,7 +204,7 @@ quotas, or routing policy. {% mermaid %} sequenceDiagram participant C as Client - participant K as {{site.ai_gateway_name}} + participant K as Kong AI Gateway participant L as LLM Provider C->>K: POST /basic-llm-routing (apikey, model: fast or smart) @@ -265,7 +264,7 @@ The Key Auth Plugin sits in front of the AI Proxy Advanced Plugin and gates ever ``` {:.no-copy-code} -**`key_names: [apikey]`**. The headers (or query parameters) the Plugin looks in for the API key. 
The recipe uses `apikey` because the Key Auth Plugin performs an exact string match on the header value and does not inspect `Authorization` for Bearer tokens. The OpenAI SDK's `api_key` field always serializes as `Authorization: Bearer `, which Kong would read as the literal string `Bearer ` and fail to match against any stored credential. The "Try it out" section below points at a pre-function pattern that bridges the SDK's Bearer token to the `apikey` header server-side; the [Authenticate OpenAI SDK clients with Key Auth](/how-to/authenticate-openai-sdk-clients-with-key-auth/) guide has the full pattern. +**`key_names: [apikey]`**. The headers (or query parameters) the Plugin looks in for the API key. The recipe uses `apikey` because the Key Auth Plugin performs an exact string match on the header value and does not inspect `Authorization` for Bearer tokens. The OpenAI SDK's `api_key` field always serializes as `Authorization: Bearer `, which Kong would read as the literal string `Bearer ` and fail to match against any stored credential. The "Try it out" section below points at a pre-function pattern that bridges the SDK's Bearer token to the `apikey` header server-side; the [Authenticate OpenAI SDK clients with Key Auth](https://developer.konghq.com/how-to/authenticate-openai-sdk-clients-with-key-auth/) guide has the full pattern. **`hide_credentials: true`**. Strips the API key from the request before forwarding upstream. The provider never sees the Consumer's API key. This is a 3.14 default but the recipe sets it explicitly for clarity and to remain portable to older Gateway versions. @@ -333,7 +332,7 @@ The AI Proxy Advanced Plugin sits behind the Key Auth Plugin and handles everyth - **Additional route types.** A single Plugin instance can have multiple targets for different route types, each with their own model and auth configuration. 
Beyond `llm/v1/chat`, the Plugin supports additional route types for embeddings, completions, responses, realtime, and multimodal traffic. See the [ai-proxy-advanced reference](/plugins/ai-proxy-advanced/) for the current list. {:.info} -> **Production credentials.** This recipe stores the Consumer API key directly in Plugin config and the LLM provider credentials in environment variables for simplicity. In production, use [Kong Vaults](/gateway/secrets-management/) to reference both from your preferred secret manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, Azure Key Vault) instead. +> **Production credentials.** This recipe stores the Consumer API key directly in Plugin config and the LLM provider credentials in environment variables for simplicity. In production, use [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) to reference both from your preferred secret manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, Azure Key Vault) instead. ### Example response @@ -992,7 +991,7 @@ rm -f kong-recipe.yaml The demo script makes three calls. The first two send the same prompt with different `model` values (`fast` then `smart`) and print the `X-Kong-LLM-Model` header so you can confirm Kong routed each request to a different upstream model. The third call presents an invalid API key and shows Kong rejecting it with `401` before any upstream call. {:.info} -> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. +> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. 
To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](https://developer.konghq.com/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. Create the demo script: diff --git a/app/_cookbooks/claude-code-sso.md b/app/_cookbooks/claude-code-sso.md index 5b022c661b..245936d504 100644 --- a/app/_cookbooks/claude-code-sso.md +++ b/app/_cookbooks/claude-code-sso.md @@ -16,7 +16,7 @@ min_version: categories: - access-control - llm -featured: true +featured: false popular: false # Machine-readable fields for AI agent setup @@ -41,7 +41,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. The [quickstart script](https://get.konghq.com/quickstart) provisions a recipe-scoped Control Plane and local Data Plane. @@ -68,7 +68,7 @@ prereqs: content: | This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration, plus [jq](https://jqlang.org/) for JSON processing in the apply and cleanup steps. - 1. Install **kongctl** from [developer.konghq.com/kongctl](/kongctl/). + 1. Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). 2. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). 3. Install **jq** from [jqlang.org](https://jqlang.org/). @@ -150,7 +150,7 @@ prereqs: ``` {:.warning} - > **`apiKeyHelper` is bypassed if `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` is set in your environment.** Both env vars take precedence over the helper per Claude Code's credential precedence rules. If either is set in your shell profile, unset it before running Claude Code with this recipe: `unset ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN`. 
+ > **`apiKeyHelper` is bypassed if `ANTHROPIC_API_KEY` or `ANTHROPIC_AUTH_TOKEN` is set in your environment.** Both env vars take precedence over the helper per Claude Code's [credential precedence rules](https://code.claude.com/docs/en/iam). If either is set in your shell profile, unset it before running Claude Code with this recipe: `unset ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN`. **Helper script: okta-claude-auth.sh** @@ -540,7 +540,7 @@ sequenceDiagram participant CC as Claude Code participant H as apiKeyHelper script participant O as Okta - participant K as {{site.base_gateway}} + participant K as Kong Gateway participant L as LLM Provider CC->>H: Request bearer token @@ -896,7 +896,7 @@ When the token limit is exceeded, Kong returns `429 Too Many Requests` with a `R header. {:.info} -> The 60-second windows here are intentionally aggressive for the demo so a few interactive prompts visibly exhaust the budget. Most teams enforce monthly or daily token budgets in production, for example {%raw%}`limits: [{limit: 5000000, window_size: 2592000}]`{%endraw%}. Combine with [Kong Vaults](/gateway/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references for credentials in production rather than environment variables. +> The 60-second windows here are intentionally aggressive for the demo so a few interactive prompts visibly exhaust the budget. Most teams enforce monthly or daily token budgets in production, for example {%raw%}`limits: [{limit: 5000000, window_size: 2592000}]`{%endraw%}. Combine with [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references for credentials in production rather than environment variables. 
## Apply the Kong configuration diff --git a/app/_cookbooks/github-copilot-byok.md b/app/_cookbooks/github-copilot-byok.md new file mode 100644 index 0000000000..d7893e903f --- /dev/null +++ b/app/_cookbooks/github-copilot-byok.md @@ -0,0 +1,1302 @@ +--- +title: GitHub Copilot BYOK Custom Endpoint +description: Route GitHub Copilot Chat and agent traffic through {{site.base_gateway}} via Copilot's Bring Your Own Key Custom Endpoint, with per-Consumer attribution, request normalization, and token rate limiting. +url: "/cookbooks/github-copilot-byok/" +content_type: cookbook +layout: cookbook +products: + - ai-gateway +tools: + - kongctl +canonical: true +works_on: + - konnect +min_version: + gateway: '3.14' +categories: + - llm + - access-control +featured: false +popular: false + +# Machine-readable fields for AI agent setup +plugins: + - pre-function + - key-auth + - ai-proxy-advanced + - ai-rate-limiting-advanced +requires_embeddings: false +providers: + - openai + - azure + - bedrock +extra_services: + - name: GitHub Copilot Business or Enterprise + env_vars: [] + hint: "Each developer needs a GitHub Copilot Business or Enterprise seat, and the org administrator must enable the 'Bring Your Own Language Model Key in VS Code' Copilot policy plus Editor Preview Features. See the GitHub Copilot section in Prerequisites." + +hint: "Requires GitHub Copilot Business or Enterprise seats, VS Code 1.122 or later, an admin who can enable the BYOK Copilot policy, and one LLM provider credential (OpenAI, Azure OpenAI, or AWS Bedrock)." + +prereqs: + skip_product: true + skip_tool: true + inline: + - title: Kong Konnect + content: | + This tutorial uses {{site.konnect_product_name}}. The [quickstart script](https://get.konghq.com/quickstart) provisions a recipe-scoped Control Plane and local Data Plane. + + 1. Create a new personal access token by opening the [Konnect PAT page](https://cloud.konghq.com/global/account/tokens) and selecting **Generate Token**. + 1. 
Export your token. The same token is reused later for kongctl commands: + + ```bash + export KONNECT_TOKEN='YOUR_KONNECT_PAT' + ``` + + 1. Set the recipe-scoped Control Plane name and run the quickstart script. The `-e` flags raise the data plane's nginx body buffer so Copilot's large request payloads (full conversation context, tool definitions, file contents) stay in memory instead of spilling to disk: + + ```bash + export KONNECT_CONTROL_PLANE_NAME='github-copilot-byok-recipe' + curl -Ls https://get.konghq.com/quickstart | \ + bash -s -- -k $KONNECT_TOKEN \ + -e KONG_NGINX_HTTP_CLIENT_BODY_BUFFER_SIZE=16m \ + -e KONG_NGINX_HTTP_CLIENT_MAX_BODY_SIZE=16m \ + --deck-output + ``` + + This provisions a {{site.konnect_product_name}} Control Plane named `github-copilot-byok-recipe`, a local Data Plane connected to it, and prints `export` lines for the rest of the session vars. Paste those into your shell when prompted. + - title: kongctl + decK + jq + content: | + This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration, plus [jq](https://jqlang.org/) for JSON processing in the apply and cleanup steps. + + 1. Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). + 1. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). + 1. Install **jq** from [jqlang.org](https://jqlang.org/). + + You can verify all three are installed: + + ```bash + kongctl version + deck version + jq --version + ``` + - title: GitHub Copilot in VS Code + content: | + This recipe routes GitHub Copilot Chat and agent traffic through {{site.base_gateway}} using Copilot's **Bring Your Own Key Custom Endpoint**. The Custom Endpoint feature is currently in stable VS Code on Copilot Enterprise seats and in preview on Copilot Business seats. 
Verify the current availability matrix at the [VS Code language models docs](https://code.visualstudio.com/docs/copilot/customization/language-models) before you start. + + 1. Install **VS Code** version 1.122 or later from [code.visualstudio.com](https://code.visualstudio.com/). + 1. Install the **GitHub Copilot** and **GitHub Copilot Chat** extensions and sign in with a GitHub account that has a Copilot Business or Enterprise seat. + 1. Have your **Copilot administrator** enable both of the following in the GitHub organization's Copilot policy settings: + - **Bring Your Own Language Model Key in VS Code** (the BYOK policy gate). + - **Editor Preview Features** (required as long as Custom Endpoint is preview on your seat type). + + Without these, the VS Code Chat picker will not surface the Custom Endpoint option, or will accept the configuration and reject every request at runtime. + + 1. Generate two Consumer credentials. These become the API key values each developer enters into the VS Code **Add Models...** wizard. Export them now so the apply step picks them up: + + ```bash + export DECK_COPILOT_KEY_ALICE="$(openssl rand -hex 24)" + export DECK_COPILOT_KEY_BOB="$(openssl rand -hex 24)" + ``` + + When you later paste a key into VS Code, copy it with `printf '%s' "$DECK_COPILOT_KEY_ALICE"` rather than `echo "$DECK_COPILOT_KEY_ALICE"`. In zsh, output that lacks a trailing newline is rendered with a bold `%`. That character is not part of the key, and pasting it produces a `401 No API key found in request` from Key Auth. + + In production, generate one credential per developer (or one per VS Code workstation) and distribute through your secrets manager rather than shell exports. + + {:.warning} + > **Inline ghost-text completions cannot be proxied.** Copilot's autocomplete (the gray suggestion text that appears as you type) always runs on GitHub's infrastructure and is not affected by the Custom Endpoint setting. 
The recipe governs Copilot Chat and agent (`@workspace`, `#editor`, Copilot CLI) traffic only. + - title: AI Credentials + content: | + {% navtabs "Providers" %} + {% navtab "OpenAI" %} + This tutorial uses OpenAI: + + 1. [Create an OpenAI account](https://platform.openai.com/). + 1. [Get an API key](https://platform.openai.com/api-keys). + 1. Create a decK variable for the API key. The `openai` provider expects the full `Authorization` value, including the `Bearer ` prefix: + + ```bash + export DECK_OPENAI_TOKEN='Bearer sk-YOUR-OPENAI-KEY' + ``` + {% endnavtab %} + {% navtab "Azure OpenAI" %} + This tutorial uses [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/): + + 1. Ensure you have an Azure OpenAI resource with at least one chat deployment. + 1. Create decK variables for your resource. `DECK_AZURE_DEPLOYMENT_ID` is the deployment name you assigned in the Azure portal, not the underlying model name: + + ```bash + export DECK_AZURE_API_KEY='YOUR-AZURE-OPENAI-KEY' + export DECK_AZURE_INSTANCE='your-azure-resource' + export DECK_AZURE_DEPLOYMENT_ID='your-deployment-name' + export DECK_AZURE_API_VERSION='2024-10-21' + ``` + {% endnavtab %} + {% navtab "AWS Bedrock" %} + This tutorial uses [AWS Bedrock](https://docs.aws.amazon.com/bedrock/): + + 1. Ensure you have an AWS account with [Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) enabled for the chat model you plan to use. + 1. Create decK variables with your AWS credentials: + + ```bash + export DECK_AWS_ACCESS_KEY_ID='your-access-key' + export DECK_AWS_SECRET_ACCESS_KEY='your-secret-key' + export DECK_AWS_REGION='us-east-1' + ``` + {% endnavtab %} + {% endnavtabs %} + +overview: | + This recipe puts {{site.ai_gateway_name}} in front of an LLM provider so every + GitHub Copilot Chat and agent request from your engineering team flows through a + control point you own. 
Kong holds the provider API key and injects it server-side, + authenticates each developer with a Consumer-scoped credential, normalizes the + request body so reasoning models accept it, and attributes every token to a named + developer in {{site.konnect_short_name}} Analytics. + + The recipe configures Copilot's + [Bring Your Own Key (BYOK) Custom Endpoint](https://code.visualstudio.com/docs/copilot/customization/language-models) + in VS Code to post to a Kong Route at `/github-copilot/v1/chat/completions`, then + uses four Kong Plugins on that Route: the [Pre-function](/plugins/pre-function/) + Plugin to normalize the request, the [Key Auth](/plugins/key-auth/) Plugin to + authenticate each developer, the [AI Proxy Advanced](/plugins/ai-proxy-advanced/) + Plugin to inject the provider key and forward the request, and the + [AI Rate Limiting Advanced](/plugins/ai-rate-limiting-advanced/) Plugin to enforce + per-Consumer token budgets. + + Scope: this recipe governs Copilot Chat and agent traffic (the Chat view, agent + modes, and Copilot CLI). Inline ghost-text code completions always run on + GitHub's infrastructure and cannot be proxied through Kong, regardless of the + Custom Endpoint setting. +--- + +## The problem + +Routing GitHub Copilot through a central control point breaks down on four independent fronts at once. Each one is a real wall teams hit; together they are why naive provider-credential reuse falls apart. + +**Credentials live on developer machines.** A team that enables Copilot BYOK without a gateway distributes provider API keys directly to every engineer through 1Password, shell profiles, or chat. Every request is indistinguishable on the provider bill, every leaked key is a fleet-wide rotation, and there is no way to revoke one developer without touching everyone. 
+ +**Copilot pins request fields that some upstreams reject.** VS Code's Custom Endpoint client posts requests with `temperature: 0.1` hard-coded into the body, and Copilot's BYOK schema does not expose a field for changing it. When the upstream is a reasoning model (gpt-5 family, o-series), the provider returns: + +```text +400 ... 'temperature' does not support 0.1 ... Only the default (1) value is supported +``` +{:.no-copy-code} + +There is nowhere in VS Code's BYOK configuration to fix this. The request has to be normalized at the gateway. + +**Org-policy model allowlists reject the wrong name.** GitHub Copilot Business and Enterprise let admins pin an allowlist of model IDs that BYOK requests are permitted to claim. A request that names any other model is rejected by Copilot before it ever leaves VS Code, with: + +```text +400 "cannot use own model - must be: gpt-5-mini" +``` +{:.no-copy-code} + +This is enforced by GitHub's policy layer, not by the LLM provider or by Kong. It means the model ID a developer types into VS Code (`models[].id` in the BYOK config) has to match the org's allowlist, regardless of what model Kong actually forwards to upstream. + +**No per-developer attribution.** Even when traffic reaches the provider, the bill is a single line item. There is no built-in way to see which developer used how many tokens, which model, or how often a developer is approaching the team's monthly budget. Cost decisions become retroactive. + +The root issue is that trust, normalization, and accounting all need to happen on a server you control before the request hits the provider. That control point is what this recipe builds. + +## The solution + +This recipe puts {{site.base_gateway}} between VS Code and the LLM provider so every Copilot Chat or agent request flows through a server you control: + +- **Developers never hold the provider key.** Kong holds it and injects it server-side. 
Each developer authenticates to Kong with their own short-lived Consumer credential, which is the value they paste into VS Code's `apiKey` field. Rotating one developer is a single deck-config change; rotating the provider key is also one change, not a fleet-wide push. +- **Per-Consumer attribution.** Every request is attributed to a named Consumer in {{site.konnect_short_name}} Analytics, so cost, token usage, and model mix can be sliced per developer. +- **Server-side request normalization.** A Pre-function Plugin instance runs ahead of every other Plugin on the Route and rewrites two things in the request before any other Plugin sees it: it converts Copilot's `Authorization: Bearer ` header into the `apikey` header that the Key Auth Plugin expects, and it strips `temperature` from the JSON body so reasoning models accept the request. +- **Token-budget rate limiting.** Each Consumer has a per-minute token budget enforced by the AI Rate Limiting Advanced Plugin. Exhaustion returns `429 Too Many Requests` until the window resets. +- **Org-allowlist alignment.** The Kong target's `model_alias` is parameterized by `DECK_CHAT_MODEL`, so the same env var pins both the name VS Code sends and the name Kong matches against. If the org policy mandates `gpt-5-mini`, you set the env var to `gpt-5-mini`, and the VS Code `id` field uses the same value. Kong's `model.name` independently controls what is actually sent upstream. + + +{% mermaid %} +sequenceDiagram + participant VS as VS Code Copilot + participant K as Kong Gateway + participant L as LLM Provider + + VS->>K: POST /github-copilot/v1/chat/completions
Authorization: Bearer <apiKey>
body: {model, messages, temperature: 0.1} + activate K + K->>K: pre-function (Bearer to apikey, strip temperature) + K->>K: key-auth (validate apikey, attach Consumer) + K->>K: ai-proxy-advanced (alias match, inject provider key) + K->>K: ai-rate-limiting-advanced (per-Consumer token budget) + K->>L: Forwarded request (provider auth + normalized body) + activate L + L-->>K: Provider response + deactivate L + K-->>VS: OpenAI-format response (+ X-AI-RateLimit headers) + deactivate K +{% endmermaid %} + + +{% table %} +columns: + - title: Component + key: component + - title: Responsibility + key: responsibility +rows: + - component: "VS Code GitHub Copilot extension" + responsibility: "Posts Chat and agent requests to the Custom Endpoint URL using the OpenAI Chat Completions API shape." + - component: "Kong, [Pre-function](/plugins/pre-function/) Plugin" + responsibility: "Rewrites `Authorization: Bearer` to `apikey` so Key Auth can match it; strips `temperature` from the body so reasoning models accept it." + - component: "Kong, [Key Auth](/plugins/key-auth/) Plugin" + responsibility: "Validates the per-developer credential and attaches the request to the matching Consumer for downstream attribution and rate-limit counting." + - component: "Kong, [AI Proxy Advanced](/plugins/ai-proxy-advanced/) Plugin" + responsibility: "Matches the body's `model` against `model_alias`, injects the provider API key server-side, translates to the upstream's native format if needed, and forwards." + - component: "Kong, [AI Rate Limiting Advanced](/plugins/ai-rate-limiting-advanced/) Plugin" + responsibility: "Counts prompt + completion tokens against the Consumer's per-window budget and returns `429` on exhaustion." + - component: "LLM provider" + responsibility: "Model inference. Receives the Kong-injected credential and the normalized request body." 
+{% endtable %} + +## How it works + +When a developer asks Copilot Chat a question, the request flows through Kong before reaching the LLM provider. Here is the complete lifecycle: + +1. **Request from VS Code.** The Copilot extension posts `POST /github-copilot/v1/chat/completions` to Kong with the developer's Consumer credential in the `Authorization: Bearer ` header and an OpenAI-format chat body. The body's `model` field equals the value of `models[].id` in `chatLanguageModels.json`, which the recipe pins to `DECK_CHAT_MODEL`. + +1. **Header and body normalization.** The Pre-function Plugin runs first (priority approximately 1,000,000, above every other Plugin on the Route). It checks the `Authorization` header, strips the `Bearer ` prefix, and writes the bare credential into a new `apikey` header that the Key Auth Plugin will read. In the same pass it decodes the JSON body, removes the `temperature` field if present, and writes the body back. From this point on, all subsequent Plugins see a normalized request. + +1. **Per-developer authentication.** The Key Auth Plugin reads the `apikey` header, looks up the matching credential in the Control Plane, and attaches the request to the corresponding Consumer (`copilot-developer-alice` or `copilot-developer-bob` in this recipe). Unknown credentials are rejected with `401 Unauthorized`. + +1. **Alias matching and provider-key injection.** The AI Proxy Advanced Plugin reads the body's `model` field and matches it against the `model_alias` configured on each target. The recipe configures one target whose `model_alias` equals `DECK_CHAT_MODEL`, so a request that uses any other model name is rejected with `400 model not configured`. On a match, Kong injects the provider API key into the upstream request and, if the upstream uses a non-OpenAI format (Bedrock), translates the body from OpenAI to the upstream format. + +1. 
**Token rate limiting.** The AI Rate Limiting Advanced Plugin counts prompt and completion tokens against the Consumer's per-window budget. Rate-limit headers (`X-AI-RateLimit-Remaining-*`) are added to the response. Exhaustion returns `429 Too Many Requests` with a `Retry-After` header. + +1. **Response back to VS Code.** The provider response flows back through Kong to VS Code. Copilot renders the chat reply or runs the requested tool call. + +### Pre-function: request normalization + +The Pre-function Plugin runs Lua in the `access` phase before any other Plugin on the Route. The recipe uses it to fix two things Copilot's BYOK client cannot fix itself: it converts `Authorization: Bearer ` into the `apikey` header that Key Auth expects, and it strips the hard-coded `temperature: 0.1` field from the body so reasoning models accept the request. Both rewrites have to happen before Key Auth and AI Proxy Advanced run, which is why Pre-function (priority around 1,000,000) is the right tool: it has the highest access-phase priority of any standard Kong Plugin. + +#### Configuration details + +{%- raw %} +```yaml +plugins: + - name: pre-function + config: + access: + - | + local cjson = require "cjson.safe" + + local auth = kong.request.get_header("Authorization") + if auth and string.sub(string.lower(auth), 1, 7) == "bearer " then + ngx.req.set_header("apikey", string.sub(auth, 8)) + end + + local body = kong.request.get_raw_body() + if body then + local json = cjson.decode(body) + if json and json.temperature ~= nil then + json.temperature = nil + kong.service.request.set_raw_body(cjson.encode(json)) + end + end +``` +{% endraw -%} +{:.no-copy-code} + +**`access`**. An array of Lua source strings, each of which becomes a function executed during the `access` phase. Each function has full access to the [Kong PDK](/gateway/latest/pdk/) and the `ngx` namespace. + +**`ngx.req.set_header("apikey", ...)`**. 
The standard PDK function `kong.service.request.set_header` only mutates headers sent upstream; it does not change what later Plugins see. The Authorization-to-apikey rewrite has to be visible to the next Plugin in the chain (Key Auth), so the recipe drops down to `ngx.req.set_header`, which mutates the underlying nginx request and is visible to every PDK reader for the remainder of the request lifecycle. + +**Body rewrite via `kong.service.request.set_raw_body`**. Replaces the body that gets forwarded upstream. For OpenAI and Azure OpenAI (passthrough paths), this is the body the provider sees. For AWS Bedrock with `llm_format: openai`, the AI Proxy Advanced Plugin translates the request from OpenAI to Bedrock format on the way out and emits its own upstream body; verify that the temperature normalization survives that translation in your environment if you point this recipe at Bedrock with a reasoning-style upstream. + +{:.warning} +> **VS Code Copilot pins `temperature: 0.1` with no UI override.** Reasoning models (gpt-5 family, o-series) reject any temperature other than the default 1 with `400 ... 'temperature' does not support 0.1 ... Only the default (1) value is supported`. The Pre-function strip above is what makes those models usable as Copilot BYOK upstreams. If you remove this Plugin, requests against reasoning models fail with this exact error. + +### Key Auth: per-developer authentication + +The Key Auth Plugin authenticates each request against a Consumer credential. The recipe ships two Consumers (`copilot-developer-alice` and `copilot-developer-bob`) so the per-Consumer attribution in {{site.konnect_short_name}} Analytics is visible from the first apply. Each developer enters their assigned credential into the VS Code **Add Models... → Custom Endpoint → Chat Completions** wizard, which stores it in the OS keychain and writes a `${input:chat.lm.secret.}` placeholder into `chatLanguageModels.json`. 
The Pre-function Plugin above ensures Key Auth sees the rewritten value in the `apikey` header it expects. + +#### Configuration details + +```yaml +plugins: + - name: key-auth + config: + key_names: + - apikey + hide_credentials: true +``` +{:.no-copy-code} + +**`key_names: [apikey]`**. The header name Key Auth reads. Pre-function normalizes Copilot's `Authorization: Bearer` into this header, so a single name suffices and no fallback to query string or other headers is needed. + +**`hide_credentials: true`**. Strips the `apikey` header from the request before it is forwarded upstream. The provider never sees the Consumer's gateway credential. In {{site.base_gateway}} 3.14 and later, this defaults to `true`; the recipe sets it explicitly to document intent. + +**Consumer scaling.** Add one Consumer per developer (or per VS Code workstation) with its own `keyauth_credentials` entry. The recipe defines two for demonstration; production deployments typically generate Consumers programmatically as part of developer onboarding. + +### AI Proxy Advanced: provider routing and format translation + +The AI Proxy Advanced Plugin holds the provider API key, matches the request body's `model` field against the `model_alias` on a target, and forwards. With one target on the recipe, the `model_alias` acts as a hard gate: any model name VS Code sends other than `DECK_CHAT_MODEL` is rejected with `400 model not configured`. This is what makes the org-policy allowlist alignment work. Set `DECK_CHAT_MODEL` to the name your Copilot admin requires, and the VS Code `id` field has to match, or Kong refuses the request. 
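
The alias gate described above can be pictured with a short sketch. This is a minimal illustration in Python, not Kong's implementation: the function name `route_by_alias`, the `upstream_name` key, and the `invalid request body` message are hypothetical stand-ins. Only the `model_alias` match and the `400 model not configured` outcome come from the recipe.

```python
import json


def route_by_alias(raw_body, targets):
    """Illustrative model of the alias gate; not Kong's actual code.

    Returns (status, detail): on a match, detail is the upstream model
    name; otherwise it is an error message.
    """
    try:
        requested = json.loads(raw_body).get("model")
    except (ValueError, AttributeError):
        # Hypothetical handling for a body that is not a JSON object.
        return 400, "invalid request body"
    for target in targets:
        if target["model_alias"] == requested:
            # The upstream name, not the alias, is what gets forwarded.
            return 200, target["upstream_name"]
    return 400, "model not configured"


# One target, mirroring the recipe, where alias and upstream name are
# both taken from DECK_CHAT_MODEL (assumed here to be "gpt-5-mini").
targets = [{"model_alias": "gpt-5-mini", "upstream_name": "gpt-5-mini"}]
print(route_by_alias('{"model": "gpt-5-mini"}', targets))
print(route_by_alias('{"model": "gpt-4o"}', targets))
```

With a single target, every name other than the alias falls through to the `400` branch. Giving `upstream_name` a different value models the case where the VS Code-facing alias and the upstream model differ.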
+ +#### Configuration details + +{%- raw %} +```yaml +plugins: + - name: ai-proxy-advanced + config: + llm_format: openai + max_request_body_size: 10485760 + response_streaming: allow + targets: + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: ${{ env "DECK_OPENAI_TOKEN" }} + logging: + log_statistics: true + log_payloads: true + model: + model_alias: ${{ env "DECK_CHAT_MODEL" }} + provider: openai + name: ${{ env "DECK_CHAT_MODEL" }} + options: + input_cost: 0.25 + output_cost: 2.00 +``` +{% endraw -%} +{:.no-copy-code} + +**`llm_format: openai`**. Copilot speaks the OpenAI Chat Completions API. For OpenAI and Azure OpenAI upstreams the request passes through natively. For AWS Bedrock the Plugin translates the request from OpenAI to Bedrock's `InvokeModel` shape and the response back, so Copilot's client sees an OpenAI-shaped response in both cases. + +**`route_type: llm/v1/chat`**. Selects the chat-completions translation path. See the [AI Proxy Advanced reference](/plugins/ai-proxy-advanced/reference/) for the full list of supported Route types. + +**`model.model_alias`** equals **`model.name`** in this recipe. The alias is what VS Code sends in the body and what Kong matches; the name is what Kong sends upstream. For OpenAI and Azure OpenAI both are the same value (the user-visible model ID); for Bedrock, the alias can be a friendly bare name while the upstream `name` is the long Bedrock model ID. Change the env var pair if you want the user-facing alias to differ from the upstream name. + +**`auth`**. The credential block injected on every upstream request. The shape varies by provider; the Apply section's per-provider navtabs show the OpenAI, Azure OpenAI, and Bedrock variants. The provider key is never sent to VS Code and never sits on the developer's machine. + +**`max_request_body_size: 10485760`**. Sets the maximum allowed body to 10 MB. 
Copilot Chat conversations accumulate large context (chat history, open files, workspace symbols), and the default body limit can reject these requests. + +**`response_streaming: allow`**. Lets the Plugin pass Server-Sent Events streaming responses from the provider back to VS Code. Copilot Chat renders streamed token output progressively. + +**`logging.log_statistics`** and **`logging.log_payloads`**. Emit token-usage data and request/response bodies to any attached logging Plugin (for example, [HTTP Log](/plugins/http-log/) or [File Log](/plugins/file-log/)) for per-developer audit and cost attribution. {{site.konnect_short_name}} Analytics captures token usage independently of these fields. + +**`model.options.input_cost`** and **`model.options.output_cost`**. USD per 1,000,000 tokens for prompt and completion respectively. Kong applies these per-million rates to the reported `prompt_tokens` and `completion_tokens` on each request and emits the result as the `cost` metric to {{site.konnect_short_name}} Analytics, which is what populates the **Total cost** tile on the dashboard. The recipe ships with `0.25` / `2.00`, which is reasonable for a small reasoning-class model like `gpt-5-mini`; replace these with the published rates for whatever provider/model your `DECK_CHAT_MODEL` resolves to (for example, `gpt-4o` is roughly `2.50` / `10.00`). Without these fields Konnect would have to look up the rate from its own internal price list, which lags newly released models. Set them explicitly to avoid empty cost tiles. + +{:.warning} +> **The GitHub org-policy allowlist runs in front of Kong.** Copilot Business and Enterprise let admins pin an allowlist of models BYOK requests are permitted to claim. When the policy is set and your request names a different model, VS Code returns `400 "cannot use own model - must be: gpt-5-mini"` before the request ever reaches Kong.
To resolve, set `DECK_CHAT_MODEL` (and the matching VS Code `models[].id`) to the model the policy mandates; if you want Kong to actually forward to a different upstream model, change `model.name` (the upstream name) independently of `model_alias` (the VS Code-facing name). + +### AI Rate Limiting Advanced: per-Consumer token budgets + +The AI Rate Limiting Advanced Plugin counts prompt and completion tokens against a per-Consumer budget. Counting tokens, not requests, is the correct unit for LLM cost control. A single Copilot agent run can be 30K+ tokens once workspace context, tool definitions, and conversation history are included. When a Consumer exhausts its budget, Kong returns `429 Too Many Requests` until the window resets. + +#### Configuration details + +```yaml +plugins: + - name: ai-rate-limiting-advanced + config: + policies: + - limits: + - limit: 50000 + window_size: 60 + window_type: sliding + identifier: consumer + tokens_count_strategy: total_tokens + strategy: local + llm_format: openai +``` +{:.no-copy-code} + +**`policies`** is an array of rate-limiting policies. Each policy has a `limits` array of limit + window pairs and an optional `match` block for targeting specific providers or models. With no `match` block the policy applies to every request. The recipe configures 50,000 total tokens per 60-second sliding window per Consumer. + +**`identifier: consumer`** scopes the counter to the Kong Consumer attached by Key Auth. Each developer gets their own bucket. Without Key Auth attaching a Consumer, this identifier would degrade to a single shared bucket; the Pre-function header rewrite is what makes per-Consumer counting work for Copilot's Bearer-style auth. + +**`tokens_count_strategy: total_tokens`** counts both prompt (input) and completion (output) tokens. Alternatives are `prompt_tokens`, `completion_tokens`, or `cost`. + +**`window_type: sliding`** uses a sliding-window algorithm. 
`fixed` is the alternative and resets all counters at the window boundary. + +**`strategy: local`** keeps counters in memory on each Kong node. Fine for single-node and development. For multi-node production, switch to `strategy: redis` with a shared Redis instance so counters stay consistent across nodes. + +**`llm_format: openai`** must match the AI Proxy Advanced Plugin's `llm_format` so the rate limiter parses token counts from the response correctly. + +Kong returns token rate-limit headers with every response: + +{% table %} +columns: + - title: Header + key: header + - title: Description + key: description +rows: + - header: "`X-AI-RateLimit-Limit-{window}-{provider}`" + description: "Maximum tokens allowed in the window." + - header: "`X-AI-RateLimit-Remaining-{window}-{provider}`" + description: "Tokens remaining in the current window." + - header: "`RateLimit-Reset`" + description: "Seconds until the window resets." +{% endtable %} + +When the token limit is exceeded, Kong returns `429 Too Many Requests` with a `Retry-After` header. + +{:.info} +> In production, store provider credentials in [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. Per-developer Consumer credentials are also good vault candidates as the team scales beyond a handful of seats. + + +## Apply the Kong configuration + +This section configures the Control Plane in two parts. First, adopt the quickstart Control Plane into a kongctl namespace so the apply commands below can manage it. The recipe's `select_tags` and the `github-copilot-byok-recipe` namespace scope every resource so teardown removes only this recipe's configuration. 
+ +```bash +kongctl adopt control-plane "${KONNECT_CONTROL_PLANE_NAME}" \ + --namespace "${KONNECT_CONTROL_PLANE_NAME}" \ + --pat "${KONNECT_TOKEN}" +``` + +Adoption stamps the `KONGCTL-namespace` label on the Control Plane. + +The provider tabs below create a Service and Route at `/github-copilot`, a Pre-function Plugin for header and body normalization, a Key Auth Plugin scoped to the Service, an AI Proxy Advanced Plugin for provider-key injection and OpenAI-format passthrough or translation, an AI Rate Limiting Advanced Plugin for per-Consumer token budgets, and two demonstration Consumers with key-auth credentials. See the [kongctl documentation](/kongctl/) for more on federated configuration management. + +Select your provider below, export the required environment variables, and apply. + +{% navtabs "Providers" %} +{% navtab "OpenAI" %} + +Export your environment variables. Set `DECK_CHAT_MODEL` to whatever model name the Copilot org policy permits. If the policy is unrestricted, pick any OpenAI chat model your account has access to: + +```bash +export DECK_CHAT_MODEL='gpt-4o' # must match VS Code's models[].id and the org Copilot allowlist +``` + +{:.warning} +> `DECK_CHAT_MODEL` is the contract between Copilot's policy, VS Code's `models[].id`, and Kong's `model_alias`. If your Copilot administrator has pinned a specific model (for example, `gpt-5-mini`), set this env var to that exact value and use the same string in `chatLanguageModels.json`. A mismatch fails as either `400 "cannot use own model - must be: ..."` (rejected by GitHub policy before Kong) or `400 model not configured` (rejected by Kong). + +`KONNECT_CONTROL_PLANE_NAME`, `DECK_OPENAI_TOKEN`, `DECK_COPILOT_KEY_ALICE`, and `DECK_COPILOT_KEY_BOB` are already exported during the Prerequisites, so they do not need to be re-exported per tab. 
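Before applying, a quick preflight can catch a variable that did not survive into the current shell. The sketch below assumes the variable names listed above plus `KONNECT_TOKEN`, which the apply command passes to `--pat`; `missing_vars` is a hypothetical helper, not part of kongctl or decK.

```shell
# Sketch only: report which of the named environment variables are unset or empty.
# missing_vars NAME...: print one line per missing variable, nothing if all are set.
# Pass literal variable names only, because each name is expanded through eval.
missing_vars() {
  for name in "$@"; do
    # Indirect lookup that works in bash, zsh, and POSIX sh alike.
    eval "value=\${${name}:-}"
    [ -n "${value}" ] || printf '%s\n' "${name}"
  done
}

# Example call for the OpenAI tab (prints nothing when everything is exported):
# missing_vars KONNECT_CONTROL_PLANE_NAME KONNECT_TOKEN DECK_OPENAI_TOKEN \
#   DECK_CHAT_MODEL DECK_COPILOT_KEY_ALICE DECK_COPILOT_KEY_BOB
```

If the call prints any names, export those variables again before running the apply block.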
+ +Apply the Kong configuration: + +```bash +{%- raw %} +cat <<'EOF' > kong-recipe.yaml +_format_version: '3.0' +_info: + select_tags: + - github-copilot-byok-recipe +services: +- name: github-copilot-byok + url: http://localhost + routes: + - name: github-copilot-byok + paths: + - /github-copilot + protocols: + - http + - https + methods: + - POST + - OPTIONS + strip_path: true + plugins: + - name: pre-function + instance_name: github-copilot-byok-normalize + config: + access: + - | + local cjson = require "cjson.safe" + + local auth = kong.request.get_header("Authorization") + if auth and string.sub(string.lower(auth), 1, 7) == "bearer " then + ngx.req.set_header("apikey", string.sub(auth, 8)) + end + + local body = kong.request.get_raw_body() + if body then + local json = cjson.decode(body) + if json and json.temperature ~= nil then + json.temperature = nil + kong.service.request.set_raw_body(cjson.encode(json)) + end + end + - name: key-auth + instance_name: github-copilot-byok-auth + config: + key_names: + - apikey + hide_credentials: true + - name: ai-proxy-advanced + instance_name: github-copilot-byok-proxy + config: + llm_format: openai + max_request_body_size: 10485760 + response_streaming: allow + targets: + - route_type: llm/v1/chat + auth: + header_name: Authorization + header_value: ${{ env "DECK_OPENAI_TOKEN" }} + logging: + log_statistics: true + log_payloads: true + model: + model_alias: ${{ env "DECK_CHAT_MODEL" }} + provider: openai + name: ${{ env "DECK_CHAT_MODEL" }} + options: + input_cost: 0.25 + output_cost: 2.00 + - name: ai-rate-limiting-advanced + instance_name: github-copilot-byok-ratelimit + config: + policies: + - limits: + - limit: 50000 + window_size: 60 + window_type: sliding + identifier: consumer + tokens_count_strategy: total_tokens + strategy: local + llm_format: openai +consumers: +- username: copilot-developer-alice + keyauth_credentials: + - key: ${{ env "DECK_COPILOT_KEY_ALICE" }} +- username: copilot-developer-bob + 
keyauth_credentials: + - key: ${{ env "DECK_COPILOT_KEY_BOB" }} +EOF +{% endraw -%} + +echo " +_defaults: + kongctl: + namespace: github-copilot-byok-recipe +control_planes: + - ref: recipe-cp + name: \"${KONNECT_CONTROL_PLANE_NAME}\" + _deck: + files: + - kong-recipe.yaml +" | kongctl apply -f - -o text --auto-approve --pat "${KONNECT_TOKEN}" + +rm -f kong-recipe.yaml + +``` +{: data-test-step="block" .collapsible } + +{% endnavtab %} +{% navtab "Azure OpenAI" %} + +Export your environment variables. For Azure OpenAI, `DECK_CHAT_MODEL` is sent as the body's `model` field, which Azure largely ignores in favor of the deployment in the URL path. It still has to match the VS Code `models[].id` and the Copilot org allowlist: + +```bash +export DECK_CHAT_MODEL='gpt-4o' # must match VS Code's models[].id and the org Copilot allowlist +``` + +{:.warning} +> `DECK_CHAT_MODEL` is the contract between Copilot's policy, VS Code's `models[].id`, and Kong's `model_alias`. If your Copilot administrator has pinned a specific model (for example, `gpt-5-mini`), set this env var to that exact value and use the same string in `chatLanguageModels.json`. A mismatch fails as either `400 "cannot use own model - must be: ..."` (rejected by GitHub policy before Kong) or `400 model not configured` (rejected by Kong). + +`KONNECT_CONTROL_PLANE_NAME`, the `DECK_AZURE_*` credentials, `DECK_COPILOT_KEY_ALICE`, and `DECK_COPILOT_KEY_BOB` are already exported during the Prerequisites, so they do not need to be re-exported per tab. 
+ +Apply the Kong configuration: + +```bash +{%- raw %} +cat <<'EOF' > kong-recipe.yaml +_format_version: '3.0' +_info: + select_tags: + - github-copilot-byok-recipe +services: +- name: github-copilot-byok + url: http://localhost + routes: + - name: github-copilot-byok + paths: + - /github-copilot + protocols: + - http + - https + methods: + - POST + - OPTIONS + strip_path: true + plugins: + - name: pre-function + instance_name: github-copilot-byok-normalize + config: + access: + - | + local cjson = require "cjson.safe" + + local auth = kong.request.get_header("Authorization") + if auth and string.sub(string.lower(auth), 1, 7) == "bearer " then + ngx.req.set_header("apikey", string.sub(auth, 8)) + end + + local body = kong.request.get_raw_body() + if body then + local json = cjson.decode(body) + if json and json.temperature ~= nil then + json.temperature = nil + kong.service.request.set_raw_body(cjson.encode(json)) + end + end + - name: key-auth + instance_name: github-copilot-byok-auth + config: + key_names: + - apikey + hide_credentials: true + - name: ai-proxy-advanced + instance_name: github-copilot-byok-proxy + config: + llm_format: openai + max_request_body_size: 10485760 + response_streaming: allow + targets: + - route_type: llm/v1/chat + auth: + header_name: api-key + header_value: ${{ env "DECK_AZURE_API_KEY" }} + logging: + log_statistics: true + log_payloads: true + model: + model_alias: ${{ env "DECK_CHAT_MODEL" }} + provider: azure + name: ${{ env "DECK_CHAT_MODEL" }} + options: + azure_api_version: ${{ env "DECK_AZURE_API_VERSION" }} + azure_deployment_id: ${{ env "DECK_AZURE_DEPLOYMENT_ID" }} + azure_instance: ${{ env "DECK_AZURE_INSTANCE" }} + input_cost: 0.25 + output_cost: 2.00 + - name: ai-rate-limiting-advanced + instance_name: github-copilot-byok-ratelimit + config: + policies: + - limits: + - limit: 50000 + window_size: 60 + window_type: sliding + identifier: consumer + tokens_count_strategy: total_tokens + strategy: local + llm_format: openai 
+consumers: +- username: copilot-developer-alice + keyauth_credentials: + - key: ${{ env "DECK_COPILOT_KEY_ALICE" }} +- username: copilot-developer-bob + keyauth_credentials: + - key: ${{ env "DECK_COPILOT_KEY_BOB" }} +EOF +{% endraw -%} + +echo " +_defaults: + kongctl: + namespace: github-copilot-byok-recipe +control_planes: + - ref: recipe-cp + name: \"${KONNECT_CONTROL_PLANE_NAME}\" + _deck: + files: + - kong-recipe.yaml +" | kongctl apply -f - -o text --auto-approve --pat "${KONNECT_TOKEN}" + +rm -f kong-recipe.yaml + +``` +{: data-test-step="block" .collapsible } + +{% endnavtab %} +{% navtab "AWS Bedrock" %} + +Export your environment variables. Bedrock model IDs include a `provider.model-version-vN:M` shape, and the value here is what Kong sends upstream **and** what VS Code's `models[].id` has to match: + +```bash +export DECK_CHAT_MODEL='anthropic.claude-sonnet-4-5-20250929-v1:0' # must match VS Code's models[].id +``` + +{:.warning} +> `DECK_CHAT_MODEL` is the contract between Copilot's policy, VS Code's `models[].id`, and Kong's `model_alias`. If your Copilot administrator has pinned a specific model (for example, `gpt-5-mini`), set this env var to that exact value and use the same string in `chatLanguageModels.json`. For Bedrock, you can split the user-facing alias from the upstream model name by editing the deck config so that `model_alias` references one env var (matching VS Code and the policy) and `model.name` references a different one (the real Bedrock model ID). A mismatch fails as either `400 "cannot use own model - must be: ..."` (rejected by GitHub policy before Kong) or `400 model not configured` (rejected by Kong). + +`KONNECT_CONTROL_PLANE_NAME`, the `DECK_AWS_*` credentials, `DECK_COPILOT_KEY_ALICE`, and `DECK_COPILOT_KEY_BOB` are already exported during the Prerequisites, so they do not need to be re-exported per tab. 
+ +Apply the Kong configuration: + +```bash +{%- raw %} +cat <<'EOF' > kong-recipe.yaml +_format_version: '3.0' +_info: + select_tags: + - github-copilot-byok-recipe +services: +- name: github-copilot-byok + url: http://localhost + routes: + - name: github-copilot-byok + paths: + - /github-copilot + protocols: + - http + - https + methods: + - POST + - OPTIONS + strip_path: true + plugins: + - name: pre-function + instance_name: github-copilot-byok-normalize + config: + access: + - | + local cjson = require "cjson.safe" + + local auth = kong.request.get_header("Authorization") + if auth and string.sub(string.lower(auth), 1, 7) == "bearer " then + ngx.req.set_header("apikey", string.sub(auth, 8)) + end + + local body = kong.request.get_raw_body() + if body then + local json = cjson.decode(body) + if json and json.temperature ~= nil then + json.temperature = nil + kong.service.request.set_raw_body(cjson.encode(json)) + end + end + - name: key-auth + instance_name: github-copilot-byok-auth + config: + key_names: + - apikey + hide_credentials: true + - name: ai-proxy-advanced + instance_name: github-copilot-byok-proxy + config: + llm_format: openai + max_request_body_size: 10485760 + response_streaming: allow + targets: + - route_type: llm/v1/chat + auth: + aws_access_key_id: ${{ env "DECK_AWS_ACCESS_KEY_ID" }} + aws_secret_access_key: ${{ env "DECK_AWS_SECRET_ACCESS_KEY" }} + logging: + log_statistics: true + log_payloads: true + model: + model_alias: ${{ env "DECK_CHAT_MODEL" }} + provider: bedrock + name: ${{ env "DECK_CHAT_MODEL" }} + options: + bedrock: + aws_region: ${{ env "DECK_AWS_REGION" }} + input_cost: 0.25 + output_cost: 2.00 + - name: ai-rate-limiting-advanced + instance_name: github-copilot-byok-ratelimit + config: + policies: + - limits: + - limit: 50000 + window_size: 60 + window_type: sliding + identifier: consumer + tokens_count_strategy: total_tokens + strategy: local + llm_format: openai +consumers: +- username: copilot-developer-alice + 
keyauth_credentials: + - key: ${{ env "DECK_COPILOT_KEY_ALICE" }} +- username: copilot-developer-bob + keyauth_credentials: + - key: ${{ env "DECK_COPILOT_KEY_BOB" }} +EOF +{% endraw -%} + +echo " +_defaults: + kongctl: + namespace: github-copilot-byok-recipe +control_planes: + - ref: recipe-cp + name: \"${KONNECT_CONTROL_PLANE_NAME}\" + _deck: + files: + - kong-recipe.yaml +" | kongctl apply -f - -o text --auto-approve --pat "${KONNECT_TOKEN}" + +rm -f kong-recipe.yaml + +``` +{: data-test-step="block" .collapsible } + +{% endnavtab %} +{% endnavtabs %} + + +### Create the Copilot Usage dashboard + +Create a custom dashboard at the org level, pre-filtered to this recipe's Gateway Service. The dashboard surfaces cost, token usage, request volume, model mix, per-developer (Consumer) usage, and latency for traffic through the `github-copilot-byok` Service. The dashboard JSON is in the code block below; if a labelled dashboard from a prior apply already exists, the block reuses it instead of creating a duplicate. + +```bash +# Look up the Control Plane and Service IDs so the dashboard's gateway_service +# preset filter resolves to the scoped UUID Konnect expects. +CP_ID=$(kongctl get gateway control-plane "${KONNECT_CONTROL_PLANE_NAME}" \ + --pat "${KONNECT_TOKEN}" -o json --jq '.id' -r) +SERVICE_ID=$(kongctl api get "/v2/control-planes/${CP_ID}/core-entities/services" \ + --pat "${KONNECT_TOKEN}" -o json \ + --jq '.data[] | select(.name=="github-copilot-byok") | .id' -r) + +if [ -z "${CP_ID}" ] || [ -z "${SERVICE_ID}" ]; then + echo "Refusing to create dashboard: CP_ID='${CP_ID}' SERVICE_ID='${SERVICE_ID}'." + echo "Confirm the apply step succeeded and the Service is visible in Konnect, then retry." 
+ exit 1 +fi + +EXISTING_DASHBOARDS=$(kongctl api get "/v2/dashboards?filter%5Blabels.recipe%5D=github-copilot-byok-recipe" \ + --pat "${KONNECT_TOKEN}" -o json --jq '.data | length') + +if [ "${EXISTING_DASHBOARDS}" -gt 0 ]; then + echo "Copilot Usage dashboard already exists. Reusing." +else + cat <<'EOF' | jq --arg ref "${CP_ID}:${SERVICE_ID}" '.definition.preset_filters[0].value = [$ref]' > copilot-usage-dashboard.json +{ + "name": "Copilot Usage", + "definition": { + "tiles": [ + { + "id": "c0f1ee01-0000-4000-8000-000000000001", + "type": "chart", + "layout": { "size": { "cols": 2, "rows": 1 }, "position": { "col": 0, "row": 0 } }, + "definition": { + "chart": { "type": "single_value", "chart_title": "Total cost ($)" }, + "query": { + "filters": [ + { "field": "ai_provider", "operator": "not_empty" }, + { "field": "ai_provider", "value": ["UNSPECIFIED"], "operator": "not_in" } + ], + "metrics": ["cost"], + "datasource": "llm_usage", + "dimensions": [] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-000000000002", + "type": "chart", + "layout": { "size": { "cols": 2, "rows": 1 }, "position": { "col": 2, "row": 0 } }, + "definition": { + "chart": { "type": "single_value", "chart_title": "Total tokens" }, + "query": { + "filters": [ + { "field": "ai_provider", "operator": "not_empty" }, + { "field": "ai_provider", "value": ["UNSPECIFIED"], "operator": "not_in" } + ], + "metrics": ["total_tokens"], + "datasource": "llm_usage", + "dimensions": [] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-000000000003", + "type": "chart", + "layout": { "size": { "cols": 2, "rows": 1 }, "position": { "col": 4, "row": 0 } }, + "definition": { + "chart": { "type": "single_value", "chart_title": "Total Copilot requests" }, + "query": { + "filters": [ + { "field": "ai_provider", "operator": "not_empty" }, + { "field": "ai_provider", "value": ["UNSPECIFIED"], "operator": "not_in" } + ], + "metrics": ["ai_request_count"], + "datasource": "llm_usage", + "dimensions": [] + } + } + 
}, + { + "id": "c0f1ee01-0000-4000-8000-000000000004", + "type": "chart", + "layout": { "size": { "cols": 3, "rows": 2 }, "position": { "col": 0, "row": 1 } }, + "definition": { + "chart": { "type": "top_n", "chart_title": "Top Copilot models by usage" }, + "query": { + "limit": 10, + "filters": [], + "metrics": ["total_tokens", "ai_request_count"], + "datasource": "llm_usage", + "dimensions": ["ai_request_model"] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-000000000005", + "type": "chart", + "layout": { "size": { "cols": 3, "rows": 2 }, "position": { "col": 3, "row": 1 } }, + "definition": { + "chart": { "type": "timeseries_line", "stacked": false, "chart_title": "Model usage trend (top 5)" }, + "query": { + "limit": 5, + "filters": [ + { "field": "ai_provider", "operator": "not_empty" }, + { "field": "ai_provider", "value": ["UNSPECIFIED"], "operator": "not_in" } + ], + "metrics": ["total_tokens"], + "datasource": "llm_usage", + "dimensions": ["ai_request_model", "time"] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-000000000006", + "type": "chart", + "layout": { "size": { "cols": 2, "rows": 2 }, "position": { "col": 0, "row": 3 } }, + "definition": { + "chart": { "type": "donut", "chart_title": "Copilot health check" }, + "query": { + "filters": [ + { "field": "gateway_service", "operator": "not_empty" }, + { "field": "ai_provider", "operator": "not_empty" }, + { "field": "ai_provider", "value": ["UNSPECIFIED"], "operator": "not_in" } + ], + "metrics": ["ai_request_count"], + "datasource": "llm_usage", + "dimensions": ["status_code_grouped"] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-000000000007", + "type": "chart", + "layout": { "size": { "cols": 2, "rows": 2 }, "position": { "col": 2, "row": 3 } }, + "definition": { + "chart": { "type": "donut", "chart_title": "Copilot provider usage" }, + "query": { + "filters": [ + { "field": "gateway_service", "operator": "not_empty" }, + { "field": "ai_provider", "operator": "not_empty" }, + { "field": 
"ai_provider", "value": ["UNSPECIFIED"], "operator": "not_in" } + ], + "metrics": ["ai_request_count"], + "datasource": "llm_usage", + "dimensions": ["ai_provider"] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-000000000008", + "type": "chart", + "layout": { "size": { "cols": 2, "rows": 2 }, "position": { "col": 4, "row": 3 } }, + "definition": { + "chart": { "type": "timeseries_bar", "stacked": true, "chart_title": "LLM latency (avg)" }, + "query": { + "filters": [ + { "field": "ai_request_model", "operator": "not_empty" }, + { "field": "ai_request_model", "value": ["UNSPECIFIED"], "operator": "not_in" } + ], + "metrics": ["llm_latency_average"], + "datasource": "llm_usage", + "dimensions": ["ai_request_model", "time"] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-000000000009", + "type": "chart", + "layout": { "size": { "cols": 3, "rows": 2 }, "position": { "col": 0, "row": 9 } }, + "definition": { + "chart": { "type": "horizontal_bar", "stacked": true, "chart_title": "Copilot usage by developer (requests)" }, + "query": { + "filters": [ + { "field": "consumer", "operator": "not_empty" } + ], + "metrics": ["ai_request_count"], + "datasource": "llm_usage", + "dimensions": ["consumer"] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-00000000000a", + "type": "chart", + "layout": { "size": { "cols": 3, "rows": 2 }, "position": { "col": 3, "row": 9 } }, + "definition": { + "chart": { "type": "vertical_bar", "stacked": true, "chart_title": "Copilot usage by developer (tokens)" }, + "query": { + "filters": [ + { "field": "consumer", "operator": "not_empty" } + ], + "metrics": ["total_tokens"], + "datasource": "llm_usage", + "dimensions": ["consumer"] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-00000000000b", + "type": "chart", + "layout": { "size": { "cols": 2, "rows": 2 }, "position": { "col": 0, "row": 11 } }, + "definition": { + "chart": { "type": "vertical_bar", "stacked": true, "chart_title": "AI security report (401 / 403 / 429)" }, + "query": { + 
"filters": [ + { "field": "status_code", "value": ["401", "403", "429"], "operator": "in" } + ], + "metrics": ["request_count"], + "datasource": "api_usage", + "dimensions": ["status_code", "consumer"] + } + } + }, + { + "id": "c0f1ee01-0000-4000-8000-00000000000c", + "type": "chart", + "layout": { "size": { "cols": 3, "rows": 2 }, "position": { "col": 2, "row": 11 } }, + "definition": { + "chart": { "type": "timeseries_bar", "stacked": true, "chart_title": "Monthly spend trends" }, + "query": { + "limit": 10, + "filters": [], + "metrics": ["cost"], + "datasource": "llm_usage", + "dimensions": ["ai_request_model", "time"], + "time_range": { "type": "relative", "time_range": "30d" }, + "granularity": "daily" + } + } + } + ], + "template_id": "AI_GATEWAY", + "preset_filters": [ + { "field": "gateway_service", "value": [], "operator": "in" } + ] + }, + "labels": { + "recipe": "github-copilot-byok-recipe" + } +} +EOF + DASHBOARD_ID=$(kongctl api post /v2/dashboards \ + -f copilot-usage-dashboard.json \ + --pat "${KONNECT_TOKEN}" -o json --jq '.id' -r) + rm -f copilot-usage-dashboard.json + echo "Created Copilot Usage dashboard (id: ${DASHBOARD_ID}). Open it in Konnect at Observability → Custom dashboards → 'Copilot Usage'." +fi +``` +{: data-test-step="block" .collapsible } + + +## Try it out + +With the configuration applied, configure VS Code Copilot to post Chat and agent requests to Kong, then ask a question and watch the request, attribution, and rate-limit headers flow through {{site.konnect_short_name}} Analytics. + +{:.warning} +> **Do not paste a raw API key into `chatLanguageModels.json`.** The `apiKey` field only accepts a `${input:chat.lm.secret.}` placeholder, which VS Code mints when you enter a key into the **Add Models... → Custom Endpoint → Chat Completions** wizard. A raw value in that field is silently dropped: VS Code sends `Authorization: Bearer` with no key, and Kong returns `401 No API key found in request`. 
+ +### Add the Custom Endpoint to VS Code + +VS Code stores Custom Endpoint API keys in its encrypted OS-keychain secret storage. `chatLanguageModels.json` only ever sees a `${input:chat.lm.secret.}` placeholder that resolves to the stored value at request time. The only supported way to mint that placeholder is the **Add Models...** wizard, which prompts for the key and writes both the secret entry and the placeholder for you. + +In VS Code: + +1. Open the **Chat** view (`Ctrl+Alt+I` / `Cmd+Ctrl+I`). +1. Click the **model picker** at the bottom of the Chat input. +1. Select **Manage Language Models...**. +1. Click **Add Models...** → **Custom Endpoint** → **Chat Completions**. +1. When VS Code prompts for an **API key**, paste the value of `$DECK_COPILOT_KEY_ALICE`. Copy it with `printf '%s' "$DECK_COPILOT_KEY_ALICE"` (not `echo`) so a trailing zsh `%` marker is never included. That character is a prompt artifact, not part of the key, and pasting it produces a `401` from Key Auth. + +VS Code stores the literal key in the OS keychain and opens `chatLanguageModels.json` with a generated stub entry whose `apiKey` field is already filled in with a `${input:chat.lm.secret.}` placeholder. Edit the `name`, `models[].id`, `models[].name`, and `models[].url` fields to match this recipe. **Leave the `apiKey` line exactly as the wizard wrote it**: + +```json +[ + { + "name": "Kong AI Gateway", + "vendor": "customendpoint", + "apiKey": "${input:chat.lm.secret.157fd74e}", + "apiType": "chat-completions", + "models": [ + { + "id": "gpt-4o", + "name": "GPT-4o (via Kong)", + "url": "http://localhost:8000/github-copilot/v1/chat/completions", + "toolCalling": true, + "vision": true, + "maxInputTokens": 128000, + "maxOutputTokens": 16000 + } + ] + } +] +``` +{:.no-copy-code} + +The hex suffix after `chat.lm.secret.` is per-entry. Yours will differ from the example. Four behaviors are load-bearing: + +- **`apiKey`** is a `${input:chat.lm.secret.}` reference, not a literal. 
Re-running the **Add Models...** wizard mints a new secret and a new placeholder; replacing the wizard-generated value with a raw key breaks authentication, because VS Code drops the raw value and Kong answers `401`. To rotate, open **Manage Language Models...** → select the entry → **Edit API Key** and VS Code updates the stored secret behind the same placeholder. +- **`models[].id`** must equal `DECK_CHAT_MODEL` exactly. VS Code sends this string as the body's `model` field, and Kong's `model_alias` rejects anything else with `400 model not configured`. If your Copilot administrator has pinned an allowlist (for example, `gpt-5-mini`), set both this value and `DECK_CHAT_MODEL` to that exact name. +- **`models[].url`** is the **full** chat-completions path. VS Code posts to the URL verbatim; it does not append `/chat/completions` for you. +- **Window reload after edits.** After editing `chatLanguageModels.json`, run **Developer: Reload Window** so VS Code re-parses the file. The Chat model picker shows the new entry. Select **GPT-4o (via Kong)**. + +### Ask Copilot a question + +In the Chat view, with **GPT-4o (via Kong)** selected, ask any question. For example: + +```text +> Explain what a Kong Consumer is in one sentence. +``` +{:.no-copy-code} + +Copilot Chat sends the request through Kong, which authenticates the developer, normalizes the body, injects the provider key, and forwards. The response streams back into the Chat view. Subsequent requests reuse the same credential silently. + +### Confirm Bearer-to-apikey rewrite and temperature strip + +The Pre-function Plugin performs two rewrites on every request that are invisible to the developer. To verify them, you can inspect the request payload in {{site.konnect_short_name}} Analytics or with `log_payloads: true` going to a logging Plugin sink. Two signals confirm correctness: + +- Key Auth attributes the request to the matching Consumer (`copilot-developer-alice` or `copilot-developer-bob`), visible in the **Copilot usage by developer** tiles on the **Copilot Usage** dashboard.
+- The upstream payload (visible in `log_payloads: true` output) does not contain a `temperature` field. If you point this Kong Service at a reasoning model (gpt-5, o-series) and Copilot Chat returns answers instead of `400`, the strip is working. + +If you want to see the rewrites live during local development, temporarily insert two `kong.log.notice` lines at the top of the Pre-function access block to dump the inbound headers and body, then watch the data plane logs: + +{% raw %} +```bash +docker logs -f $(docker ps --filter ancestor=kong/kong-gateway --format '{{.Names}}') 2>&1 \ + | grep -iE 'authorization|apikey' +``` +{% endraw %} + +Remove the diagnostic lines once you have confirmed `"authorization":"Bearer "` is arriving. Leaving them in writes the full prompt, tool definitions, and file contents from every Copilot request into your gateway logs. + +### Hit the model-alias gate + +To see the alias guardrail in action, edit `chatLanguageModels.json` and change `models[].id` to a different model name, for example `gpt-3.5-turbo`. Save. Re-select the model in the picker and ask Copilot another question. Kong rejects the request: + +```text +Request failed: 400 model not configured +``` +{:.no-copy-code} + +Change `id` back to the value of `DECK_CHAT_MODEL` and the request succeeds again. This is the Kong-side enforcement boundary: developers cannot use a model the platform team has not configured a target for, even if VS Code's UI offers it. + +### Hit the per-Consumer rate limit + +The recipe configures 50,000 total tokens per 60-second sliding window per Consumer. Copilot agent runs with workspace context routinely consume 30,000+ tokens per turn. A single multi-tool reply can come close to exhausting the window on its own. 
Send two or three prompts in quick succession that each include meaningful file context (for example, ask Copilot to summarize a file in your repo) and Kong returns: + +```text +Request failed: 429 Too Many Requests +``` +{:.no-copy-code} + +The response includes a `Retry-After` header indicating how many seconds remain in the window. Because `identifier: consumer` scopes the bucket to the developer, the other Consumer (`copilot-developer-bob`) still has its full budget at the same instant. To verify, rotate VS Code's stored credential via **Manage Language Models... → Edit API Key** to the value of `$DECK_COPILOT_KEY_BOB`. + +### Explore in Konnect + +Open [{{site.konnect_product_name}}](https://cloud.konghq.com/) to see the recipe's resources in place. + +**Copilot Usage dashboard** + +Navigate to **Observability → Custom dashboards → `Copilot Usage`**. The dashboard is pre-filtered to the `github-copilot-byok` Gateway Service and surfaces: + +- Total cost, total tokens, and request count for the recipe's traffic. +- Top Copilot models by token and request volume, plus a model-usage trend line. +- Health-check, provider-share, and average-latency breakdowns. +- Per-developer (Consumer) usage broken out by request count and token volume. The per-developer ceilings the rate limiter enforces are directly visible here. +- An AI security report scoped to 4XX responses on the recipe's Route. + +LLM analytics data takes 2–5 minutes to surface after the first successful request. If the dashboard remains empty beyond that, see [Troubleshooting](#troubleshooting) below. + +**Gateway resources** + +Navigate to **API Gateway → Gateways → `github-copilot-byok-recipe`**. This is the Control Plane that the quickstart provisioned and that `kongctl adopt` attached to this namespace. It surfaces: + +- **Gateway services → `github-copilot-byok`**. The Service the apply block registered. Its detail page has tabs for Configuration, Routes, Plugins, and Analytics. 
+ - **Routes** tab: the `/github-copilot` Route. + - **Plugins** tab: Pre-function, Key Auth, AI Proxy Advanced, and AI Rate Limiting Advanced, all scoped to the Service. +- **Consumers**: `copilot-developer-alice` and `copilot-developer-bob`, each with a Key Auth credential. + +The Gateway Service's **Analytics** tab and the top-level **Observability** menu remain available for deeper exploration beyond the curated dashboard above. + +## Troubleshooting + +### `401 No API key found in request` from VS Code + +Kong's Key Auth Plugin returns this when the `apikey` header is missing, which for Copilot means the `Authorization` header was either absent or arrived as a bare `Bearer` with no value. Verify in this order: + +1. **Confirm the request reached Kong with a non-empty Bearer value.** Add the diagnostic `kong.log.notice` lines described in the [Confirm Bearer-to-apikey rewrite](#confirm-bearer-to-apikey-rewrite-and-temperature-strip) section and look for `"authorization":"Bearer "`. If the value is literally `Bearer` with nothing after, VS Code did not resolve the secret placeholder. +1. **Re-mint the placeholder.** Open **Manage Language Models...**, remove the Kong AI Gateway entry, run **Add Models... → Custom Endpoint → Chat Completions** again, and paste the key when prompted. A `chatLanguageModels.json` entry whose `apiKey` was hand-typed (rather than wizard-generated) is silently invalid even if its shape looks identical to a working one. +1. **Strip stray characters from the key.** Copy the value with `printf '%s' "$DECK_COPILOT_KEY_ALICE"`, and leave out the trailing `%` that zsh prints after output with no final newline; copying that marker along with the key is a common cause of silent 401s. + +### Dashboard is empty after a few minutes + +If **Observability → Custom dashboards → `Copilot Usage`** shows no data 5+ minutes after a successful request, work outward from the data plane: + +1. 
**Confirm requests reached the data plane.** Tail `docker logs` on the Kong container and trigger a Copilot prompt; you should see access-log lines for `POST /github-copilot/v1/chat/completions`. If the access log is silent, the request never left VS Code or never hit Kong — fix the client side first. + +1. **Confirm the data plane is reporting to Konnect telemetry.** A data plane that was relaunched (for example, recreated with new env vars) sometimes fails to reconnect to the telemetry endpoint: + + {% raw %} + ```bash + docker logs $(docker ps --filter ancestor=kong/kong-gateway --format '{{.Names}}') 2>&1 \ + | grep -iE 'telemetry|cluster_telemetry|connected to.*tp\.konghq' + ``` + {% endraw %} + + A healthy node shows a successful connection to `.tp.konghq.com:443`. Repeated TLS or DNS errors mean telemetry is silently dropping; restart the container and re-check. + +1. **Confirm raw requests are landing in Konnect.** Open **Observability → Analytics → Requests**, scope to the `github-copilot-byok-recipe` Control Plane, and verify recent 2xx entries appear. If this view is populated but the custom dashboard is not, the dashboard's filter or tiles are the issue. If this view is also empty, telemetry is the issue (step 2). + +1. **Confirm the dashboard filter resolved to a real Service ID.** The dashboard's `preset_filters[0].value` is set by jq to `${CP_ID}:${SERVICE_ID}` during creation. Re-run the lookups and verify both IDs return non-empty values: + + ```bash + CP_ID=$(kongctl get gateway control-plane "${KONNECT_CONTROL_PLANE_NAME}" \ + --pat "${KONNECT_TOKEN}" -o json --jq '.id' -r) + echo "CP_ID=${CP_ID}" + kongctl api get "/v2/control-planes/${CP_ID}/core-entities/services" \ + --pat "${KONNECT_TOKEN}" -o json \ + --jq '.data[] | select(.name=="github-copilot-byok") | .id' -r + ``` + + If either is empty, delete the dashboard via the cleanup block below and re-run the dashboard creation step. + +1. 
**Send a few more requests if tiles still look empty.** Several tiles filter out `ai_provider = UNSPECIFIED`. Brand-new gateways occasionally tag the first request or two as `UNSPECIFIED` while the AI Proxy Advanced Plugin reports provider metadata back to Konnect; a handful of additional Copilot prompts brings the tiles to life. + +## Variations and next steps + +- **Add SSO with Consumer Group tiers.** Replace Key Auth with the [OpenID Connect](/plugins/openid-connect/) Plugin pointed at your IdP, define Consumer Groups for `copilot-standard-users` and `copilot-power-users`, scope separate AI Proxy Advanced Plugins to each group, and let the user's group claim drive which model they can access. The [Claude Code SSO](/cookbooks/claude-code-sso/) recipe demonstrates this pattern end-to-end. +- **Add code-secret PII redaction.** Add the [AI PII Sanitizer](/plugins/ai-pii-sanitizer/) Plugin before AI Proxy Advanced to redact API keys, tokens, and other secrets from prompts before they reach the LLM provider. Developers pasting `.env` snippets into Copilot Chat is a real exfiltration risk; this Plugin catches it server-side. +- **Switch to monthly token budgets.** The 60-second window here is intentionally aggressive for the demo so a few prompts visibly exhaust it. Production teams usually enforce monthly budgets, for example {%raw%}`limits: [{limit: 5000000, window_size: 2592000}]`{%endraw%} for a 5 million token monthly ceiling per Consumer. Combine multiple `limits` entries to enforce burst and sustained budgets simultaneously. +- **Multi-node rate limiting with Redis.** The recipe uses `strategy: local`, which keeps counters in memory on each Kong node. For multi-node clusters, switch to `strategy: redis` and point to a shared Redis instance. 
+- **Move credentials into a vault.** Use [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) to source provider keys and Consumer credentials from HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, or the Konnect Config Store. Replace {% raw %}`${{ env "DECK_OPENAI_TOKEN" }}`{% endraw %} style references with `{vault://backend/key}` references. +- **Cover non-Copilot OpenAI-format clients.** This recipe works for any client that speaks the OpenAI Chat Completions API and authenticates with `Authorization: Bearer`. Cursor, Continue, and similar IDE assistants can point at the same Route with their own Consumer credential. + +## Cleanup + +The recipe's `select_tags` and kongctl namespace scoped all resources, so this teardown removes only this recipe's configuration. + +Delete the **Copilot Usage** custom dashboard. The dashboard is an org-level resource and outlives the Control Plane, so remove it before tearing down Kong: + +```bash +DASHBOARD_IDS=$(kongctl api get "/v2/dashboards?filter%5Blabels.recipe%5D=github-copilot-byok-recipe" \ + --pat "${KONNECT_TOKEN}" -o json --jq '.data[].id' -r) + +if [ -z "${DASHBOARD_IDS}" ]; then + echo "No Copilot Usage dashboard found. Skipping." +else + for id in ${DASHBOARD_IDS}; do + if kongctl api delete "/v2/dashboards/${id}" --pat "${KONNECT_TOKEN}"; then + echo "Deleted Copilot Usage dashboard ${id}." + else + echo "Failed to delete dashboard ${id}." + fi + done +fi +``` + +Tear down Kong by deleting the local data plane and the {{site.konnect_product_name}} Control Plane: + +```bash +export KONNECT_CONTROL_PLANE_NAME='github-copilot-byok-recipe' && curl -Ls https://get.konghq.com/quickstart | bash -s -- -d -k $KONNECT_TOKEN +``` + +Remove the Kong AI Gateway entry from VS Code. Open the Chat view → model picker → **Manage Language Models...** → select **Kong AI Gateway** → **Remove**. This deletes the entry from `chatLanguageModels.json` AND the stored secret from the OS keychain. 
Copilot resumes routing to GitHub's hosted models for the seat. diff --git a/app/_cookbooks/guardrail-integrations.md b/app/_cookbooks/guardrail-integrations.md index 952b7f5f0a..f280db5204 100644 --- a/app/_cookbooks/guardrail-integrations.md +++ b/app/_cookbooks/guardrail-integrations.md @@ -17,7 +17,6 @@ categories: - guardrails featured: false popular: false -published: false # Machine-readable fields for AI agent setup plugins: @@ -39,7 +38,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. The [quickstart script](https://get.konghq.com/quickstart) provisions a recipe-scoped Control Plane and local Data Plane. A later step claims the Control Plane for declarative management with kongctl. @@ -62,7 +61,7 @@ prereqs: content: | This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration. - 1. Install **kongctl** from [developer.konghq.com/kongctl](/kongctl/). + 1. Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). 2. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). You can verify both are installed: @@ -116,7 +115,7 @@ prereqs: pip install 'openai>=1.0.0' ``` overview: | - This recipe demonstrates how to integrate external guardrail services with {{site.ai_gateway_name}} using the [AI Custom Guardrail](/plugins/ai-custom-guardrail/) Plugin, introduced in Kong 3.14. The custom guardrail Plugin provides a single adaptable interface that works with any HTTP-based guardrail provider. You define the request format, write a short Lua function to parse the response, and the Plugin handles the rest. + This recipe demonstrates how to integrate external guardrail services with Kong AI Gateway using the [AI Custom Guardrail](/plugins/ai-custom-guardrail/) Plugin, introduced in Kong 3.14. 
The custom guardrail Plugin provides a single adaptable interface that works with any HTTP-based guardrail provider. You define the request format, write a short Lua function to parse the response, and the Plugin handles the rest. The Azure Content Safety tab shows the custom guardrail Plugin alongside the dedicated [AI Azure Content Safety](/plugins/ai-azure-content-safety/) Plugin so you can see the difference. The Mistral Moderation tab shows the custom guardrail integrating with a service that has no dedicated Kong Plugin, demonstrating why a universal adapter is the default approach going forward. --- @@ -133,7 +132,7 @@ The AI content safety landscape is fragmented and growing fast. Every major clou ## The solution -{{site.ai_gateway_name}} treats guardrail integration as a configuration problem rather than a code problem. Two capabilities anchor this recipe: +Kong AI Gateway treats guardrail integration as a configuration problem rather than a code problem. Two capabilities anchor this recipe: - **A universal HTTP adapter for content moderation.** Instead of waiting for a vendor-specific Plugin to ship for every new guardrail service, you describe the service's API in declarative configuration. The same adapter pattern serves Azure Content Safety, Mistral Moderation, internal compliance endpoints, or a custom ML model. When organizations switch providers or evaluate vendors in parallel, the change happens at the gateway layer, not in every application. 
@@ -145,7 +144,7 @@ This recipe demonstrates the [AI Custom Guardrail](/plugins/ai-custom-guardrail/ {% mermaid %} sequenceDiagram participant C as Client - participant K as {{site.ai_gateway_name}} + participant K as Kong AI Gateway participant G as Guardrail Service participant L as LLM Provider @@ -170,26 +169,14 @@ sequenceDiagram {% endmermaid %} -{% table %} -columns: - - title: Component - key: component - - title: Responsibility - key: responsibility -rows: - - component: Client application - responsibility: Sends OpenAI-format chat requests with an API key for Consumer identification - - component: Kong, [key-auth](/plugins/key-auth/) - responsibility: Identifies the Consumer and rejects requests without a valid API key - - component: Kong, guardrail Plugin ([ai-azure-content-safety](/plugins/ai-azure-content-safety/) or [ai-custom-guardrail](/plugins/ai-custom-guardrail/)) - responsibility: Extracts text content and sends it to the external guardrail service for evaluation - - component: External guardrail service - responsibility: Analyzes content against safety policies and returns a verdict - - component: Kong, [ai-proxy-advanced](/plugins/ai-proxy-advanced/) - responsibility: Routes approved requests to the configured LLM provider - - component: LLM provider - responsibility: Processes the prompt and returns a completion -{% endtable %} +| Component | Responsibility | +|-----------|---------------| +| Client application | Sends OpenAI-format chat requests with an API key for Consumer identification | +| Kong, [key-auth](/plugins/key-auth/) | Identifies the Consumer and rejects requests without a valid API key | +| Kong, guardrail Plugin ([ai-azure-content-safety](/plugins/ai-azure-content-safety/) or [ai-custom-guardrail](/plugins/ai-custom-guardrail/)) | Extracts text content and sends it to the external guardrail service for evaluation | +| External guardrail service | Analyzes content against safety policies and returns a verdict | +| Kong, 
[ai-proxy-advanced](/plugins/ai-proxy-advanced/) | Routes approved requests to the configured LLM provider | +| LLM provider | Processes the prompt and returns a completion | {:.info} > **AWS SigV4 auth:** The AI Custom Guardrail Plugin supports HTTP-based authentication (API keys in headers, query parameters, or request body). AWS Bedrock Guardrails requires SigV4 request signing, which the custom Plugin does not support. Use the dedicated [AI AWS Guardrails](/plugins/ai-aws-guardrails/) Plugin for that service. @@ -225,7 +212,7 @@ The [key-auth](/plugins/key-auth/) Plugin matches the `apikey` request header ag **`hide_credentials: true`**: Strips the `apikey` header before Kong forwards the request upstream. This prevents the Consumer credential from leaking to the LLM provider in proxied requests. -The recipe registers a single Consumer (`demo-app`) with one API key. Production deployments typically use multiple Consumers, often with [Consumer Groups](/gateway/entities/consumer-group/) to apply different policies per tier. +The recipe registers a single Consumer (`demo-app`) with one API key. Production deployments typically use multiple Consumers, often with [Consumer Groups](/gateway/consumer-groups/) to apply different policies per tier. 
### AI Azure Content Safety, dedicated integration @@ -443,28 +430,17 @@ The template syntax is `$(.)`, where the function nam ##### Template variable reference -{% table %} -columns: - - title: Variable - key: variable - - title: Description - key: description -rows: - - variable: "`$(content)`" - description: Text being inspected, extracted based on `text_source` - - variable: "`$(source)`" - description: "Current inspection phase: `INPUT` or `OUTPUT`" - - variable: "`$(conf.params.)`" - description: Access a value from `config.params` - - variable: "`$(resp)`" - description: Raw guardrail service response (used in functions) - - variable: "`$(.)`" - description: Access a field from a function's return table -{% endtable %} +| Variable | Description | +|----------|-------------| +| `$(content)` | Text being inspected, extracted based on `text_source` | +| `$(source)` | Current inspection phase: `INPUT` or `OUTPUT` | +| `$(conf.params.)` | Access a value from `config.params` | +| `$(resp)` | Raw guardrail service response (used in functions) | +| `$(.)` | Access a field from a function's return table | ### AI Proxy Advanced, LLM routing with secure-by-default fields -The [ai-proxy-advanced](/plugins/ai-proxy-advanced/) Plugin handles credential injection and LLM routing for approved requests. Two configuration fields are set explicitly to align with {{site.base_gateway}} 3.14's secure-by-default posture: +The [ai-proxy-advanced](/plugins/ai-proxy-advanced/) Plugin handles credential injection and LLM routing for approved requests. Two configuration fields are set explicitly to align with Kong Gateway 3.14's secure-by-default posture: {%- raw %} ```yaml @@ -504,11 +480,11 @@ The AI Custom Guardrail Plugin is the default approach for new guardrail integra ### Production considerations {:.info} -> In production, store credentials in [Kong Vaults](/gateway/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. 
The `config.params` section of the AI Custom Guardrail Plugin supports vault references directly. +> In production, store credentials in [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. The `config.params` section of the AI Custom Guardrail Plugin supports vault references directly. ## Apply the Kong configuration -The configuration below creates a {{site.base_gateway}} Service, Route, the Plugins described in [How it works](#how-it-works), and a single demo Consumer. The `select_tags` and kongctl `namespace` scope all resources to this recipe, enabling clean teardown and co-existence with other configurations on the same Control Plane. +The configuration below creates a Kong Gateway Service, Route, the Plugins described in [How it works](#how-it-works), and a single demo Consumer. The `select_tags` and kongctl `namespace` scope all resources to this recipe, enabling clean teardown and co-existence with other configurations on the same Control Plane. This section runs in two parts. First, adopt the quickstart Control Plane into a kongctl namespace so the apply commands below can manage it: @@ -520,7 +496,7 @@ kongctl adopt control-plane "${KONNECT_CONTROL_PLANE_NAME}" \ Adoption stamps the `KONGCTL-namespace` label on the Control Plane. -`KONNECT_CONTROL_PLANE_NAME` and `KONNECT_TOKEN` are exported once during the {{site.konnect_product_name}} prereq, and `DECK_OPENAI_TOKEN`, `DECK_AZURE_CONTENT_SAFETY_*`, and `DECK_MISTRAL_MODERATION_TOKEN` come from the credential prereqs. Each tab below exports only the model selection that varies per configuration. +`KONNECT_CONTROL_PLANE_NAME` and `KONNECT_TOKEN` are exported once during the Kong Konnect prereq, and `DECK_OPENAI_TOKEN`, `DECK_AZURE_CONTENT_SAFETY_*`, and `DECK_MISTRAL_MODERATION_TOKEN` come from the credential prereqs. Each tab below exports only the model selection that varies per configuration. 
{% navtabs "Guardrail Service" %} {% tab Azure Content Safety %} @@ -871,7 +847,7 @@ The demo script works with any of the configurations above. It sends four reques {:.info} -> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. +> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](https://developer.konghq.com/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. 
Create the demo script: @@ -1081,20 +1057,11 @@ Content-Type: application/json Kong adds the following response headers to allowed requests: -{% table %} -columns: - - title: Header - key: header - - title: Description - key: description -rows: - - header: "`X-Kong-LLM-Model`" - description: Model name selected for this request - - header: "`X-Kong-Upstream-Latency`" - description: Time (ms) Kong spent waiting for the LLM provider - - header: "`X-Kong-Proxy-Latency`" - description: Time (ms) Kong spent processing the request, including the guardrail call -{% endtable %} +| Header | Description | +|--------|-------------| +| `X-Kong-LLM-Model` | Model name selected for this request | +| `X-Kong-Upstream-Latency` | Time (ms) Kong spent waiting for the LLM provider | +| `X-Kong-Proxy-Latency` | Time (ms) Kong spent processing the request, including the guardrail call | ### Explore in Konnect @@ -1116,7 +1083,7 @@ You can review the resources and traffic this recipe produced directly in [Konne **Use AI AWS Guardrails for AWS Bedrock.** If your guardrail service is AWS Bedrock Guardrails, use the dedicated [AI AWS Guardrails](/plugins/ai-aws-guardrails/) Plugin. It handles SigV4 request signing natively, which the AI Custom Guardrail Plugin cannot do. Configure your guardrail ID and version in the Plugin config, and the Plugin manages the full AWS authentication flow. -**Use Kong Vaults for production credential management.** Replace the environment variable exports with vault references to store your Azure Content Safety API key, Mistral moderation token, and OpenAI API key securely. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. See the [secrets management documentation](https://docs.konghq.com/gateway/secrets-management/) for setup instructions. 
+**Use Kong Vaults for production credential management.** Replace the environment variable exports with vault references to store your Azure Content Safety API key, Mistral moderation token, and OpenAI API key securely. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. See the [secrets management documentation](https://docs.konghq.com/gateway/latest/kong-enterprise/secrets-management/) for setup instructions. ## Cleanup diff --git a/app/_cookbooks/llm-cost-optimization.md b/app/_cookbooks/llm-cost-optimization.md index 74d7e07a5f..1c697fbaf2 100644 --- a/app/_cookbooks/llm-cost-optimization.md +++ b/app/_cookbooks/llm-cost-optimization.md @@ -18,7 +18,6 @@ categories: featured: false popular: false - # Machine-readable fields for AI agent setup plugins: - key-auth @@ -36,7 +35,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. You will provision a recipe-scoped Control Plane and local Data Plane via the [quickstart script](https://get.konghq.com/quickstart). @@ -62,7 +61,7 @@ prereqs: content: | This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration. - 1. Install **kongctl** from [developer.konghq.com/kongctl](/kongctl/). + 1. Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). 1. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). 1. Verify both are installed: @@ -244,7 +243,7 @@ to every request. 
Each technique addresses one of the problems above: {% mermaid %} sequenceDiagram participant C as Client - participant K as {{site.ai_gateway_name}} + participant K as Kong AI Gateway participant LC as LLMLingua Compressor participant L as LLM Provider @@ -346,7 +345,7 @@ $100/hour), so the API key the client sends determines the tier the request runs ``` {:.no-copy-code} -**`key_names: [apikey]`**. The headers (or query parameters) the Plugin looks in for the API key. The recipe uses `apikey` because the Key Auth Plugin performs an exact string match on the header value and does not inspect `Authorization` for Bearer tokens. The OpenAI SDK's `api_key` field always serializes as `Authorization: Bearer `, which Kong would read as the literal string `Bearer ` and fail to match against any stored credential. The "Try it out" section below points at a pre-function pattern that bridges the SDK's Bearer token to the `apikey` header server-side; the [Authenticate OpenAI SDK clients with Key Auth](/how-to/authenticate-openai-sdk-clients-with-key-auth/) guide has the full pattern. +**`key_names: [apikey]`**. The headers (or query parameters) the Plugin looks in for the API key. The recipe uses `apikey` because the Key Auth Plugin performs an exact string match on the header value and does not inspect `Authorization` for Bearer tokens. The OpenAI SDK's `api_key` field always serializes as `Authorization: Bearer `, which Kong would read as the literal string `Bearer ` and fail to match against any stored credential. The "Try it out" section below points at a pre-function pattern that bridges the SDK's Bearer token to the `apikey` header server-side; the [Authenticate OpenAI SDK clients with Key Auth](https://developer.konghq.com/how-to/authenticate-openai-sdk-clients-with-key-auth/) guide has the full pattern. **`hide_credentials: true`**. Strips the API key from the request before forwarding upstream. The provider never sees the Consumer's API key. 
This is a 3.14 default but the recipe sets it explicitly for clarity and to remain portable to older Gateway versions. @@ -569,7 +568,7 @@ consumer_groups: Kong sets response headers on every request so clients can track their remaining budget: `X-AI-RateLimit-Limit-hour-openai: 1` and `X-AI-RateLimit-Remaining-hour-openai: 0.987`. When the budget is exhausted, Kong returns `429 Too Many Requests` with a `Retry-After` header. The window label in the header (`hour`, `minute`, etc.) is derived from the configured `window_size`: `3600` becomes `hour`, `60` becomes `minute`, and non-standard sizes use the raw seconds value. Both targets in the recipe use `name: openai`, so a single bucket tracks total tier spend across `gpt-4o` and `gpt-4o-mini` together; to split budgets per model, add separate `llm_providers` entries with distinct `name` values or break each model onto its own Route. -For simplicity, this recipe stores Consumer API keys directly in Plugin config and provider credentials in environment variables. In production, reference both through [Kong Vaults](/gateway/secrets-management/) instead, backed by your preferred secret manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, or Azure Key Vault). +For simplicity, this recipe stores Consumer API keys directly in Plugin config and provider credentials in environment variables. In production, reference both through [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) instead, backed by your preferred secret manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, or Azure Key Vault). ### Example response @@ -1066,7 +1065,7 @@ compression in action, parallel calls from both tiers to compare rate limit budg invalid-API-key call to confirm Kong rejects unauthorized requests before any upstream call. {:.info} -> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. 
To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. +> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](https://developer.konghq.com/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. Create the demo script: diff --git a/app/_cookbooks/model-based-routing.md b/app/_cookbooks/model-based-routing.md index dd0a96dbf7..961b701238 100644 --- a/app/_cookbooks/model-based-routing.md +++ b/app/_cookbooks/model-based-routing.md @@ -14,7 +14,8 @@ works_on: min_version: gateway: '3.14' categories: - - llm-routing + - llm + - cost-optimization featured: true popular: false @@ -34,7 +35,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. The [quickstart script](https://get.konghq.com/quickstart) provisions a recipe-scoped Control Plane and local Data Plane. @@ -74,10 +75,15 @@ prereqs: 3. 
Export credentials: ```sh - export DECK_OPENAI_TOKEN='Bearer sk-YOUR-KEY' - export DECK_AWS_ACCESS_KEY_ID='YOUR-AWS-ACCESS-KEY' - export DECK_AWS_SECRET_ACCESS_KEY='YOUR-AWS-SECRET-KEY' - export DECK_AWS_REGION='us-east-1' + export DECK_OPENAI_SELECTOR_SLM='o3-mini' # Model selection (SLM) + export DECK_OPENAI_FAST_MODEL='gpt-4o-mini' # Fast tier + export DECK_BEDROCK_SMART_MODEL='global.anthropic.claude-sonnet-4-6' # Smart tier (Claude on Bedrock) + + # Also export your provider credentials + export DECK_OPENAI_TOKEN='Bearer sk-...' + export DECK_AWS_ACCESS_KEY_ID='your-access-key-id' + export DECK_AWS_SECRET_ACCESS_KEY='your-secret-access-key' + export DECK_AWS_REGION='us-east-1' ``` overview: | This recipe demonstrates intelligent cross-provider routing where {{site.ai_gateway_name}} analyzes each prompt and dynamically routes it to the optimal provider based on complexity. Simple prompts are routed to OpenAI for speed and cost efficiency, while complex tasks requiring deep reasoning are routed to AWS Bedrock (Claude). @@ -105,7 +111,7 @@ The solution uses two Routes working in tandem: 1. **Model selection Route:** Receives prompts, analyzes complexity via OpenAI o3-mini, and returns a tier recommendation ("fast" or "smart"). -2. **Default LLM Route:** Your application's main chat endpoint. The Datakit plugin intercepts requests, calls the model selection Route, extracts the tier recommendation, modifies the request body to specify the recommended tier, and forwards it to the AI Proxy Advanced plugin. The plugin has two targets — one for OpenAI (fast tier) and one for AWS Bedrock (smart tier) — and routes based on the tier field. +2. **Default LLM Route:** Your application's main chat endpoint. The DataKit plugin intercepts requests, calls the model selection Route, extracts the tier recommendation, modifies the request body to specify the recommended tier, and forwards it to the AI Proxy Advanced plugin. 
The plugin has two targets — one for OpenAI (fast tier) and one for AWS Bedrock (smart tier) — and routes based on the tier field. This architecture provides: @@ -119,17 +125,17 @@ This architecture provides: sequenceDiagram participant Client participant Kong as Kong AI Gateway - participant Selector as Model Selection Route
(OpenAI o3-mini) + participant Model Selector as Model Selection Route
(OpenAI o3-mini) participant OpenAI participant Bedrock as AWS Bedrock
(Claude) Client->>Kong: POST /chat (with prompt) Note over Kong: DataKit plugin intercepts - Kong->>Selector: Call /model-selection (with prompt) - Selector->>OpenAI: Analyze prompt complexity (o3-mini) - OpenAI-->>Selector: Return tier recommendation - Selector-->>Kong: Return tier ("fast" or "smart") + Kong->>Model Selector: Call /model-selection (with prompt) + Model Selector->>OpenAI: Analyze prompt complexity (o3-mini) + OpenAI-->>Model Selector: Return tier recommendation + Model Selector-->>Kong: Return tier ("fast" or "smart") Note over Kong: DataKit updates request body model field alt Fast Tier @@ -146,19 +152,21 @@ sequenceDiagram {% table %} columns: - - title: Component - key: component - - title: Responsibility - key: responsibility + - title: "Component" + - title: "Responsibility" rows: - - component: Client application - responsibility: "Sends standard chat completion requests to `/chat`. No routing logic required." - - component: DataKit plugin (default-llm) - responsibility: "Extracts prompt, calls `/model-selection`, modifies request body with tier recommendation." - - component: Model selection Route - responsibility: "Analyzes prompt complexity using OpenAI o3-mini, returns `fast` or `smart`." - - component: AI Proxy Advanced (default-llm) - responsibility: "Routes to OpenAI (fast) or AWS Bedrock (smart) based on the `model` field in the request body. Handles provider auth and format translation." + - columns: + - "Client application" + - "Sends standard chat completion requests to `/chat`. No routing logic required." + - columns: + - "DataKit plugin (default-llm)" + - "Extracts prompt, calls `/model-selection`, modifies request body with tier recommendation." + - columns: + - "Model selection Route" + - "Analyzes prompt complexity using OpenAI o3-mini, returns `\"fast\"` or `\"smart\"`." + - columns: + - "AI Proxy Advanced (default-llm)" + - "Routes to OpenAI (fast) or AWS Bedrock (smart) based on `model` field in request body. 
Handles provider auth and format translation." {% endtable %} ## How it works @@ -191,16 +199,12 @@ The Key Auth plugin enforces authentication on both Routes using a shared `apike - **`hide_credentials: true`:** Strips the `apikey` header before forwarding requests to the LLM provider, so API keys never leave {{site.base_gateway}}. - **`key_names`:** Defines which header carries the key. The demo uses `apikey` via the OpenAI SDK's `default_headers` parameter. -The recipe defines two Consumers: -- **`demo-consumer`:** Client-facing authentication. End users authenticate with `apikey: demo-consumer-key`. -- **`internal-router`:** Service-to-service authentication. The Datakit plugin uses `apikey: internal-router-key` for internal calls to `/model-selection`. +The same Key Auth configuration applies to both the `model-selection` and `default-llm` Routes. The recipe defines one Consumer (`demo-consumer`) with key `demo-consumer-key` that authenticates to both. -This two-consumer pattern is standard for internal service-to-service traffic: clients authenticate once at the gateway edge, but internal service calls use separate credentials. When the DataKit plugin calls the model-selection Route internally, it uses the `internal-router-key` (via the `DECK_INTERNAL_ROUTER_KEY` environment variable), so the internal call passes authentication without needing to extract or forward the client's credentials. - -For production deployments, use [{{site.base_gateway}} Vaults](/gateway/secrets-management/) to store API keys: +When the DataKit plugin on the default-llm Route calls the model-selection Route internally, it extracts the `apikey` header from the incoming client request and forwards it, so the internal call also passes authentication. {:.info} -> In production, store credentials in [{{site.base_gateway}} Vaults](/gateway/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. 
{{site.base_gateway}} supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the {{site.konnect_product_name}} Config Store. +> In production, store credentials in [{{site.base_gateway}} Vaults](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. {{site.base_gateway}} supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the {{site.konnect_product_name}} Config Store. ### AI Prompt Decorator: Inject routing instructions @@ -268,15 +272,15 @@ The [AI Proxy Advanced](/plugins/ai-proxy-advanced/) plugin on the model-selecti - **`max_request_body_size: 5242880`:** Allows prompts up to 5 MB. Model selection prompts are typically small (the decorator message plus the user's prompt), so this limit is generous. - **`response_streaming: deny`:** The DataKit plugin needs the full response body to extract the tier decision, so streaming is disabled. - **`logging.log_statistics: true`:** Logs token counts and latency for cost tracking. Set `log_payloads: true` in development to see request/response bodies, but never in production (exposes user prompts and tier decisions in logs). -- **`model.name`:** References `{% raw %}${{ env "DECK_OPENAI_SELECTOR_SLM" }}{% endraw %}`, which defaults to `o3-mini`. +- **`model.name`:** References `${{ env "DECK_OPENAI_SELECTOR_SLM" }}`, which defaults to `o3-mini`. The model responds with a single word ("fast" or "smart"). -### Datakit: Route selection orchestration +### DataKit: Route selection orchestration -The [Datakit](/plugins/datakit/) plugin on the default-llm Route orchestrates the model selection flow. It extracts the prompt from the client request, calls the `/model-selection` Route, parses the tier recommendation from the response, and modifies the request body's `model` field before the request reaches the AI Proxy Advanced plugin. 
+The [DataKit](/plugins/datakit/) plugin on the default-llm Route orchestrates the model selection flow. It extracts the prompt from the client request, calls the `/model-selection` Route, parses the tier recommendation from the response, and modifies the request body's `model` field before the request reaches the AI Proxy Advanced plugin. -Datakit operates as a workflow engine with node-based processing. Each node performs one transformation, and nodes connect by referencing each other's outputs. +DataKit operates as a workflow engine with node-based processing. Each node performs one transformation, and nodes connect by referencing each other's outputs. #### Configuration details @@ -387,8 +391,8 @@ The [AI Proxy Advanced](/plugins/ai-proxy-advanced/) plugin on the default-llm R - **`max_request_body_size: 10485760`:** Allows request bodies up to 10 MB, which accommodates large conversation histories or RAG-injected context. - **`response_streaming: allow`:** Enables streaming responses for interactive chat applications. The client can receive tokens as they're generated. - **`model.model_alias`:** Maps the tier name to this target. When the DataKit plugin sets `.model = "fast"`, the plugin routes to OpenAI. When `.model = "smart"`, it routes to AWS Bedrock. -- **Fast target (OpenAI):** Uses `{% raw %}${{ env "DECK_OPENAI_FAST_MODEL" }}{% endraw %}` (defaults to `gpt-4o-mini`) with Bearer token authentication. -- **Smart target (AWS Bedrock):** Uses `{% raw %}${{ env "DECK_BEDROCK_SMART_MODEL" }}{% endraw %}` (defaults to `us.anthropic.claude-haiku-4-5-20251001-v1:0`) with AWS IAM credentials and region configuration. +- **Fast target (OpenAI):** Uses `${{ env "DECK_OPENAI_FAST_MODEL" }}` (defaults to `gpt-4o-mini`) with Bearer token authentication. +- **Smart target (AWS Bedrock):** Uses `${{ env "DECK_BEDROCK_SMART_MODEL" }}` (defaults to `global.anthropic.claude-sonnet-4-6`) with AWS IAM credentials and region configuration. 
The plugin adds an `X-Kong-LLM-Model` response header showing which model served the request. The demo script reads this header to confirm the provider routing decision. @@ -396,45 +400,31 @@ The plugin adds an `X-Kong-LLM-Model` response header showing which model served Export your environment variables: +Sync the configuration to your Control Plane using decK: + ```bash -export DECK_OPENAI_SELECTOR_SLM='o3-mini' # Model selection (SLM) -export DECK_OPENAI_FAST_MODEL='gpt-4o-mini' # Fast tier -export DECK_BEDROCK_SMART_MODEL='us.anthropic.claude-haiku-4-5-20251001-v1:0' # Smart tier (Claude on Bedrock) - -# Service-to-service authentication for internal Datakit calls -export DECK_INTERNAL_ROUTER_KEY='internal-router-key' # Must match the internal-router consumer's key - -# Also export your provider credentials -export DECK_OPENAI_TOKEN='Bearer sk-...' -export DECK_AWS_ACCESS_KEY_ID='your-access-key-id' -export DECK_AWS_SECRET_ACCESS_KEY='your-secret-access-key' -export DECK_AWS_REGION='us-east-1' +deck gateway sync recipes/model-based-routing/kong-config/deck/multi-provider.yaml \ + --konnect-token "${KONNECT_TOKEN}" \ + --konnect-control-plane-name "${KONNECT_CONTROL_PLANE_NAME}" ``` +{: data-test-step="block" .collapsible } -Create the `multi-provider.yaml` file: +The configuration file contents: {%- raw %} -```bash -cat <<'EOF' > multi-provider.yaml - +```yaml _format_version: '3.0' _info: select_tags: - model-based-routing-recipe -# Consumers for authentication +# Consumer for authentication consumers: - # Client-facing consumer - username: demo-consumer keyauth_credentials: - key: demo-consumer-key - # Service-to-service consumer for internal Datakit calls - - username: internal-router - keyauth_credentials: - - key: internal-router-key - # Model selection service - analyzes prompts using OpenAI o3-mini services: - name: model-selection @@ -529,14 +519,14 @@ services: jq: | ({"messages": .messages}) - # Use service-to-service API key for internal 
model-selection call + # Extract API key from request headers for internal call - name: EXTRACT_AUTH type: jq input: request.headers output: service_request.headers jq: | { - apikey: "internal-router-key" + apikey: (.apikey // .Apikey // .APIKey) } @@ -614,18 +604,9 @@ services: model: provider: openai name: ${{ env "DECK_OPENAI_SELECTOR_SLM" }} -EOF ``` {% endraw -%} - -Then sync it to your Control Plane using decK: - -```bash -deck gateway sync multi-provider.yaml \ - --konnect-token "${KONNECT_TOKEN}" \ - --konnect-control-plane-name "${KONNECT_CONTROL_PLANE_NAME}" -``` -{: data-test-step="block" } +{:.no-copy-code} ## Try it out @@ -689,7 +670,7 @@ curl -X POST http://localhost:8000/chat \ -i ``` -Check the `X-Kong-LLM-Model` response header - it should show the model you configured for the smart tier (for example, `us.anthropic.claude-haiku-4-5-20251001-v1:0`), confirming routing to the AWS Bedrock smart tier. +Check the `X-Kong-LLM-Model` response header - it should show `global.anthropic.claude-sonnet-4-5-20250929-v1:0`, confirming routing to the AWS Bedrock smart tier. Example response for complex prompt (truncated): @@ -719,9 +700,9 @@ Content-Type: application/json ### What happened -1. **Simple prompt routing:** The simple prompt ("Hi there! What's 2 + 2?") routes to OpenAI's fast tier. The Datakit plugin calls the model-selection Route, OpenAI o3-mini analyzes the prompt complexity, returns "fast", Datakit updates the request body, and AI Proxy Advanced routes to the OpenAI target. The `X-Kong-LLM-Model` header shows `gpt-4o-mini`. +1. **Simple prompt routing:** The simple prompt ("Hi there! What's 2 + 2?") routes to OpenAI's fast tier. The DataKit plugin calls the model-selection Route, OpenAI o3-mini analyzes the prompt complexity, returns "fast", DataKit updates the request body, and AI Proxy Advanced routes to the OpenAI target. The `X-Kong-LLM-Model` header shows `gpt-4o-mini`. -2. 
**Complex prompt routing:** The complex prompt (binary search implementation) routes to AWS Bedrock's smart tier. The model-selection LLM recognizes this as a reasoning-heavy task and returns "smart", which Datakit forwards to the AWS Bedrock target (Claude). The `X-Kong-LLM-Model` header shows `us.anthropic.claude-haiku-4-5-20251001-v1:0`. +2. **Complex prompt routing:** The complex prompt (binary search implementation) routes to AWS Bedrock's smart tier. The model-selection LLM recognizes this as a reasoning-heavy task and returns "smart", which DataKit forwards to the AWS Bedrock target (Claude). The `X-Kong-LLM-Model` header shows `global.anthropic.claude-sonnet-4-5-20250929-v1:0`. 3. **`X-Kong-LLM-Model` header:** Every response includes this header showing which model served the request. In production, this header enables per-request observability — your application can log it for cost attribution or debugging. diff --git a/app/_cookbooks/multi-layer-ai-guardrails.md b/app/_cookbooks/multi-layer-ai-guardrails.md index 4409fbc79a..df2dead623 100644 --- a/app/_cookbooks/multi-layer-ai-guardrails.md +++ b/app/_cookbooks/multi-layer-ai-guardrails.md @@ -17,7 +17,6 @@ categories: - guardrails featured: false popular: false -published: false # Machine-readable fields for AI agent setup plugins: @@ -44,7 +43,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. The [quickstart script](https://get.konghq.com/quickstart) provisions a recipe-scoped Control Plane and local Data Plane. A later step claims the Control Plane for declarative management with kongctl. @@ -67,7 +66,7 @@ prereqs: content: | This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration. - 1. Install **kongctl** from [developer.konghq.com/kongctl](/kongctl/). + 1. 
Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). 1. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). 1. Verify both are installed: @@ -178,7 +177,7 @@ prereqs: ``` overview: | - This recipe configures {{site.ai_gateway_name}} with three independent guardrail layers behind a key-auth boundary on a single Route: regex-based keyword filtering, embedding-based semantic analysis, and PII sanitization. By the end, you will have a gateway endpoint that authenticates the caller, strips sensitive data from every request, blocks prompts that are semantically similar to known harmful patterns, catches obvious keyword violations, and only then forwards the cleaned, validated request to your LLM provider. + This recipe configures Kong AI Gateway with three independent guardrail layers behind a key-auth boundary on a single Route: regex-based keyword filtering, embedding-based semantic analysis, and PII sanitization. By the end, you will have a gateway endpoint that authenticates the caller, strips sensitive data from every request, blocks prompts that are semantically similar to known harmful patterns, catches obvious keyword violations, and only then forwards the cleaned, validated request to your LLM provider. Each layer addresses a different class of risk. The [Key Auth](/plugins/key-auth/) Plugin identifies the calling Consumer, the [AI PII Sanitizer](/plugins/ai-sanitizer/) Plugin removes PII, the [AI Semantic Prompt Guard](/plugins/ai-semantic-prompt-guard/) Plugin checks the sanitized content against vector embeddings, the [AI Prompt Guard](/plugins/ai-prompt-guard/) Plugin applies regex pattern matching, and the [AI Proxy Advanced](/plugins/ai-proxy-advanced/) Plugin routes the request to the LLM. --- @@ -199,29 +198,26 @@ Keyword matching is fast but shallow. 
Semantic analysis is deep but more expensi ## The solution -{{site.ai_gateway_name}} solves this by stacking four Plugins on a single Route, each responsible for one class of threat. Kong's Plugin priority system executes them in a fixed order on every request, giving you defense-in-depth with a single endpoint. +Kong AI Gateway solves this by stacking four Plugins on a single Route, each responsible for one class of threat. Kong's Plugin priority system executes them in a fixed order on every request, giving you defense-in-depth with a single endpoint. {% table %} columns: - title: Plugin - key: plugin - title: What it catches - key: catches - title: How it works - key: works rows: - - plugin: "[Key Auth](/plugins/key-auth/)" - catches: Anonymous traffic - works: "Matches the `apikey` header against registered Consumer credentials" - - plugin: "[AI PII Sanitizer](/plugins/ai-sanitizer/)" - catches: Sensitive data (names, emails, SSNs, credit cards, credentials) - works: Sends content to an external PII detection service - - plugin: "[AI Prompt Guard](/plugins/ai-prompt-guard/)" - catches: Literal keyword matches (hack, exploit, malware, weapon) - works: Regex pattern matching, no external calls - - plugin: "[AI Semantic Prompt Guard](/plugins/ai-semantic-prompt-guard/)" - catches: Rephrased or paraphrased harmful prompts - works: Compares embeddings against known bad patterns in Redis + - - "[Key Auth](/plugins/key-auth/)" + - Anonymous traffic + - "Matches the `apikey` header against registered Consumer credentials" + - - "[AI PII Sanitizer](/plugins/ai-sanitizer/)" + - Sensitive data (names, emails, SSNs, credit cards, credentials) + - Sends content to an external PII detection service + - - "[AI Prompt Guard](/plugins/ai-prompt-guard/)" + - Literal keyword matches (hack, exploit, malware, weapon) + - Regex pattern matching, no external calls + - - "[AI Semantic Prompt Guard](/plugins/ai-semantic-prompt-guard/)" + - Rephrased or paraphrased harmful prompts + - 
Compares embeddings against known bad patterns in Redis {% endtable %} Authentication runs first so every downstream check is associated with a known Consumer. The PII sanitizer runs next, stripping sensitive data before any other Plugin or upstream provider sees it. The regex guard then runs as a fast keyword filter on the sanitized content, catching obvious literal violations with no external calls. The semantic guard runs as the deeper check, catching paraphrased attacks the regex layer cannot match. Each AI guard Plugin has a default priority that places it before `ai-proxy-advanced`, so the chain runs in the correct order without explicit ordering directives. @@ -230,7 +226,7 @@ Authentication runs first so every downstream check is associated with a known C {% mermaid %} sequenceDiagram participant C as Client - participant K as {{site.ai_gateway_name}} + participant K as Kong AI Gateway participant P as PII Detection Service participant L as LLM Provider @@ -392,7 +388,7 @@ The AI Prompt Guard Plugin provides a fast, zero-cost first check against obviou **`deny_patterns`**, a list of regular expressions checked against every message in the request. If any message matches any pattern, Kong returns `400 Bad Request`. Patterns use standard regex syntax. The examples above use case-insensitive alternation to match both capitalized and lowercase forms. -You can add `allow_patterns` alongside deny patterns. Deny takes precedence: any prompt that matches a deny pattern is rejected with `400 Bad Request`, even if it also matches an allow pattern. Allow patterns are useful when you want to allowlist a specific subset of an otherwise restricted topic. For per-role filtering and the full configuration reference, see the [AI Prompt Guard](/plugins/ai-prompt-guard/) reference. +You can add `allow_patterns` alongside deny patterns. Deny takes precedence: any prompt that matches a deny pattern is rejected with `400 Bad Request`, even if it also matches an allow pattern. 
Allow patterns are useful when you want to whitelist a specific subset of an otherwise restricted topic. For per-role filtering and the full configuration reference, see the [AI Prompt Guard](/plugins/ai-prompt-guard/) reference. ### AI Proxy Advanced: LLM routing @@ -421,7 +417,7 @@ The AI Proxy Advanced Plugin handles authentication with the LLM provider and ro {% endraw -%} {:.no-copy-code} -**`max_request_body_size: 10485760`**, sets a 10 MB cap on incoming request bodies. {{site.base_gateway}} 3.14 requires this field on `ai-proxy-advanced` rather than relying on an implicit default. Tune for your expected payload size: large RAG injections or long conversation histories may need a higher value, and stricter limits make sense for narrow chatbot routes. +**`max_request_body_size: 10485760`**, sets a 10 MB cap on incoming request bodies. Kong Gateway 3.14 requires this field on `ai-proxy-advanced` rather than relying on an implicit default. Tune for your expected payload size: large RAG injections or long conversation histories may need a higher value, and stricter limits make sense for narrow chatbot routes. **`response_streaming: deny`**, disables response streaming for this Route. The guardrail chain inspects full responses before returning them to the client (for example, the AI PII Sanitizer's `recover_redacted` mode replaces placeholders with originals on the way back). Streaming would defeat post-response inspection, so this Route opts out. For interactive chat without post-response processing, set `allow` instead. 
@@ -440,26 +436,24 @@ The Plugin annotates every response with headers that confirm which model served {% table %} columns: - title: Header - key: header - title: Description - key: description rows: - - header: "`X-Kong-LLM-Model`" - description: "Model name selected by `ai-proxy-advanced`" - - header: "`X-Kong-Upstream-Latency`" - description: Time (ms) Kong spent waiting for the provider to respond - - header: "`X-Kong-Proxy-Latency`" - description: Time (ms) Kong spent on auth, PII sanitization, and guardrails + - - "`X-Kong-LLM-Model`" + - "Model name selected by `ai-proxy-advanced`" + - - "`X-Kong-Upstream-Latency`" + - Time (ms) Kong spent waiting for the provider to respond + - - "`X-Kong-Proxy-Latency`" + - Time (ms) Kong spent on auth, PII sanitization, and guardrails {% endtable %} ### Production considerations {:.info} -> In production, store credentials in [Kong Vaults](/gateway/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. +> In production, store credentials in [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. ## Apply the Kong configuration -The configuration below creates a {{site.base_gateway}} Service, Route, four guardrail Plugins described in [How it works](#how-it-works), and a `demo-app` Consumer with the `apikey` credential `demo-api-key`. The `select_tags` and kongctl `namespace` scope all resources to this recipe, enabling clean teardown and co-existence with other configurations on the same Control Plane. 
+The configuration below creates a Kong Gateway Service, Route, four guardrail Plugins described in [How it works](#how-it-works), and a `demo-app` Consumer with the `apikey` credential `demo-api-key`. The `select_tags` and kongctl `namespace` scope all resources to this recipe, enabling clean teardown and co-existence with other configurations on the same Control Plane. First, adopt the quickstart Control Plane into a kongctl namespace so the apply commands below can manage it: @@ -880,7 +874,7 @@ The demo script sends five requests that exercise each layer of the chain: an un {:.info} -> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. +> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](https://developer.konghq.com/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. Create the demo script: @@ -1130,7 +1124,7 @@ After running the demo, switch to the Konnect UI at [cloud.konghq.com](https://c ## Variations and next steps -**Adjust regex and semantic thresholds.** The semantic guard's `0.75` default is tuned for OpenAI's `text-embedding-3-large`; lower it to require closer matches, raise it to catch broader variations, and retune whenever you change embedding models (different models use different distance scales). 
For regex patterns, add domain-specific terms and use `allow_patterns` to allowlist legitimate terms that contain blocked substrings (for example, "hackathon"). +**Adjust regex and semantic thresholds.** The semantic guard's `0.75` default is tuned for OpenAI's `text-embedding-3-large`; lower it to require closer matches, raise it to catch broader variations, and retune whenever you change embedding models (different models use different distance scales). For regex patterns, add domain-specific terms and use `allow_patterns` to whitelist legitimate terms that contain blocked substrings (for example, "hackathon"). **Add response-phase guardrails.** This recipe only inspects the request. Set `sanitization_mode: BOTH` on the AI PII Sanitizer Plugin to also scan LLM responses for PII before returning them to the client. Combine this with the [AI Semantic Response Guard](/plugins/ai-semantic-response-guard/) Plugin to check LLM output against a separate set of deny rules, catching cases where the model generates harmful content despite a safe prompt. @@ -1140,7 +1134,7 @@ After running the demo, switch to the Konnect UI at [cloud.konghq.com](https://c **Integrate with external guardrail services.** For organization-specific content policies, add cloud guardrail services alongside the layers in this recipe. The [AI Custom Guardrail](/plugins/ai-custom-guardrail/) Plugin connects to any HTTP-based guardrail service (Mistral Moderation, Azure Content Safety, custom internal endpoints) through a universal templating system. See the [Guardrail Integrations](/cookbooks/guardrail-integrations/) recipe for complete examples comparing dedicated and universal approaches. -**Use Kong Vaults for production credential management.** Replace the environment variable exports with vault references to store your LLM API keys, Redis credentials, and PII service host securely. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. 
See the [secrets management documentation](/gateway/secrets-management/) for setup instructions. +**Use Kong Vaults for production credential management.** Replace the environment variable exports with vault references to store your LLM API keys, Redis credentials, and PII service host securely. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. See the [secrets management documentation](/gateway/latest/kong-enterprise/secrets-management/) for setup instructions. ## Cleanup diff --git a/app/_cookbooks/secure-external-mcp-gateway.md b/app/_cookbooks/secure-external-mcp-gateway.md index 89dd218c7f..806a55bee1 100644 --- a/app/_cookbooks/secure-external-mcp-gateway.md +++ b/app/_cookbooks/secure-external-mcp-gateway.md @@ -16,7 +16,7 @@ min_version: categories: - mcp - access-control -featured: true +featured: false popular: false # Machine-readable fields for AI agent setup @@ -43,7 +43,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. The [quickstart script](https://get.konghq.com/quickstart) provisions a recipe-scoped Control Plane and local Data Plane. @@ -61,7 +61,7 @@ prereqs: ``` {:.warning} - > This reuses your PAT as the upstream credential so the demo only needs one Konnect token. In production, generate a **System Account Token** with least-privilege permissions in **Organization > System Accounts** and store it in a [Kong Vault](/gateway/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references. PATs inherit the creator's full role and are tied to an individual user, which is unsuitable for a shared, audited service-account credential. + > This reuses your PAT as the upstream credential so the demo only needs one Konnect token. 
In production, generate a **System Account Token** with least-privilege permissions in **Organization > System Accounts** and store it in a [Kong Vault](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references. PATs inherit the creator's full role and are tied to an individual user, which is unsuitable for a shared, audited service-account credential. 1. Set the recipe-scoped Control Plane name and run the quickstart script: @@ -75,7 +75,7 @@ prereqs: content: | This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration. - 1. Install **kongctl** from [developer.konghq.com/kongctl](/kongctl/). + 1. Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). 1. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). 1. Verify both are installed: @@ -205,7 +205,7 @@ prereqs: {% endnavtabs %} - title: Konnect MCP region content: | - The [Konnect MCP server](/konnect-platform/konnect-mcp/#regional-server-endpoints) has region-scoped endpoints; resources don't cross regions. Set this to the host matching the Konnect region your organization runs in: + The [Konnect MCP server](https://developer.konghq.com/konnect-platform/konnect-mcp/#regional-server-endpoints) has region-scoped endpoints; resources don't cross regions. 
Set this to the host matching the Konnect region your organization runs in: ```bash # Pick one: us.mcp.konghq.com, eu.mcp.konghq.com, or au.mcp.konghq.com @@ -264,7 +264,7 @@ This recipe places {{site.base_gateway}} in front of external MCP servers with t {% mermaid %} sequenceDiagram participant C as MCP Client - participant K as {{site.base_gateway}} + participant K as Kong Gateway participant GH as GitHub C->>K: MCP initialize (no token) @@ -301,7 +301,7 @@ sequenceDiagram {% mermaid %} sequenceDiagram participant C as MCP Client - participant K as {{site.base_gateway}} + participant K as Kong Gateway participant IdP as Identity Provider participant KM as Konnect MCP Server @@ -632,7 +632,7 @@ least-privilege Service Account Token stored in a Kong Vault backend. The recipe authenticates users on this route but doesn't enforce tool-level ACL. The [AI MCP OAuth2](/plugins/ai-mcp-oauth2/) Plugin already maps an IdP `groups` claim to Kong Consumer Groups (`consumer_groups_claim: [groups]`), so layering on per-tool ACL is a matter of pre-creating the Consumer Groups and adding ACL rules to the [AI MCP Proxy](/plugins/ai-mcp-proxy/) Plugin, the same way the GitHub MCP route does. The [Secure Internal MCP Gateway](/cookbooks/secure-internal-mcp-gateway/) recipe shows the full IdP-claim-to-ACL pattern end-to-end. {:.info} -> In production, store credentials in [Kong Vaults](/gateway/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. +> In production, store credentials in [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. 
## Apply the Kong configuration diff --git a/app/_cookbooks/secure-internal-mcp-gateway.md b/app/_cookbooks/secure-internal-mcp-gateway.md index 3b64f79694..649ef630fa 100644 --- a/app/_cookbooks/secure-internal-mcp-gateway.md +++ b/app/_cookbooks/secure-internal-mcp-gateway.md @@ -37,7 +37,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. The [quickstart script](https://get.konghq.com/quickstart) provisions a recipe-scoped Control Plane and local Data Plane. @@ -60,7 +60,7 @@ prereqs: content: | This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration. - 1. Install **kongctl** from [developer.konghq.com/kongctl](/kongctl/). + 1. Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). 1. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). 1. Verify both are installed: @@ -83,7 +83,7 @@ prereqs: You need an Okta organization with admin access. The steps below create two Okta applications, configure a `groups` claim, create two groups, set up a test user (new or existing), and export Kong's introspection credentials. - **Create the {{site.base_gateway}} application** + **Create the Kong Gateway application** This is a confidential client that represents Kong as the resource server. Kong uses its credentials to call Okta's token introspection endpoint. @@ -132,7 +132,7 @@ prereqs: **Export Kong's Okta endpoints and credentials** - Export the authorization server URL, the introspection endpoint, and the **{{site.base_gateway}}** application's Client ID and Secret. The MCP Client (SPA) Client ID is not exported here. It is used at flow time by the MCP client itself. + Export the authorization server URL, the introspection endpoint, and the **Kong Gateway** application's Client ID and Secret. 
The MCP Client (SPA) Client ID is not exported here. It is used at flow time by the MCP client itself. ```bash export DECK_OAUTH_AUTH_SERVER='https://your-org.okta.com/oauth2/default' @@ -150,7 +150,7 @@ prereqs: In the Keycloak Admin Console, create a new realm (for example, `mcp-demo`), or use an existing one. - **Create the {{site.base_gateway}} client** + **Create the Kong Gateway client** This is a confidential client that represents Kong as the resource server. Kong uses its credentials to call Keycloak's token introspection endpoint. @@ -198,7 +198,7 @@ prereqs: **Export Kong's Keycloak endpoints and credentials** - Export the realm URL, the introspection endpoint, and the **{{site.base_gateway}}** client's ID and Secret. The MCP Client's Client ID is not exported here. It is used at flow time by the MCP client itself. + Export the realm URL, the introspection endpoint, and the **Kong Gateway** client's ID and Secret. The MCP Client's Client ID is not exported here. It is used at flow time by the MCP client itself. ```bash export DECK_OAUTH_AUTH_SERVER='https://your-keycloak-host/realms/mcp-demo' @@ -275,7 +275,7 @@ ACLs without a separate authentication Plugin. {% mermaid %} sequenceDiagram participant C as MCP Client - participant K as {{site.base_gateway}} + participant K as Kong Gateway participant IdP as Identity Provider participant B as Backend APIs @@ -517,7 +517,7 @@ The Route that hosts this Plugin includes two paths: `/ecommerce-mcp` for the MC **Choosing a token validation method.** {{site.base_gateway}} 3.14 added JWKS-based JWT validation to the AI MCP OAuth2 Plugin alongside the existing RFC 7662 introspection support. With `introspection_endpoint`, Kong calls the IdP on every request (with caching) using `client_id` + `client_secret`. This is the right default for IdPs that expose an introspection endpoint (Okta, Keycloak, Ping Identity, FusionAuth, ORY Hydra). 
With `jwks_endpoint`, Kong validates signed JWTs locally against the IdP's public keys with no per-request call to the IdP. Use this when your IdP does not implement RFC 7662 introspection: Microsoft Entra ID, Auth0, AWS Cognito, and Google OAuth2 all expose JWKS endpoints and issue signed JWTs that work directly with this mode. When both are configured, introspection wins. Pick whichever matches your IdP's capabilities and your latency budget. Introspection gives instant revocation at the cost of a network hop; JWKS is faster but tokens stay valid until they expire. See the Plugin's [token validation methods](/plugins/ai-mcp-oauth2/#token-validation-methods) reference for the full schema. {:.info} -> In production, store credentials in [Kong Vaults](/gateway/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. +> In production, store credentials in [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. ## Apply the Kong configuration diff --git a/app/_cookbooks/voice-ai-observability.md b/app/_cookbooks/voice-ai-observability.md index e19192b8b3..bd61ad3d58 100644 --- a/app/_cookbooks/voice-ai-observability.md +++ b/app/_cookbooks/voice-ai-observability.md @@ -38,7 +38,7 @@ prereqs: skip_product: true skip_tool: true inline: - - title: "{{site.konnect_product_name}}" + - title: Kong Konnect content: | This tutorial uses {{site.konnect_product_name}}. The [quickstart script](https://get.konghq.com/quickstart) provisions a recipe-scoped Control Plane and local Data Plane. @@ -65,7 +65,7 @@ prereqs: content: | This tutorial uses [kongctl](/kongctl/) and [decK](/deck/) to manage Kong configuration. - 1. 
Install **kongctl** from [developer.konghq.com/kongctl](/kongctl/). + 1. Install **kongctl** from [developer.konghq.com/kongctl](https://developer.konghq.com/kongctl/). 1. Install **decK** version 1.43 or later from [docs.konghq.com/deck](https://docs.konghq.com/deck/). 1. Verify both are installed: @@ -186,7 +186,7 @@ prereqs: The demo uses the OpenTelemetry SDK plus the httpx auto-instrumentation to emit a `voice-turn` parent span per turn and to inject W3C `traceparent` into every outbound call so Kong's per-hop spans nest correctly under it in Langfuse. overview: | - A production voice AI system is a pipeline: speech-to-text (STT), LLM reasoning, and text-to-speech (TTS) execute in sequence for every conversational turn. Each hop carries its own latency budget, error modes, and cost profile. This recipe sets up {{site.ai_gateway_name}} to govern all three hops through separate Routes, each with its own [AI Proxy Advanced](/plugins/ai-proxy-advanced/) instance. The [Key Auth](/plugins/key-auth/) Plugin identifies the calling voice agent on every hop, and a global [OpenTelemetry](/plugins/opentelemetry/) Plugin exports `gen_ai.*` spans to [Langfuse](https://langfuse.com) for per-hop latency, token usage, and cost visibility, with conversation-level trace grouping for full-turn analysis. + A production voice AI system is a pipeline: speech-to-text (STT), LLM reasoning, and text-to-speech (TTS) execute in sequence for every conversational turn. Each hop carries its own latency budget, error modes, and cost profile. This recipe sets up Kong AI Gateway to govern all three hops through separate Routes, each with its own [AI Proxy Advanced](/plugins/ai-proxy-advanced/) instance. 
The [Key Auth](/plugins/key-auth/) Plugin identifies the calling voice agent on every hop, and a global [OpenTelemetry](/plugins/opentelemetry/) Plugin exports `gen_ai.*` spans to [Langfuse](https://langfuse.com) for per-hop latency, token usage, and cost visibility, with conversation-level trace grouping for full-turn analysis. By the end, you will have three Kong endpoints (`/stt`, `/llm`, `/tts`) proxying a complete voice pipeline behind a single API key, with every hop producing OpenTelemetry traces that appear as a single conversation trace in Langfuse. --- @@ -209,30 +209,17 @@ The alternative to the cascading pipeline is realtime speech-to-speech APIs (Ope ## The solution -This recipe places {{site.ai_gateway_name}} between the voice agent and all three providers. Each pipeline hop gets its own Kong Service, Route, and [AI Proxy Advanced](/plugins/ai-proxy-advanced/) Plugin instance. The [Key Auth](/plugins/key-auth/) Plugin identifies the calling voice agent on every hop, and a global [OpenTelemetry](/plugins/opentelemetry/) Plugin exports `gen_ai.*` spans from every hop to Langfuse, where they appear as a single conversation trace. 
- -{% table %} -columns: - - title: Component - key: component - - title: Role - key: role -rows: - - component: "`voice-ai-stt` Service" - role: Routes audio to OpenAI Whisper for transcription (`audio/v1/audio/transcriptions`) - - component: "`voice-ai-llm` Service" - role: Routes text to any supported LLM provider (`llm/v1/chat`), provider varies per tab - - component: "`voice-ai-tts` Service" - role: Routes text to OpenAI TTS for speech synthesis (`audio/v1/audio/speech`) - - component: AI Proxy Advanced (3 instances) - role: Injects credentials, handles format translation, emits per-hop telemetry - - component: Key Auth Plugin (global) - role: Authenticates the voice agent with a shared `apikey` header on every Route - - component: OpenTelemetry Plugin (global) - role: Exports `gen_ai.*` spans with provider, model, token usage, and latency to Langfuse - - component: Langfuse - role: Groups spans by W3C trace ID into conversation-level traces for full-turn visibility -{% endtable %} +This recipe places Kong AI Gateway between the voice agent and all three providers. Each pipeline hop gets its own Kong Service, Route, and [AI Proxy Advanced](/plugins/ai-proxy-advanced/) Plugin instance. The [Key Auth](/plugins/key-auth/) Plugin identifies the calling voice agent on every hop, and a global [OpenTelemetry](/plugins/opentelemetry/) Plugin exports `gen_ai.*` spans from every hop to Langfuse, where they appear as a single conversation trace. 
+ +| Component | Role | +|-----------|------| +| Service `voice-ai-stt` | Routes audio to OpenAI Whisper for transcription (`audio/v1/audio/transcriptions`) | +| Service `voice-ai-llm` | Routes text to any supported LLM provider (`llm/v1/chat`), provider varies per tab | +| Service `voice-ai-tts` | Routes text to OpenAI TTS for speech synthesis (`audio/v1/audio/speech`) | +| AI Proxy Advanced (3 instances) | Injects credentials, handles format translation, emits per-hop telemetry | +| Key Auth Plugin (global) | Authenticates the voice agent with a shared `apikey` header on every Route | +| OpenTelemetry Plugin (global) | Exports `gen_ai.*` spans with provider, model, token usage, and latency to Langfuse | +| Langfuse | Groups spans by W3C trace ID into conversation-level traces for full-turn visibility | All three calls share a single W3C trace ID, which Langfuse uses to group the per-hop spans into one conversation-level trace. @@ -240,7 +227,7 @@ All three calls share a single W3C trace ID, which Langfuse uses to group the pe {% mermaid %} sequenceDiagram participant V as Voice Agent - participant K as {{site.ai_gateway_name}} + participant K as Kong AI Gateway participant P as Provider (Whisper / LLM / TTS) participant Lf as Langfuse @@ -311,7 +298,7 @@ When the demo processes a conversational turn, it makes three sequential request ### Key Auth: Voice agent identification -The [Key Auth](/plugins/key-auth/) Plugin authenticates the calling voice agent before any per-hop logic runs. It is configured at the global level so all three Routes (`/stt`, `/llm`, `/tts`) require the same `apikey` header. The recipe defines a single `voice-agent` Consumer with a static credential. In production, replace this with one Consumer per tenant or per voice client, rotated through [Kong Vaults](/gateway/secrets-management/). +The [Key Auth](/plugins/key-auth/) Plugin authenticates the calling voice agent before any per-hop logic runs. 
It is configured at the global level so all three Routes (`/stt`, `/llm`, `/tts`) require the same `apikey` header. The recipe defines a single `voice-agent` Consumer with a static credential. In production, replace this with one Consumer per tenant or per voice client, rotated through [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/). #### Configuration details @@ -332,7 +319,7 @@ consumers: - **`key_names: [apikey]`**. The header (or query parameter) the Plugin reads to identify the Consumer. Clients send `apikey: voice-demo-key`. See the [Key Auth reference](/plugins/key-auth/) for the full list of recognized parameter sources. - **`hide_credentials: true`**. Strips the credential from the request before it reaches the upstream provider. Without this, the `apikey` header would be forwarded to OpenAI, Anthropic, etc., leaking the gateway-side credential into provider logs. -- **`consumers[].keyauth_credentials[].key`**. The credential the Consumer presents. For non-trivial deployments, generate per-Consumer keys with `kongctl create consumer-credential` or rotate via [Kong Vaults](/gateway/secrets-management/). +- **`consumers[].keyauth_credentials[].key`**. The credential the Consumer presents. For non-trivial deployments, generate per-Consumer keys with `kongctl create consumer-credential` or rotate via [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/). For richer identity flows (JWT-based SSO, scoped audiences, IdP integration), swap Key Auth for the [OpenID Connect](/plugins/openid-connect/) Plugin. The [Claude Code SSO recipe](/cookbooks/claude-code-sso/) shows the pattern. @@ -462,35 +449,21 @@ plugins: Kong emits `gen_ai.*` span attributes on every AI Proxy Advanced request (v3.13+). 
These attributes follow the [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/) and include: -{% table %} -columns: - - title: Attribute - key: attribute - - title: Description - key: description -rows: - - attribute: "`gen_ai.provider.name`" - description: "Provider identifier (for example, `openai`, `anthropic`)" - - attribute: "`gen_ai.request.model`" - description: Model name from the request - - attribute: "`gen_ai.response.model`" - description: Model name from the provider response - - attribute: "`gen_ai.operation.name`" - description: "Operation type (`chat`, `embeddings`)" - - attribute: "`gen_ai.usage.input_tokens`" - description: Input token count - - attribute: "`gen_ai.usage.output_tokens`" - description: Output token count - - attribute: "`gen_ai.input.messages`" - description: Full input messages (when payload logging enabled) - - attribute: "`gen_ai.output.messages`" - description: Full output messages (when payload logging enabled) -{% endtable %} +| Attribute | Description | +|-----------|-------------| +| `gen_ai.provider.name` | Provider identifier (for example, `openai`, `anthropic`) | +| `gen_ai.request.model` | Model name from the request | +| `gen_ai.response.model` | Model name from the provider response | +| `gen_ai.operation.name` | Operation type (`chat`, `embeddings`) | +| `gen_ai.usage.input_tokens` | Input token count | +| `gen_ai.usage.output_tokens` | Output token count | +| `gen_ai.input.messages` | Full input messages (when payload logging enabled) | +| `gen_ai.output.messages` | Full output messages (when payload logging enabled) | ### Production considerations {:.info} -> In production, store credentials in [Kong Vaults](/gateway/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. 
+> In production, store credentials in [Kong Vaults](/gateway/latest/kong-enterprise/secrets-management/) using {%raw%}`{vault://backend/key}`{%endraw%} references rather than environment variables. Kong supports HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, and the Konnect Config Store. The `gen_ai.input.messages` and `gen_ai.output.messages` span attributes capture full prompt and response payloads. Review your data retention and access control policies before enabling payload logging in production, as these attributes may contain PII, sensitive business context, or credentials passed in prompts. @@ -1429,7 +1402,7 @@ The demo script runs a short three-turn voice conversation through the recipe. I {:.info} -> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. +> The demo passes the API key via `default_headers` because the OpenAI SDK reserves `api_key` for the `Authorization: Bearer` header. To let clients pass the key through `api_key` directly, attach a [pre-function](/plugins/pre-function/) Plugin that copies the Bearer token to the `apikey` header server-side. See [Authenticate OpenAI SDK clients with Key Auth](https://developer.konghq.com/how-to/authenticate-openai-sdk-clients-with-key-auth/) for the pattern. Look for per-hop timing in the output and the trace ID printed at the end of each turn. The `[LLM]` line shows the upstream model and token counts read from the parsed `completion.usage` field, which Kong's OpenAI-format response normalizes for every provider. 
After the script completes, open Langfuse, navigate to **Sessions**, and find the printed Session ID to see the three turns grouped under one conversation. @@ -1463,20 +1436,11 @@ X-Kong-Proxy-Latency: 18 Kong adds these response headers on every hop: -{% table %} -columns: - - title: Header - key: header - - title: Description - key: description -rows: - - header: "`X-Kong-LLM-Model`" - description: Upstream model that served the request (LLM hop only) - - header: "`X-Kong-Upstream-Latency`" - description: Time (ms) Kong spent waiting for the provider - - header: "`X-Kong-Proxy-Latency`" - description: Time (ms) Kong spent processing the request -{% endtable %} +| Header | Description | +| ------ | ----------- | +| `X-Kong-LLM-Model` | Upstream model that served the request (LLM hop only) | +| `X-Kong-Upstream-Latency` | Time (ms) Kong spent waiting for the provider | +| `X-Kong-Proxy-Latency` | Time (ms) Kong spent processing the request | Create the demo script: @@ -1914,7 +1878,7 @@ Per-hop timings in Langfuse should be within a few tens of milliseconds of the t ### Explore in Konnect -Sign in to [{{site.konnect_product_name}}](https://cloud.konghq.com/) and navigate to **API Gateway** → **Gateways** → `voice-ai-observability-recipe`. From there: +Sign in to [Kong Konnect](https://cloud.konghq.com/) and navigate to **API Gateway** → **Gateways** → `voice-ai-observability-recipe`. From there: - Open the **Gateway services** tab to see the three Services (`voice-ai-stt`, `voice-ai-llm`, `voice-ai-tts`) and click into each to inspect their Routes (`/voice-ai-observability/stt`, `/voice-ai-observability/llm`, `/voice-ai-observability/tts`). - Open the **Plugins** tab to confirm the global Key Auth and OpenTelemetry Plugins, plus the three per-Service AI Proxy Advanced instances. 
@@ -1937,7 +1901,7 @@ Sign in to [{{site.konnect_product_name}}](https://cloud.konghq.com/) and naviga **Replace STT or TTS providers.** Update the STT or TTS Service target to use a different provider without changing the LLM configuration or the observability pipeline. Switch from OpenAI Whisper to a self-hosted speech model by changing `model.provider` and `model.options.upstream_url` on the STT target. The `gen_ai.*` span attributes and Prometheus labels update automatically to reflect the new provider. -**Add Prometheus metrics dashboards.** Kong emits AI-specific Prometheus metrics (`ai_llm_requests_total`, `ai_llm_cost_total`, `ai_llm_tokens_total`, `ai_llm_provider_latency`) with a `request_mode` label that distinguishes `oneshot`, `stream`, and `realtime` traffic. Import the [{{site.ai_gateway_name}} Grafana dashboard](https://grafana.com/grafana/dashboards/21162-kong-cx-ai/) for pre-built cost, latency, and throughput panels across all three pipeline hops. +**Add Prometheus metrics dashboards.** Kong emits AI-specific Prometheus metrics (`ai_llm_requests_total`, `ai_llm_cost_total`, `ai_llm_tokens_total`, `ai_llm_provider_latency`) with a `request_mode` label that distinguishes `oneshot`, `stream`, and `realtime` traffic. Import the [Kong AI Gateway Grafana dashboard](https://grafana.com/grafana/dashboards/21162-kong-cx-ai/) for pre-built cost, latency, and throughput panels across all three pipeline hops. **Explore realtime speech-to-speech.** For latency-sensitive applications where per-hop observability is less critical, the AI Proxy Advanced Plugin supports `route_type: realtime/v1/realtime` with `genai_category: realtime/generation` for OpenAI Realtime and Gemini Live WebSocket connections. Realtime mode collapses the three-hop pipeline into a single persistent WebSocket, trading the per-hop waterfall view for significantly lower turn latency. Kong tracks realtime traffic with the `request_mode=realtime` Prometheus label.
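As a rough illustration of the `request_mode` label mentioned under the Prometheus metrics item above, the sketch below sums `ai_llm_requests_total` per mode from a Prometheus text-exposition payload. The sample payload is invented: the `ai_provider` label and every value are assumptions for illustration, and the metric names in a live deployment may carry a prefix or additional labels, so check your own scrape output first.

```python
from collections import defaultdict

# Hypothetical scrape output. The metric names and the request_mode label come
# from the text above; the ai_provider label and all values are invented.
SAMPLE = """\
# TYPE ai_llm_requests_total counter
ai_llm_requests_total{ai_provider="openai",request_mode="oneshot"} 12
ai_llm_requests_total{ai_provider="openai",request_mode="stream"} 5
ai_llm_requests_total{ai_provider="anthropic",request_mode="oneshot"} 3
ai_llm_tokens_total{ai_provider="openai",request_mode="oneshot"} 4200
"""


def parse_labels(raw: str) -> dict[str, str]:
    """Parse 'k1="v1",k2="v2"' into a dict.

    Simplification: assumes label values contain no commas or escaped quotes.
    """
    labels = {}
    for pair in raw.split(","):
        if pair:
            key, _, value = pair.partition("=")
            labels[key.strip()] = value.strip().strip('"')
    return labels


def sum_by_label(text: str, metric: str, label: str) -> dict[str, float]:
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name, _, raw = name_and_labels.partition("{")
        if name != metric:
            continue
        labels = parse_labels(raw.rstrip("}"))
        totals[labels.get(label, "unknown")] += float(value)
    return dict(totals)


totals = sum_by_label(SAMPLE, "ai_llm_requests_total", "request_mode")
print(totals)  # {'oneshot': 15.0, 'stream': 5.0}
```

For dashboards, the same grouping is normally done in PromQL or Grafana rather than client-side; this parser only shows how the label partitions the counter.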