diff --git a/docs/analytics/access-and-entitlements.md b/docs/analytics/access-and-entitlements.md new file mode 100644 index 00000000..20f77d5a --- /dev/null +++ b/docs/analytics/access-and-entitlements.md @@ -0,0 +1,71 @@ +--- +title: "Access and Entitlements" +sidebar_label: "Access & Entitlements" +--- + +## When to use this + +- Confirm whether your account can see advanced analytics. +- Find out how many days of history you have access to. +- Understand the trial banner or the demo mode link. + +## Plan requirements + +Advanced analytics must be enabled on your account. Without it, the Analytics +page shows an upsell view with a **Contact Sales** call-to-action and no charts. + +## Free trial + +New accounts with advanced analytics enabled get an automatic free trial. The +trial: + +- Runs for the same number of days as your account's retention window. +- Shows a banner across the top of the Analytics page: "You're on a {N}-day + preview of Advanced Analytics, {N} days left." +- Includes two calls to action: **View demo →** (loads the dashboard with sample + data) and **Contact sales**. + +Accounts on the legacy analytics version are not eligible for the trial. They +continue to use the previous experience. + +:::note + +The trial banner notes that the charts may look sparse if your account hasn't +yet generated much traffic. Use **View demo →** to see what a fully populated +dashboard looks like. + +::: + +## Data retention + +Each account has an analytics history window measured in days. The window +controls: + +- How far back you can scroll using the time-range picker. +- Which presets in the picker are available. Presets longer than your window are + locked with an **Upgrade for [preset]** tooltip. +- The maximum start and end values when you pick a custom range. + +If you need a longer window, contact your Zuplo account team. 
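The relationship between the retention window and locked presets described above can be sketched roughly as follows. This is a minimal illustration, not the portal's implementation: the preset names come from the time range picker, while `PRESET_HOURS`, `locked_presets`, and the rule "a preset is locked when its duration exceeds the window" are assumptions made for the example.

```python
# Preset durations in hours, keyed by the preset names the time range picker uses.
# The mapping itself is an assumption for this sketch.
PRESET_HOURS = {
    "1h": 1,
    "6h": 6,
    "24h": 24,
    "3d": 72,
    "7d": 168,
    "14d": 336,
    "28d": 672,
    "60d": 1440,
    "90d": 2160,
}


def locked_presets(retention_days):
    """Return the presets that are longer than the retention window (hypothetical rule)."""
    window_hours = retention_days * 24
    return [name for name, hours in PRESET_HOURS.items() if hours > window_hours]


# With a 14-day window, only presets longer than 14 days would be locked.
print(locked_presets(14))
```

Under these assumptions, a preset exactly equal to the window stays available and only strictly longer presets are locked.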
+ +## Demo mode + +Append `?demo=true` to the Analytics URL, or click **View demo →** in the trial +banner, to switch into demo mode. In demo mode: + +- Charts and tables are populated with synthetic sample data. +- A persistent banner reads: "You're viewing the Advanced Analytics demo with + sample data. Your real analytics aren't shown here." + +Remove the `demo` parameter from the URL to return to your real data. + +## Scope: account vs project + +- **Account scope** aggregates across every project in the account. The Requests + tab adds **Project Name** and **Deployment Name** as breakdowns; click a + project name to drill into project scope. +- **Project scope** filters to a single project and adds an **Environment** + selector (Working Copy, Production, Preview, Other) in the top bar. + +See [Shared controls](./shared-controls.md) for how scope affects filters and +breakdowns. diff --git a/docs/analytics/overview.md b/docs/analytics/overview.md new file mode 100644 index 00000000..15ecab4f --- /dev/null +++ b/docs/analytics/overview.md @@ -0,0 +1,65 @@ +--- +title: "Analytics" +sidebar_label: "Overview" +--- + +Zuplo Analytics is the dashboard inside the Zuplo portal that shows how traffic +moves through your gateway: request volume, latency, errors, who's calling you, +and (when relevant) AI gateway and MCP gateway activity. It's the page you open +when something looks off in production, when you're auditing spend, or when +you're answering "is anyone actually using this endpoint?" + +## When to use this + +- Investigate a latency spike or error surge across all projects in your + account, or inside a single project. +- Identify which API consumers, AI agents, or upstream origins drive the most + traffic or errors. +- Track AI gateway token usage and cost, or MCP gateway and server activity. + +## How to access + +Open **Analytics** in the Zuplo portal sidebar. 
The page works at two scopes: + +- **Account scope**: aggregates across every project in your account. Open + [Account Analytics](https://portal.zuplo.com/+/account/analytics). +- **Project scope**: open a project, then click **Analytics**. Filters to one + project and adds an **Environment** selector. + +## What's in this section + +- [Access and entitlements](./access-and-entitlements.md): plans, free trial, + demo mode, retention. +- [Shared controls](./shared-controls.md): time range, filters, environment + selector, banners, URL state. +- Tabs: + - [Requests](./tabs/requests.md): overall traffic, latency, errors. + - [Origins](./tabs/origins.md): backend performance. + - [Consumers](./tabs/consumers.md): per-consumer breakdowns. + - [Agents](./tabs/agents.md): classified AI agent traffic. + - [AI Gateway](./tabs/ai-gateway.md): LLM request volume, tokens, cost. + - [MCP Gateway](./tabs/mcp-gateway.md): virtual server routing, capability + invocations, upstream health. + - [MCP Server](./tabs/mcp-server.md): tool calls, resources, prompts on + Zuplo-hosted MCP servers. +- Reference: + - [Metrics glossary](./reference/metrics-glossary.md): every KPI and + percentile defined once. + - [URL parameters](./reference/url-parameters.md): permalink reference. + +## Tab visibility + +You'll see a subset of tabs depending on your plan and project setup: + +| Tab | When it appears | +| ----------- | --------------------------------------------------------------------- | +| Requests | All accounts with advanced analytics enabled. | +| Origins | The project uses managed-edge origins. | +| Consumers | All accounts with advanced analytics enabled. | +| Agents | All accounts with advanced analytics enabled. | +| AI Gateway | The project type is **ai**. | +| MCP Gateway | The project type is **standard** and an MCP gateway is in use. | +| MCP Server | The project type is **standard** and the project hosts an MCP server. 
| + +If you don't see Analytics at all, your account likely doesn't have advanced +analytics enabled. See [Access and entitlements](./access-and-entitlements.md). diff --git a/docs/analytics/reference/metrics-glossary.md b/docs/analytics/reference/metrics-glossary.md new file mode 100644 index 00000000..031b4c75 --- /dev/null +++ b/docs/analytics/reference/metrics-glossary.md @@ -0,0 +1,92 @@ +--- +title: "Metrics Glossary" +sidebar_label: "Metrics Glossary" +--- + +This page defines every term used in the Analytics dashboards once. KPI tables +on tab pages link here for depth. + +## HTTP status classes + +| Class | Meaning | +| ----- | ----------------------------------------------------------------------------------------------- | +| 2xx | Success. | +| 3xx | Redirection. | +| 4xx | Client error. The caller sent something the gateway or backend rejected. | +| 5xx | Server error. The gateway, an upstream origin, or an MCP backend failed to fulfill the request. | + +## Error rates + +**Client error rate.** 4xx count divided by total requests in the window, +expressed as a percentage. + +**Server error rate.** 5xx count divided by total requests in the window. + +**Request-weighted average.** When aggregating a rate across many entities +(consumers, agents, origins), each entity's rate is weighted by its request +count. A consumer with 100,000 requests at a 1% error rate contributes more than +a consumer with 100 requests at a 50% error rate. Use the request-weighted +figure to answer "what does the average request experience look like?"; use a +simple unweighted average to answer "what does the average consumer experience +look like?" + +## Latency + +**Avg latency.** Arithmetic mean response time. Sensitive to outliers. + +**P50 (median) latency.** Half of requests completed within this time. + +**P95 latency.** 95% of requests completed within this time. The other 5% took +longer. P95 is the standard tail-latency metric. 
+ +**P99 latency.** 99% of requests completed within this time. Useful for spotting +outlier behavior that P95 may smooth over. + +**Latency distribution histogram.** Bands at P10, P50, P90, P95, P99. Clicking a +band on the Requests tab filters to requests in that duration range. + +## Active edge instances + +Distinct gateway worker instances actively serving traffic in each interval. A +rough indicator of how widely your traffic is distributed. + +## Active sessions (MCP Server) + +Distinct MCP sessions, estimated using HyperLogLog. The figure is approximate +but monotonic within a single time window. Accurate enough for trend analysis, +not for exact session counting. + +## Failure origin + +Classifies an error by where it originated: + +| Origin | Meaning | +| -------- | ---------------------------------------------------------- | +| gateway | The Zuplo gateway returned the error. | +| upstream | A backend origin or MCP server returned the error. | +| client | The client sent something invalid that caused the failure. | + +## Outcome class + +Used on MCP Gateway events: + +| Class | Meaning | +| ----------------- | -------------------------------------------------------------------- | +| success | Event completed normally. | +| application_error | Event failed due to an application-layer issue (e.g. invalid input). | +| gateway_error | The gateway itself returned an error. | +| upstream_error | An upstream MCP server returned an error. | + +## Tokens (AI Gateway) + +| Type | Meaning | +| ---------- | --------------------------------------------------------- | +| Prompt | Tokens in the request the gateway forwarded to the model. | +| Completion | Tokens in the model's response. | +| Embedding | Tokens consumed by embedding requests. | + +## Estimated cost (AI Gateway) + +Computed from token usage × the model's published pricing. Does not include +discounts, credits, or provider-side rounding. Use it for trend analysis, not +invoice reconciliation. 
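As a rough illustration of the estimated-cost definition above, the calculation amounts to token counts multiplied by per-token prices. The price table, the model name `example-model`, and the function name are hypothetical placeholders, not Zuplo's pricing data or API. Pricing embedding tokens separately is also an assumption of this sketch.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens. Real published pricing varies by model.
PRICES_PER_MILLION = {
    "example-model": {"prompt": 2.00, "completion": 6.00, "embedding": 0.10},
}


@dataclass
class TokenUsage:
    prompt: int = 0
    completion: int = 0
    embedding: int = 0


def estimated_cost(model, usage):
    """Token usage times list price, with no discounts, credits, or provider rounding."""
    price = PRICES_PER_MILLION[model]
    total = (
        usage.prompt * price["prompt"]
        + usage.completion * price["completion"]
        + usage.embedding * price["embedding"]
    )
    return total / 1_000_000


usage = TokenUsage(prompt=1_500_000, completion=500_000)
cost = estimated_cost("example-model", usage)
print(f"${cost:.2f}")
```

Because the sketch applies list prices only, its result would drift from an invoice in the same way the dashboard figure does.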
diff --git a/docs/analytics/reference/url-parameters.md b/docs/analytics/reference/url-parameters.md new file mode 100644 index 00000000..7634a371 --- /dev/null +++ b/docs/analytics/reference/url-parameters.md @@ -0,0 +1,66 @@ +--- +title: "URL Parameters" +sidebar_label: "URL Parameters" +--- + +Every Analytics control persists to the URL. Copy the address bar to share any +view. + +## When to use this + +- Build a permalink to a specific time window, filter set, or demo view. +- Embed an Analytics link in a runbook, postmortem, or dashboard. +- Understand what each query parameter does. + +## Parameters + +| Parameter | Example | Effect | +| -------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------- | +| `time` | `?time=7d` | Apply a preset. Values: `1h`, `6h`, `24h`, `3d`, `7d`, `14d`, `28d`, `60d`, `90d`. | +| `start`, `end` | `?start=2026-05-01T00:00:00Z&end=2026-05-15T00:00:00Z` | Custom range as ISO-8601 datetimes. Overrides `time` when both are present. | +| `filter` | `?filter=httpStatus:class:5xx` | Add a filter as `field:mode:value`. Repeat the parameter for multiple filters. | +| `demo` | `?demo=true` | Demo mode (sample data instead of your real analytics). | +| `preview` | `?preview=1` | Legacy preview mode. | + +## Match modes for `filter` + +| Mode | Meaning | Example | +| ---------- | --------------------- | ---------------------------------- | +| equals | Exact match. | `filter=httpMethod:equals:POST` | +| contains | Substring match. | `filter=route:contains:/v1/users` | +| in | Comma-separated list. | `filter=httpStatus:in:500,502,503` | +| not | Negation of equals. | `filter=country:not:US` | +| class | HTTP status class. | `filter=httpStatus:class:5xx` | +| startsWith | String prefix. | `filter=route:startsWith:/v1/` | +| endsWith | String suffix. 
| `filter=route:endsWith:.json` | + +## Permalink examples + +Last 7 days of 5xx errors on a specific route: + +``` +?time=7d&filter=httpStatus:class:5xx&filter=route:startsWith:/v1/users +``` + +Custom range with two filters: + +``` +?start=2026-05-01T00:00:00Z&end=2026-05-08T00:00:00Z&filter=country:equals:US&filter=httpMethod:equals:POST +``` + +Open the demo: + +``` +?demo=true +``` + +## Sharing + +The recipient sees the same view, provided they have access to the project or +account. + +## See also + +- [Shared controls](../shared-controls.md): what each control does in the UI. +- [Metrics glossary](./metrics-glossary.md): definitions for the fields you can + filter on. diff --git a/docs/analytics/shared-controls.md b/docs/analytics/shared-controls.md new file mode 100644 index 00000000..645d1427 --- /dev/null +++ b/docs/analytics/shared-controls.md @@ -0,0 +1,122 @@ +--- +title: "Shared Controls" +sidebar_label: "Shared Controls" +--- + +Every Analytics tab uses the same set of controls at the top of the page: a time +range picker, a filter bar, and (at project scope) an environment selector. +State persists to the URL so you can share or bookmark any view. + +## When to use this + +- Narrow a tab to a time window, environment, or set of filter values. +- Build a shareable link to a specific view. +- Understand what each banner across the top of the page means. + +## Time range + +The time range picker controls every chart, table, and KPI on the active tab. + +**Presets.** Last 1h, 6h, 24h, 3d, 7d, 14d, 28d, 60d, 90d. + +**Custom range.** Use the datetime-local inputs for **Start** and **End**. Both +fields are clamped to your account's retention window. + +**Locked presets.** Presets longer than your retention window show an **Upgrade +for [preset]** tooltip. See +[Access and entitlements](./access-and-entitlements.md). + +## Filters + +Filters render as removable pills in a sticky bar at the top of the tab. 
Add a +filter from any breakdown table by clicking a value, or build one manually. + +**Match modes.** Each filter uses one of: + +| Mode | Meaning | +| ---------- | ----------------------------------- | +| equals | Exact match. | +| contains | Substring match. | +| in | Value is in a comma-separated list. | +| not | Negation of equals. | +| class | HTTP status class (e.g. `5xx`). | +| startsWith | String prefix. | +| endsWith | String suffix. | + +**Clearing.** Remove a single pill with its **×**, or click **Clear all +filters** to reset. + +**Disabled fields.** Some fields are grayed out on tabs where they don't apply. +For example, `originHost` is unavailable on Requests, Consumers, and Agents; +`userSub` is unavailable on Origins. + +## Environment selector + +The environment selector appears only at project scope. It's a dropdown grouped +as: + +- **Working Copy** +- **Production** +- **Preview** +- **Other** + +Each environment shows a request count next to its name. The active selection +appears as a blue pill in the top bar. + +## Account vs project scope + +See +[Access and entitlements](./access-and-entitlements.md#scope-account-vs-project) +for how scope affects available breakdowns and the environment selector. + +## URL state and permalinks + +Every control persists to the URL. To share a view, copy the address bar. +There's no separate share button. + +| Parameter | Example | Effect | +| -------------- | ------------------------------------------------------ | ------------------------------------------------------- | +| `time` | `?time=7d` | Apply a preset. | +| `start`, `end` | `?start=2026-05-01T00:00:00Z&end=2026-05-15T00:00:00Z` | Custom range. Overrides `time`. | +| `filter` | `?filter=httpStatus:class:5xx` | Add a filter. Repeat the parameter for multiple values. | +| `demo` | `?demo=true` | Demo mode (sample data). | +| `preview` | `?preview=1` | Legacy preview mode. 
| + +See [URL parameters](./reference/url-parameters.md) for the full reference. + +## Refresh + +A spinning loader appears in the sticky bar while data refetches, and a +semi-transparent **Updating…** overlay covers the content area. There's no +manual refresh button and no auto-refresh interval. Change a control to trigger +a refetch. + +## Banners + +Banners appear at the top of the page in this priority order: + +1. **Preview banner**: when `preview=1` is set. Indicates legacy preview mode. +2. **Demo banner**: when `demo=true` is set. Reminds you sample data is shown + instead of your real analytics. +3. **Trial banner**: for new accounts with advanced analytics. Shows days + remaining and offers **View demo →** and **Contact Sales**. + +## Loading and empty states + +Each tab uses a shape-aware skeleton while the first request is in flight. The +product analytics tabs (AI Gateway, MCP Gateway, MCP Server) suppress that +skeleton briefly to avoid flashing when data is already cached. Empty states on +those tabs include a short description and a "Read the … docs" link to the +relevant product section. + +## Status colors + +The same color palette is used across every chart that breaks down by HTTP +status class: + +| Class | Color | +| ----- | ----- | +| 2xx | Green | +| 3xx | Blue | +| 4xx | Amber | +| 5xx | Red | diff --git a/docs/analytics/tabs/agents.md b/docs/analytics/tabs/agents.md new file mode 100644 index 00000000..a1359dda --- /dev/null +++ b/docs/analytics/tabs/agents.md @@ -0,0 +1,88 @@ +--- +title: "Agents" +sidebar_label: "Agents" +--- + +The **Agents** tab isolates AI agent traffic: requests classified as coming from +ChatGPT, Claude.ai, Cursor, GPTBot, and similar clients. It's a focused view; +browsers, webhooks, and generic SDK callers are excluded. + +## When to use this + +- See which AI agents are calling your API and how much volume they generate. +- Catch agent-specific error patterns. 
For example, one agent that fails CORS or + returns 4xx more often than the others. +- Compare latency experience across agents. + +## Summary KPIs + +| Name | What it measures | +| ----------------- | ---------------------------------------------------------------------------- | +| **Requests** | Total agent-classified requests. Excludes browsers, webhooks, generic SDKs. | +| **Client Errors** | Request-weighted 4xx rate across agents. | +| **Server Errors** | Request-weighted 5xx rate. Secondary: count of agents with at least one 5xx. | +| **Agents** | Distinct classified agents seen in the window. | +| **Total Errors** | Combined 4xx + 5xx count. Secondary: agents affected. | + +## Charts + +**Request Volume.** Stacked bars by status class. Granularity is always hourly +on this tab. + +**Agent Error Rates.** 4xx and 5xx over time. _What to look for:_ divergence +between agents is the headline signal. If Cursor shows a 12% 4xx rate while +ChatGPT sits at 2%, the issue is almost certainly specific to how Cursor calls +your endpoint. + +**Agent Latency Over Time.** P50, P95, P99 lines. + +## Agent table + +| Column | Notes | +| --------------- | -------------------------------- | +| Agent | Classified agent name. | +| Requests | Count with an inline volume bar. | +| Client Errors % | 4xx percentage. | +| Server Errors % | 5xx percentage. | +| Avg / P95 / P99 | Latency percentiles. | +| 4xx sparkline | Inline trend over the window. | +| 5xx sparkline | Inline trend over the window. | + +Searchable and sortable on any column. Click a row to filter the tab to that +agent. **Show more** loads the next 50. + +## Classified agents + +The classifier currently recognizes: ChatGPT, Claude.ai, Cursor, Claude Code, +GPTBot, Perplexity, Cline, Continue, OpenAI SDK, Anthropic SDK, Google AI, +Common Crawl. The list expands over time. + +Unclassified traffic is excluded from the Agents tab. + +:::warning + +Agent charts use a dedicated hourly rollup. 
Filtering other tabs by agent isn't +supported. Use the Agents tab to drill into an individual agent. + +::: + +## Filters + +The filter bar applies. `originHost` is not applicable here. See +[Shared controls](../shared-controls.md#filters). + +## Troubleshooting + +**The Agents tab is empty.** Either no classified agents called your gateway in +the window, or your retention window doesn't yet include any agent traffic. Try +the demo with **View demo →** in the trial banner to see what a populated tab +looks like. See [Access and entitlements](../access-and-entitlements.md). + +**I see a known agent in my logs but not here.** The classifier is conservative; +it labels traffic that clearly matches a known agent fingerprint. Generic SDK +traffic that doesn't identify itself is excluded. If you believe an agent should +be classified, send the User-Agent string to your Zuplo contact. + +**An agent shows zero requests but appears in the table.** Filters on the rest +of the tab may be excluding its traffic for the current window. Clear filters to +verify. diff --git a/docs/analytics/tabs/ai-gateway.md b/docs/analytics/tabs/ai-gateway.md new file mode 100644 index 00000000..1ea93eb4 --- /dev/null +++ b/docs/analytics/tabs/ai-gateway.md @@ -0,0 +1,67 @@ +--- +title: "AI Gateway" +sidebar_label: "AI Gateway" +--- + +The **AI Gateway** tab shows LLM traffic flowing through Zuplo's AI Gateway: +request volume, token usage, estimated cost, model and provider distribution, +latency, cache effectiveness, and blocked-request reasons. It's visible when the +project type is **ai**. + +## When to use this + +- Audit AI spend by model or provider. +- Compare cache hit rate before and after enabling caching. +- Investigate why requests are being blocked by your guardrails. + +## Summary KPIs + +| Name | What it measures | +| ------------------ | ---------------------------------------------------------- | +| **Total Requests** | All AI gateway requests in the window. 
| +| **Total Tokens** | Sum across requests. Secondary: prompt / completion split. | +| **Estimated Cost** | Computed from model pricing × token usage. | +| **Median Latency** | P50 across all AI gateway requests. | + +## Charts + +**Request Time Series.** Three series in one chart: requests, tokens, and cost +over the window. + +**Model Usage.** Stacked bars by model with a sidebar legend showing top models +by share. Click a model in the legend to highlight it; the others fade. + +**Token Breakdown.** A donut split of prompt / completion / embedding tokens, +plus a time series of the same. + +**Provider Breakdown.** A donut and time series by provider, plus a +top-providers list. + +**Latency Distribution.** Histogram of P10, P50, P90, P95, P99. + +**Latency Over Time.** P50, P95, P99 lines. + +**Cache Hit Rate.** Hits vs misses over time, with a summary hit rate. _What to +look for:_ a stable hit rate above your target after enabling caching means +semantic caching is working as configured. + +**Blocked Requests.** Donut and time series by block reason type. Useful when +guardrails or quota policies are doing meaningful work. + +## Filters + +The filter bar applies. See [Shared controls](../shared-controls.md#filters). + +## Troubleshooting + +**The AI Gateway tab is empty.** No AI Gateway traffic has been recorded in the +selected window. Start proxying requests through the AI Gateway and the charts +populate automatically. + +**Estimated cost doesn't match my provider bill.** Estimated cost is computed +from token usage and published pricing. It excludes discounts and credits. See +[Metrics glossary](../reference/metrics-glossary.md#estimated-cost-ai-gateway). + +**Cache hit rate is 0%.** Either caching isn't enabled on the route, or every +request was unique enough that no entry matched. Check your AI Gateway cache +configuration. 
diff --git a/docs/analytics/tabs/consumers.md b/docs/analytics/tabs/consumers.md new file mode 100644 index 00000000..f982a90d --- /dev/null +++ b/docs/analytics/tabs/consumers.md @@ -0,0 +1,73 @@ +--- +title: "Consumers" +sidebar_label: "Consumers" +--- + +The **Consumers** tab breaks traffic down by API consumer: anyone calling your +gateway, whether authenticated or anonymous. Use it to see who your noisiest +callers are, who's hitting errors, and which consumers experience the slowest +latency. + +## When to use this + +- Find the top API consumers by request volume. +- Identify which consumer is responsible for a 4xx or 5xx surge. +- Compare latency experience across consumers (for example, paid vs free tier). + +## Summary KPIs + +| Name | What it measures | +| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | +| **Requests** | Total requests across all consumers in the window. | +| **Client Errors** | Request-weighted 4xx rate across consumers (high-traffic consumers count more). See [Metrics glossary](../reference/metrics-glossary.md). | +| **Server Errors** | Request-weighted 5xx rate. Secondary: count of consumers with at least one 5xx. | +| **Consumers** | Distinct consumers (authenticated plus anonymous). | +| **Total Errors** | Combined 4xx + 5xx count. Secondary: consumers affected. | + +## Charts + +**Request Volume.** Stacked bars by status class. The chart title updates to +reflect the active consumer filter so you can tell at a glance whether you're +looking at one consumer or all of them. + +**Consumer Error Rates.** 4xx and 5xx over time. _What to look for:_ a sustained +4xx rate from one consumer usually points to a broken integration on their side. + +**Consumer Latency Over Time.** P50, P95, P99 lines. 
+ +## Consumer table + +| Column | Notes | +| --------------- | ------------------------------------------------------------------- | +| User | Consumer identity. Anonymous requests show **Anonymous · No auth**. | +| Requests | Count with an inline volume bar. | +| Client Errors % | 4xx percentage. | +| Server Errors % | 5xx percentage. | +| Avg / P95 / P99 | Latency percentiles. | +| 4xx sparkline | Inline trend over the window. | +| 5xx sparkline | Inline trend over the window. | + +The table is searchable and sortable on any column (default: requests +descending). Clicking a row filters the entire tab to that consumer. **Show +more** loads the next 50. + +## Filters + +The filter bar applies. `originHost` is not applicable on this tab. See +[Shared controls](../shared-controls.md#filters). + +## Troubleshooting + +**Everything is showing as Anonymous.** If your gateway isn't authenticating +requests, or your auth policy isn't attaching a consumer identity, every request +falls into the **Anonymous · No auth** bucket. Check your API key or JWT policy +configuration. + +**I clicked a row but the charts didn't change.** A row click adds a consumer +filter pill. If you don't see the pill in the sticky bar, your click landed on a +non-row element. Try clicking the user cell directly. + +**The 5xx rate here is higher than on Requests.** The Consumers KPI is +request-weighted across consumers, while the Requests KPI is a flat rate over +all requests. They diverge when high-error consumers are a small share of total +volume. See [Metrics glossary](../reference/metrics-glossary.md). 
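The difference between a request-weighted rate and a simple per-consumer average, as the Metrics glossary describes it, can be sketched with the glossary's own numbers. The function names are illustrative only and are not part of any Zuplo API.

```python
def request_weighted_rate(entities):
    """Average of per-entity rates, each weighted by that entity's request count."""
    total_requests = sum(requests for requests, _ in entities)
    if total_requests == 0:
        return 0.0
    return sum(requests * rate for requests, rate in entities) / total_requests


def unweighted_rate(entities):
    """Simple mean of per-entity rates: every entity counts equally."""
    if not entities:
        return 0.0
    return sum(rate for _, rate in entities) / len(entities)


# (request count, error rate) pairs from the glossary example:
# 100,000 requests at 1% and 100 requests at 50%.
consumers = [(100_000, 0.01), (100, 0.50)]

weighted = request_weighted_rate(consumers)
simple = unweighted_rate(consumers)
print(f"request-weighted: {weighted:.2%}, unweighted: {simple:.2%}")
```

In this example the high-volume consumer dominates the weighted figure, while the unweighted figure is pulled toward the small, error-heavy consumer.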
diff --git a/docs/analytics/tabs/mcp-gateway.md b/docs/analytics/tabs/mcp-gateway.md new file mode 100644 index 00000000..5ee1ae9a --- /dev/null +++ b/docs/analytics/tabs/mcp-gateway.md @@ -0,0 +1,78 @@ +--- +title: "MCP Gateway" +sidebar_label: "MCP Gateway" +--- + +The **MCP Gateway** tab shows server-side traffic through Zuplo's MCP gateway: +OAuth flows, auth and policy decisions, virtual-server routing, capability +invocations, and upstream MCP server health. It's visible when the project type +is **standard** and an MCP gateway is in use. + +## When to use this + +- See which virtual servers and capabilities are being exercised, and by whom. +- Track auth and policy decision outcomes. +- Identify whether failures originate in the gateway, the upstream, or the + client. + +## MCP Gateway vs MCP Server + +This tab is about traffic _to_ an MCP fleet via Zuplo's gateway. If you're +looking for what happened _inside_ an MCP server you host on Zuplo (tool calls, +JSON-RPC methods), see the [MCP Server](./mcp-server.md) tab. Some accounts see +both tabs; some see only one. + +## Summary KPIs + +| Name | What it measures | +| ------------------- | ---------------------------------------------------------------------------------------------- | +| **Events** | Total MCP Gateway events in the window. | +| **Success Rate** | Share of events with outcome = success. Secondary: success / error split. | +| **p95 Latency** | Total P95. Secondary: gateway-vs-upstream split, useful for telling where time is being spent. | +| **Failure Origins** | Sum of gateway + upstream + client failure counts. | + +See [Metrics glossary](../reference/metrics-glossary.md) for the failure-origin +and outcome-class definitions. + +## Charts + +**Events Time Series.** Stacked by top event types. + +**Event Family Donut.** Distribution across families: `mcp_request`, +`capability_invocation`, `auth_event`, `upstream_request`, `policy_decision`, +`control_plane_audit`. 
+ +**Latency Split.** Total, gateway, and upstream P50 / P95 / P99 over time. _What +to look for:_ a P95 driven entirely by the upstream slice points to a slow MCP +backend; a gateway-heavy P95 points to policy or auth overhead. + +## Breakdown tables + +| Table | Columns | +| -------------------- | ------------------------------------------------------- | +| Top Capabilities | Capability, Type, Calls, Errors (count + %), P95. | +| Top Virtual Servers | Virtual Server, Events, Errors. | +| Top Upstream Servers | Upstream, Events, Errors, P95 upstream latency. | +| Top Clients | Client, Kind (from the `initialize` handshake), Events. | +| MCP Methods | Method, Events. | +| Upstream Auth Modes | Auth Mode, Events. | +| Failure Origins | Origin layer (gateway / upstream / client), Errors. | +| Top Reason Codes | Class, Code, Events, Errors. | + +## Filters + +The filter bar applies. See [Shared controls](../shared-controls.md#filters). + +## Troubleshooting + +**The MCP Gateway tab is empty.** No MCP Gateway events have been recorded in +the selected window. Once a client connects and invokes a capability, the +dashboard populates. + +**I don't see this tab.** Visibility requires project type **standard** and an +MCP gateway in use. If you're hosting an MCP server on Zuplo instead, look for +the [MCP Server](./mcp-server.md) tab. + +**Errors show but Failure Origins is empty.** Failure origins are classified +server-side from event metadata. Events without a clear origin classification +are counted in Errors but not in any of the gateway / upstream / client buckets. 
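The behavior described in the last troubleshooting item, where errors are counted even when no failure origin is assigned, can be sketched as below. The event shape (`outcome` and `failure_origin` keys) is a hypothetical stand-in for illustration. The actual classification happens server-side and its fields are not documented here.

```python
from collections import Counter

# The three failure-origin buckets named in the Metrics glossary.
KNOWN_ORIGINS = ("gateway", "upstream", "client")


def summarize_failures(events):
    """Count all non-success events, and bucket only those with a known origin."""
    errors = 0
    by_origin = Counter()
    for event in events:
        if event.get("outcome") == "success":
            continue
        errors += 1
        origin = event.get("failure_origin")  # hypothetical field name
        if origin in KNOWN_ORIGINS:
            by_origin[origin] += 1
    return errors, dict(by_origin)


events = [
    {"outcome": "success"},
    {"outcome": "gateway_error", "failure_origin": "gateway"},
    {"outcome": "upstream_error", "failure_origin": "upstream"},
    {"outcome": "application_error"},  # no origin: counted as an error, not bucketed
]
errors, by_origin = summarize_failures(events)
print(errors, by_origin)
```

With data like this, the error total can exceed the sum of the origin buckets, which matches the symptom described above.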
diff --git a/docs/analytics/tabs/mcp-server.md b/docs/analytics/tabs/mcp-server.md new file mode 100644 index 00000000..06b7131c --- /dev/null +++ b/docs/analytics/tabs/mcp-server.md @@ -0,0 +1,73 @@ +--- +title: "MCP Server" +sidebar_label: "MCP Server" +--- + +The **MCP Server** tab shows what happens inside MCP servers hosted on Zuplo: +tool invocations, resource reads, prompt gets, JSON-RPC method usage, transport +mix, and per-tool latency. It's visible when the project type is **standard** +and the project hosts an MCP server. + +## When to use this + +- Find the slowest or most-called tools. +- See which transport (stdio, HTTP, SSE) and which clients dominate traffic. +- Investigate JSON-RPC error codes returned to clients. + +## MCP Server vs MCP Gateway + +This tab is about activity inside MCP servers you host on Zuplo. If you're +looking for the server-side picture of traffic flowing through Zuplo's MCP +gateway (auth, routing, upstream health), see the +[MCP Gateway](./mcp-gateway.md) tab. + +## Summary KPIs + +| Name | What it measures | +| ------------------- | ------------------------------------------------------------------------------------------------------------------------- | +| **Tool Calls** | Total tool invocations in the window. Secondary: resource reads and prompt gets. | +| **Active Sessions** | Distinct MCP sessions (approximate, estimated via HyperLogLog). See [Metrics glossary](../reference/metrics-glossary.md). | +| **Error Rate** | Share of tool calls returning an application, gateway, or upstream error. | +| **p95 Latency** | P95 across all tool calls. | + +## Charts + +**Calls Time Series.** Four series in one chart: tool calls, resource reads, +prompt gets, and session starts. + +**Three donuts in a row.** + +- **JSON-RPC Methods**: distribution across the methods clients invoke. +- **Transport**: stdio / http / sse split. +- **Clients**: top clients by name from the `initialize` handshake. 
+ +**Latency Percentiles Card and Latency Time Series.** Summary card plus P50 / +P95 / P99 over time. + +## Tables + +**Top Tools.** Tool name, Calls, Errors (count + %), and P50 / P95 / P99 +latency. The fastest way to find a slow or noisy tool. + +**Three list panels.** + +- **Top Resources Read**: URI, optional name, reads. +- **Top Prompts**: prompt, gets. +- **JSON-RPC Error Codes**: label, count. + +## Filters + +The filter bar applies. See [Shared controls](../shared-controls.md#filters). + +## Troubleshooting + +**The MCP Server tab is empty.** No MCP Server traffic has been recorded in the +selected window. Invoke a tool from a client and the dashboard populates. + +**Active sessions count looks too round.** Active sessions are estimated with +HyperLogLog. The estimate is accurate at scale, but it is approximate and may +not exactly match a count of unique session IDs. + +**I don't see this tab.** Visibility requires project type **standard** and an +MCP server hosted by the project. If you're consuming an MCP fleet through +Zuplo's gateway instead, look for the [MCP Gateway](./mcp-gateway.md) tab. diff --git a/docs/analytics/tabs/origins.md b/docs/analytics/tabs/origins.md new file mode 100644 index 00000000..f6f115c6 --- /dev/null +++ b/docs/analytics/tabs/origins.md @@ -0,0 +1,82 @@ +--- +title: "Origins" +sidebar_label: "Origins" +--- + +The **Origins** tab shows backend performance: how each upstream host you proxy +to is performing in terms of volume, error rate, and latency. It's visible when +the project uses managed-edge origins. + +## When to use this + +- Identify which backend is slow or returning errors. +- Compare the latency contribution of DNS, TCP, TLS, and application time. +- Audit traffic distribution across direct origins and service tunnels. 
+ +## Summary metrics + +The header strip shows totals derived from the time series: + +| Name | What it measures | +| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | +| Total requests | All requests served against any origin in the window. | +| 4xx rate | Client error rate across all origins. | +| 5xx rate | Server error rate across all origins. | +| Weighted avg latency | Origin response time weighted by request count, so high-traffic origins dominate. See [Metrics glossary](../reference/metrics-glossary.md). | + +## Charts + +**Backend Request Time Series.** Stacked bars by status class, aggregated across +origins by default. Apply a host filter to scope to one origin. + +**Backend Latency.** Average and P95 over time. _What to look for:_ a P95 climb +while the average stays flat usually points to a few slow origins or routes +inside an otherwise healthy fleet. + +**Backend Error Rate.** 4xx and 5xx rates over time. + +**Request Lifecycle.** Stacked time spent in each phase of an origin request: +**DNS time**, **TCP time**, **TLS time**, and **application time**. A high TLS +slice indicates handshake overhead; a high application slice indicates the +origin is slow. + +## Tables + +Two tables sit side by side in a 2-column grid. + +### Direct Origins + +| Column | Notes | +| --------------- | -------------------------------- | +| Host | The origin hostname. | +| Requests | Count with an inline volume bar. | +| Client Errors | 4xx percentage. | +| Server Errors | 5xx percentage. | +| Avg / P95 / P99 | Latency percentiles. | +| 4xx sparkline | Inline trend over the window. | +| 5xx sparkline | Inline trend over the window. | + +Clicking a row toggles a host filter. Click again to remove it. + +### Service Tunnels + +Same columns and behavior as Direct Origins, scoped to tunnel-routed origins. +The table is hidden when no tunnel traffic is present. 
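The **Weighted avg latency** metric in the summary strip is described as origin response time weighted by request count. A minimal sketch of that calculation follows, assuming a hypothetical `OriginStats` row shape and made-up hostnames; the exact computation the dashboard uses is defined in the Metrics glossary, not here.

```typescript
// Hypothetical per-origin row; field names are illustrative, not a Zuplo API.
interface OriginStats {
  host: string;
  requests: number;
  avgLatencyMs: number;
}

// Request-weighted mean: sum(latency_i * requests_i) / sum(requests_i).
function weightedAvgLatency(origins: OriginStats[]): number {
  let weightedSum = 0;
  let totalRequests = 0;
  for (const origin of origins) {
    weightedSum += origin.avgLatencyMs * origin.requests;
    totalRequests += origin.requests;
  }
  // With no traffic there is no meaningful average; return 0 by convention.
  return totalRequests === 0 ? 0 : weightedSum / totalRequests;
}

const sampleOrigins: OriginStats[] = [
  { host: "api.example.com", requests: 900, avgLatencyMs: 100 },
  { host: "batch.example.com", requests: 100, avgLatencyMs: 500 },
];
console.log(weightedAvgLatency(sampleOrigins)); // 140
```

An unweighted mean of the two sample origins would be 300 ms. The weighted figure of 140 ms sits closer to the high-traffic origin, which is the behavior the table describes.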
+ +## Filters + +The filter bar applies, with one exception: `userSub` is not applicable on this +tab. See [Shared controls](../shared-controls.md#filters). + +## Troubleshooting + +**The Origins tab isn't visible.** It appears only when the project uses +managed-edge origins. If your project routes traffic differently, the tab is +hidden. + +**Service Tunnels table is missing.** That table only renders when at least one +origin is reached over a service tunnel. + +**A 5xx spike on one origin doesn't match the Requests tab.** If you've filtered +the Requests tab to a different route or status class, totals won't match. Clear +filters or compare with the same filters applied on both tabs. diff --git a/docs/analytics/tabs/requests.md b/docs/analytics/tabs/requests.md new file mode 100644 index 00000000..44086418 --- /dev/null +++ b/docs/analytics/tabs/requests.md @@ -0,0 +1,96 @@ +--- +title: "Requests" +sidebar_label: "Requests" +--- + +The **Requests** tab is the default Analytics overview: every request through +your gateway in the selected time window, with charts and breakdowns for volume, +latency, and errors. + +## When to use this + +- Spot-check overall traffic and error rate across a project or the whole + account. +- Investigate a spike in 4xx or 5xx responses. +- Drill from a route, status code, or geographic breakdown into the underlying + requests. + +## Summary KPIs + +| Name | What it measures | When it's useful | +| ----------------- | ------------------------------------------------------------- | ----------------------------------------- | +| **Requests** | Total request count. Secondary value: successful (2xx) count. | Quick health check on volume and success. | +| **Client Errors** | 4xx rate (4xx ÷ total). Secondary value: raw 4xx count. | Spot bad-input or auth issues. | +| **Server Errors** | 5xx rate (5xx ÷ total). Secondary value: raw 5xx count. | Spot gateway or upstream failures. | +| **Avg Latency** | Mean response time. 
Secondary value: min to max. | Detect broad latency regressions. | +| **Consumers** | Distinct API consumers (authenticated + anonymous). | Gauge active audience. | + +See [Metrics glossary](../reference/metrics-glossary.md) for how rates and +percentiles are computed. + +## Charts + +**Request Time Series.** Stacked bars per interval, broken down by status class +(2xx / 3xx / 4xx / 5xx). Drag to select a region to zoom; the time range picker +updates to match. + +**Request Locations Map.** A world map with a heatmap of request volume by +location. Shown only when geolocation data is present. + +**Latency Over Time.** P50, P95, and P99 lines. _What to look for:_ a widening +gap between P50 and P95 typically signals a tail-latency problem affecting a +subset of requests. + +**Error Rate.** 4xx and 5xx rates plotted over time. + +**Latency Distribution.** A histogram of P10, P50, P90, P95, and P99 buckets. +Click a band to filter the rest of the tab to requests in that duration range. + +**Active Instances.** Distinct active edge instances over time. A rough +indicator of how widely your traffic is distributed across gateway workers. + +## Breakdowns + +Each breakdown shows the top 10 values by request count. Click **Show more** to +load the next 50. + +**Primary breakdowns:** + +- **HTTP Method** +- **HTTP Status** +- **Route Path** + +**Account scope only:** + +- **Project Name**: click to drill into project-scope analytics. +- **Deployment Name**: click to drill into a specific deployment. + +**Secondary breakdowns:** + +- **Country**, **City**, **Colo** +- **User Sub** +- **Client IP** +- **AS Organization** + +Clicking any value applies an `equals` filter for that field. + +## Filters + +The full filter bar applies. `originHost` is not applicable on this tab. See +[Shared controls](../shared-controls.md#filters) for match modes and the filter +pill UI. 
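The **Client Errors** and **Server Errors** KPIs in the summary table are defined as 4xx ÷ total and 5xx ÷ total. A small sketch of that arithmetic follows, assuming a hypothetical `StatusCounts` shape; the names are illustrative and not part of any Zuplo API.

```typescript
// Hypothetical status-class counts; names are illustrative, not a Zuplo API.
interface StatusCounts {
  total: number;
  clientErrors: number; // 4xx responses
  serverErrors: number; // 5xx responses
}

interface ErrorRates {
  clientErrorRate: number;
  serverErrorRate: number;
}

// Rates are fractions of total requests (multiply by 100 for a percentage).
function errorRates(counts: StatusCounts): ErrorRates {
  // Avoid dividing by zero when the window contains no requests.
  if (counts.total === 0) {
    return { clientErrorRate: 0, serverErrorRate: 0 };
  }
  return {
    clientErrorRate: counts.clientErrors / counts.total,
    serverErrorRate: counts.serverErrors / counts.total,
  };
}

const windowRates = errorRates({
  total: 2000,
  clientErrors: 50,
  serverErrors: 10,
});
console.log(windowRates.clientErrorRate, windowRates.serverErrorRate); // 0.025 0.005
```

Because both rates share the same denominator, applying a filter changes them by changing `total` as well as the error counts.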
+ +## Troubleshooting + +**The map is missing.** The Request Locations Map only renders when geolocation +data is present in the time window. Short windows for low-traffic projects may +not include any geolocated requests. + +**Show more doesn't load anything.** You may already be viewing every value for +that breakdown. Top-10 plus 50 covers up to 60 distinct values; beyond that, +narrow the time range or add a filter. + +**My charts look sparse.** If your account is new, the trial banner across the +top calls this out. Click **View demo →** in the banner to see what a fully +populated dashboard looks like. See +[Access and entitlements](../access-and-entitlements.md). diff --git a/sidebar.ts b/sidebar.ts index 4d2a0311..85b47173 100644 --- a/sidebar.ts +++ b/sidebar.ts @@ -669,6 +669,37 @@ export const documentation: Navigation = [ "articles/rename-or-move-project", ], }, + { + type: "category", + label: "Analytics", + icon: "chart-line", + items: [ + "analytics/overview", + "analytics/access-and-entitlements", + "analytics/shared-controls", + { + type: "category", + label: "Tabs", + items: [ + "analytics/tabs/requests", + "analytics/tabs/origins", + "analytics/tabs/consumers", + "analytics/tabs/agents", + "analytics/tabs/ai-gateway", + "analytics/tabs/mcp-gateway", + "analytics/tabs/mcp-server", + ], + }, + { + type: "category", + label: "Reference", + items: [ + "analytics/reference/metrics-glossary", + "analytics/reference/url-parameters", + ], + }, + ], + }, { type: "category", label: "Observability",