From d9b1025ac9e2c7b6e4ec6596aa6068ec61e45c29 Mon Sep 17 00:00:00 2001 From: Alexey Kazakov Date: Fri, 8 May 2026 15:09:02 -0700 Subject: [PATCH 1/2] docs: add MCP server support design proposal - Design document for operator-managed MCP server configuration - Open questions document covering secret handling, transport types, and API shape decisions Signed-off-by: Alexey Kazakov Co-authored-by: Cursor --- docs/proposals/mcp-support-design.md | 201 +++++++++++++++++ docs/proposals/mcp-support-questions.md | 274 ++++++++++++++++++++++++ 2 files changed, 475 insertions(+) create mode 100644 docs/proposals/mcp-support-design.md create mode 100644 docs/proposals/mcp-support-questions.md diff --git a/docs/proposals/mcp-support-design.md b/docs/proposals/mcp-support-design.md new file mode 100644 index 0000000..43b95c4 --- /dev/null +++ b/docs/proposals/mcp-support-design.md @@ -0,0 +1,201 @@ +# MCP Server Support + +**Status:** Draft — pending decisions from [mcp-support-questions.md](mcp-support-questions.md) + +## Overview + +Add operator-managed MCP (Model Context Protocol) server configuration to the Claw CRD. MCP servers extend the AI assistant's capabilities by providing structured tool access to external services (GitHub API, filesystems, databases, etc.). Today, users must manually configure MCP servers via `openclaw config patch` inside the pod — there is no declarative, CR-driven way to manage them. + +This feature enables users to declare MCP servers in the Claw CR with proper secret management, and the operator injects the configuration into `operator.json` at reconciliation time. + +## Design Principles + +1. **Security first:** Real secrets must not reach the gateway container whenever possible. The MITM proxy pattern is the preferred path for HTTP-based MCP servers. For stdio MCP servers that require secrets as environment variables, the gateway container receives them directly — this is the "unless it's the only way" exception, documented explicitly. + +2. **Declarative and reconcilable:** MCP server configuration is part of the Claw CR spec, reconciled into `operator.json` the same way providers and channels are today. + +3. **Minimal API surface:** Follow existing patterns (channels, providers) rather than inventing new ones. Reuse `secretRef` for credentials. + +4. **User config preserved:** In `merge` mode, operator-managed MCP servers merge alongside user-managed servers added via `openclaw config patch` or `openclaw mcp set`. Operator-managed servers win on collision (same server name). + +## Architecture + +### How MCP Servers Work in OpenClaw + +OpenClaw supports two MCP transport types: + +- **Stdio**: A child process spawned by the gateway (`command` + `args`). Environment variables are passed to the child process. This is the most common type for local tool servers. +- **HTTP** (`streamable-http` or `sse`): An outbound HTTP connection to a remote URL. Headers can be passed. No subprocess. + +OpenClaw configuration path: `mcp.servers` in `openclaw.json`: + +```json +{ + "mcp": { + "servers": { + "github": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-github"], + "env": { + "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_real_token" + } + }, + "context7": { + "url": "https://mcp.context7.com/mcp", + "transport": "streamable-http" + } + } + } +} +``` + +### Operator Flow + +``` +Claw CR spec.mcpServers + │ + ▼ + resolveAndApplyCredentials() ◄── validate secrets exist (stdio env) + │ + ▼ + enrichConfigAndNetworkPolicy() + │ + ├─► injectMcpServersIntoConfigMap() ──► operator.json { mcp.servers } + │ + └─► proxy domain handling (HTTP MCP URLs) + │ + ▼ + configureGatewayForMcpServers() ──► env vars on gateway container (stdio secrets) + │ + ▼ + merge.js (init-config) ──► PVC openclaw.json (deep-merge preserves user MCP servers) +``` + +### Security Model + +> **Open question:** How should stdio MCP server secrets be handled? — see [questions document](mcp-support-questions.md), Q1. + +**HTTP MCP servers** with credentials in `headers`: Can be handled by the MITM proxy (inject headers on matching domain), keeping secrets off the gateway. The proxy already supports this pattern via `defaultHeaders` on credential routes. + +**Stdio MCP servers** with `env` secrets: The subprocess inherits env from the gateway process. There is no proxy interception for env vars — they must be on the gateway container. This is architecturally unavoidable for stdio MCP. + +> **Open question:** Should HTTP MCP server auth go through the proxy or be injected as config? — see [questions document](mcp-support-questions.md), Q2. + +### Network Access + +- **Stdio MCP servers** make their own outbound connections. Their traffic goes through `HTTP_PROXY`/`HTTPS_PROXY` (the MITM proxy). Domains they need must be in the proxy allowlist — either as builtin passthroughs, existing credentials, or new `type: none` domain entries. + +- **HTTP MCP servers** connect to a `url`. The gateway's HTTP client goes through the proxy. The MCP URL's domain must be allowed. + +> **Open question:** How should proxy domain allowlisting work for MCP servers? — see [questions document](mcp-support-questions.md), Q3. + +## Core Concepts + +### CRD Schema + +> **Open question:** Where should MCP servers live in the CRD? — see [questions document](mcp-support-questions.md), Q4. + +A new `spec.mcpServers` map on `ClawSpec`: + +```go +type ClawSpec struct { + ConfigMode ConfigMode `json:"configMode,omitempty"` + Credentials []CredentialSpec `json:"credentials,omitempty"` + McpServers map[string]McpServerSpec `json:"mcpServers,omitempty"` +} +``` + +Each `McpServerSpec` supports both stdio and HTTP transports: + +```go +type McpServerSpec struct { + // Stdio transport + Command string `json:"command,omitempty"` + Args []string `json:"args,omitempty"` + + // HTTP transport + URL string `json:"url,omitempty"` + Transport string `json:"transport,omitempty"` // "streamable-http" or "sse" + + // Env vars for stdio (plain values) + Env map[string]string `json:"env,omitempty"` + + // Secret-backed env vars for stdio + EnvFrom []McpEnvFromSecret `json:"envFrom,omitempty"` + + // Additional domains the MCP server needs (added to proxy allowlist) + AllowedDomains []string `json:"allowedDomains,omitempty"` +} + +type McpEnvFromSecret struct { + Name string `json:"name"` // env var name + SecretRef SecretRefEntry `json:"secretRef"` // k8s secret reference +} +``` + +> **Open question:** Should `env` and `envFrom` be separate fields or unified? — see [questions document](mcp-support-questions.md), Q5. + +### ConfigMap Injection + +A new `injectMcpServersIntoConfigMap()` function (following the channels/providers pattern) that: + +1. Builds `mcp.servers` map from `spec.mcpServers` +2. For `envFrom` entries, resolves to placeholder values (the real secrets go on the gateway container, not in the ConfigMap) +3. Merges into `operator.json` under `mcp.servers` + +### Gateway Deployment Modification + +A new `configureGatewayForMcpServers()` function (following the Vertex/Kubernetes pattern in `claw_deployment.go`) that: + +1. For each `McpServerSpec` with `envFrom`, adds env vars to the gateway container sourced from `secretKeyRef` +2. The env var name matches what's in the MCP server's `env` config so the subprocess inherits it + +### Proxy Configuration + +For HTTP MCP servers, the URL's domain must be in the proxy allowlist. Options: + +- Automatic: parse the URL, add as a `type: none` passthrough route +- Manual: user adds a `credentials` entry for the domain + +> **Open question:** Should this be automatic or manual? — see [questions document](mcp-support-questions.md), Q3. + +## Implementation Plan + +### Phase 1: CRD and ConfigMap Injection + +1. Add `McpServerSpec` types to `api/v1alpha1/claw_types.go` +2. Run `make manifests && make generate` +3. Add `injectMcpServersIntoConfigMap()` in new file `internal/controller/claw_mcp.go` +4. Wire into `enrichConfigAndNetworkPolicy()` in `claw_resource_controller.go` +5. Add unit tests in `claw_mcp_test.go` + +### Phase 2: Secret Handling (stdio env) + +1. Add `configureGatewayForMcpServers()` in `claw_deployment.go` +2. Add secret validation in `resolveAndApplyCredentials()` (or new MCP validation step) +3. Stamp secret versions for MCP secrets (rollout on secret change) +4. Add unit tests + +### Phase 3: Proxy Allowlisting (HTTP MCP) + +1. Parse HTTP MCP URLs, generate proxy routes +2. Add companion domain support via `allowedDomains` +3. Unit tests for proxy config generation with MCP domains + +### Phase 4: Documentation and Platform Skill + +1. Update PLATFORM.md with MCP server guidance +2. Add MCP section to `docs/provider-setup.md` +3. Add `McpServersConfigured` status condition + +## Open Questions + +Summary of unresolved decisions — see [mcp-support-questions.md](mcp-support-questions.md) for details: + +- **Q1:** How should stdio MCP server secrets be handled? +- **Q2:** Should HTTP MCP auth go through the proxy or be config-injected? +- **Q3:** How should proxy domain allowlisting work for MCP servers? +- **Q4:** Where should MCP config live in the CRD? +- **Q5:** Should `env` and `envFrom` be separate or unified? +- **Q6:** Should the operator validate MCP server configurations? +- **Q7:** How should the status condition work for MCP? diff --git a/docs/proposals/mcp-support-questions.md b/docs/proposals/mcp-support-questions.md new file mode 100644 index 0000000..c704a62 --- /dev/null +++ b/docs/proposals/mcp-support-questions.md @@ -0,0 +1,274 @@ +# MCP Support — Design Questions + +**Status:** Open — decisions pending +**Related:** [Design document](mcp-support-design.md) + +Each question has options with trade-offs and a recommendation. Go through them one by one to form the design, then update the design document. + +## Q1: How should stdio MCP server secrets be handled? + +Stdio MCP servers run as subprocesses of the gateway. They inherit the gateway process environment. There is no HTTP layer to intercept — the proxy cannot inject secrets for these. The fundamental question: do we accept putting secrets on the gateway container for stdio MCP, or do we limit support to avoid it? + +### Option A: Gateway env vars via `secretKeyRef` (accept the exception) + +Secrets are mounted as env vars on the gateway container, referenced from Kubernetes Secrets. The MCP server config in `operator.json` uses placeholder values; the real values come from the process environment, which the subprocess inherits. + +- **Pro:** Full stdio MCP support — works with all MCP servers (GitHub, filesystem, database, etc.) +- **Pro:** Follows the same pattern as `OPENCLAW_GATEWAY_TOKEN` (already a `secretKeyRef` on the gateway) +- **Pro:** Matches how the upstream operator and installer handle MCP secrets +- **Con:** Breaks the "no real secrets on the gateway" principle for these specific env vars +- **Con:** A compromised gateway process can read these env vars + +### Option B: No secret support for stdio MCP — only plain env + +Only allow plain-text `env` values (non-secret). Users who need secrets must manually configure them via `openclaw config patch` inside the pod or use HTTP MCP servers instead. + +- **Pro:** Maintains strict secret isolation +- **Con:** Most useful stdio MCP servers need secrets (GitHub PAT, database passwords) +- **Con:** Poor UX — user has to manually inject secrets after every pod restart +- **Con:** The upstream operator doesn't have this limitation + +### Option C: Sidecar proxy for stdio MCP servers + +Run stdio MCP servers in a separate sidecar container with secrets, communicating with the gateway over a local socket or HTTP bridge. + +- **Pro:** Secrets stay off the gateway container +- **Con:** Massive complexity — MCP stdio protocol uses stdin/stdout pipes, not sockets +- **Con:** Would require a custom bridge component that doesn't exist +- **Con:** Fundamentally changes the MCP execution model + +**Recommendation:** Option A. The gateway already has `secretKeyRef`-based secrets (`OPENCLAW_GATEWAY_TOKEN`, proxy CA). Stdio MCP is architecturally incapable of proxy-based injection. This is the documented "unless it's the only way" exception. The blast radius is limited: these env vars are only visible to the gateway process and its children, not externally accessible. + +## Q2: Should HTTP MCP auth go through the proxy or be config-injected? + +HTTP MCP servers (`url`-based, using `streamable-http` or `sse` transport) can have auth headers. The question is whether those headers should be injected by the MITM proxy (keeping secrets off the gateway) or written into the MCP config in `operator.json`. + +### Option A: Proxy-injected headers (reuse credential system) + +User adds both an MCP server entry (for config) and a `credentials` entry (for auth). The credential's `domain` matches the MCP URL's host, and the proxy injects auth headers. + +- **Pro:** Consistent with the operator's security model — real secrets stay on the proxy +- **Pro:** Reuses existing credential infrastructure (no new secret plumbing) +- **Pro:** HTTP MCP servers behave like any other authenticated HTTP endpoint +- **Con:** User must configure two things (MCP server + credential) for one service +- **Con:** Slightly more complex UX for simple cases + +### Option B: Config-injected headers (secrets in `operator.json`) + +Auth headers are written directly into the MCP server's `headers` field in `operator.json`, sourced from Kubernetes Secrets. + +- **Pro:** Single configuration point — everything about the MCP server is in one place +- **Con:** Secrets end up in `operator.json` on the ConfigMap, readable by the gateway +- **Con:** Breaks the proxy-based secret isolation for HTTP traffic (where it IS possible) +- **Con:** ConfigMap data is not encrypted at rest (unlike Secrets) + +### Option C: Hybrid — proxy for auth, config for non-secret headers + +MCP server config has non-secret headers (like `Content-Type`). Auth goes through a credential entry. The operator documents the pattern. + +- **Pro:** Best of both worlds — secrets stay secure, non-secret config stays simple +- **Con:** Same two-thing UX concern as Option A, though documented patterns mitigate this + +**Recommendation:** Option A. HTTP traffic already goes through the MITM proxy. The proxy can handle auth injection for MCP URLs exactly like it does for LLM providers. Users already know the credential pattern. The PLATFORM.md skill can guide Claw to help users set up both pieces together. + +## Q3: How should proxy domain allowlisting work for MCP servers? + +MCP servers need network access. Stdio MCP servers make outbound HTTP calls (through the proxy). HTTP MCP servers connect to their URL (through the proxy). Domains must be in the proxy allowlist. + +### Option A: Automatic from URL + explicit `allowedDomains` + +For HTTP MCP servers, auto-extract the domain from `url` and add it as a passthrough route. For stdio MCP servers (and any extra domains HTTP servers need), use an `allowedDomains` field on the MCP spec. + +- **Pro:** HTTP MCP "just works" — URL implies the domain +- **Pro:** `allowedDomains` covers stdio MCP servers that need specific endpoints +- **Con:** Auto-extraction adds implicit behavior (domain allowlisted without explicit credential entry) +- **Con:** `allowedDomains` is a new concept alongside `credentials` + +### Option B: Manual via existing `credentials` entries + +Users add `type: none` credential entries for any domains MCP servers need. No new fields on the MCP spec. + +- **Pro:** Single mechanism for all domain allowlisting (existing pattern) +- **Pro:** No new API surface +- **Con:** Verbose — user must add separate credential entries for every MCP domain +- **Con:** Poor discoverability — not obvious that MCP needs credential entries for domains + +### Option C: Automatic from URL only (no `allowedDomains` field) + +Auto-extract domain from HTTP MCP URLs. Stdio MCP server domains must be allowlisted via `credentials` entries. + +- **Pro:** Simpler API — no `allowedDomains` field +- **Pro:** HTTP MCP works automatically +- **Con:** Stdio MCP servers that need network access require separate credential entries +- **Con:** Inconsistent — HTTP auto-allowlisted, stdio not + +**Recommendation:** Option C. Keep it simple for Phase 1. HTTP MCP URLs naturally imply a domain — auto-extracting it is pragmatic and low-risk (it's a `type: none` passthrough, not credential injection). Stdio MCP server domains are less predictable (they depend on what the subprocess does) and are better handled explicitly via `credentials`. This avoids a new `allowedDomains` concept. We can always add it later if the UX is painful. + +## Q4: Where should MCP config live in the CRD? + +The operator currently has `spec.configMode` and `spec.credentials`. MCP is a new top-level concept. Where does it go? + +### Option A: New `spec.mcpServers` map + +A new top-level field on `ClawSpec`, parallel to `credentials`: + +```yaml +spec: + credentials: [...] + mcpServers: + github: + command: npx + args: ["-y", "@modelcontextprotocol/server-github"] + envFrom: + - name: GITHUB_PERSONAL_ACCESS_TOKEN + secretRef: + name: github-pat-secret + key: token +``` + +- **Pro:** Clean separation — MCP is not a "credential" or "channel" +- **Pro:** Map keyed by server name (natural for MCP where names matter) +- **Pro:** Room for MCP-specific fields without overloading `CredentialSpec` +- **Con:** Adds a new top-level spec field (API surface growth) + +### Option B: Extend `CredentialSpec` with an `mcp` field + +Add MCP config as another variant of `CredentialSpec`, similar to how `channel` works: + +```yaml +spec: + credentials: + - name: github-mcp + type: none + mcp: + command: npx + args: ["-y", "@modelcontextprotocol/server-github"] +``` + +- **Pro:** Reuses existing credential infrastructure +- **Pro:** Secret handling via `secretRef` is already built in +- **Con:** `CredentialSpec` is already complex (7 types, channels, providers). MCP would overload it further +- **Con:** MCP servers don't fit the credential model — they're not about injecting auth on a domain +- **Con:** Forces MCP into credential validation logic that doesn't apply + +### Option C: Raw config pass-through (`spec.config.raw`) + +Follow the upstream operator pattern — users pass raw MCP JSON through an opaque config field: + +```yaml +spec: + config: + raw: + mcp: + servers: + github: { command: npx, args: [...] } +``` + +- **Pro:** Zero new types — just pass through to `openclaw.json` +- **Pro:** Matches upstream operator pattern exactly +- **Con:** No structured validation, no secret integration +- **Con:** Secrets would need to be plain text in the config (security problem) +- **Con:** No proxy domain auto-allowlisting +- **Con:** Doesn't leverage operator capabilities + +**Recommendation:** Option A. MCP servers are conceptually distinct from credentials and channels. A dedicated `spec.mcpServers` map gives clean separation, structured validation, and room for proper secret handling. It's also the most natural mapping to OpenClaw's `mcp.servers` config shape. + +## Q5: Should `env` and `envFrom` be separate fields or unified? + +Stdio MCP servers need environment variables — some are plain values, some come from Kubernetes Secrets. How should the API express this? + +### Option A: Separate `env` (plain) and `envFrom` (secret-backed) + +```yaml +mcpServers: + github: + command: npx + args: ["-y", "@modelcontextprotocol/server-github"] + env: + LOG_LEVEL: debug + envFrom: + - name: GITHUB_PERSONAL_ACCESS_TOKEN + secretRef: + name: github-pat-secret + key: token +``` + +- **Pro:** Clear distinction between secret and non-secret values +- **Pro:** Plain `env` values go directly into `operator.json`; `envFrom` become `secretKeyRef` on the container and placeholders in config +- **Pro:** Follows Kubernetes naming conventions (`env` + `envFrom`) +- **Con:** Two fields for related concept + +### Option B: Unified `env` with inline secret references + +```yaml +mcpServers: + github: + command: npx + args: ["-y", "@modelcontextprotocol/server-github"] + env: + LOG_LEVEL: debug + GITHUB_PERSONAL_ACCESS_TOKEN: + secretRef: + name: github-pat-secret + key: token +``` + +- **Pro:** Single field, mixed plain and secret values +- **Con:** Polymorphic value type (string | object) is ugly in Go CRD types and CEL validation +- **Con:** CRD schema can't easily express "value is either a string or an object with secretRef" +- **Con:** Harder to distinguish which env vars need container-level `secretKeyRef` mounting + +**Recommendation:** Option A. The Kubernetes naming convention is well-understood. The Go type system handles it cleanly. The separation makes the reconciler logic straightforward: `env` goes to ConfigMap, `envFrom` goes to both ConfigMap (as placeholder) and Deployment (as `secretKeyRef`). + +## Q6: Should the operator validate MCP server configurations? + +The operator could validate MCP server configs (e.g., stdio must have `command`, HTTP must have `url`, not both). Or it could pass them through and let OpenClaw validate at runtime. + +### Option A: Validate with CEL rules on the CRD + +Add `XValidation` rules like: "command and url are mutually exclusive", "transport requires url", etc. + +- **Pro:** Fast feedback — `oc apply` fails immediately with a clear message +- **Pro:** Consistent with how `CredentialSpec` validates (CEL rules on type/provider/channel) +- **Con:** More CEL rules to maintain +- **Con:** If OpenClaw adds new transport types, CRD validation could reject valid configs + +### Option B: Minimal validation only + +Only validate what the operator needs to function (e.g., `envFrom` secretRef exists). Let OpenClaw handle the rest. + +- **Pro:** Forward-compatible — new MCP features in OpenClaw don't require CRD updates +- **Pro:** Less maintenance +- **Con:** Misconfigurations surface at runtime, not apply time + +**Recommendation:** Option A. CEL validation is the project convention. Basic structural rules (stdio xor HTTP, required fields) catch common mistakes early. The rules can be relaxed later if needed; tightening is harder. + +## Q7: How should the status condition work for MCP? + +The operator uses conditions to report state: `CredentialsResolved`, `ProxyConfigured`, `Ready`. Should MCP have its own condition? + +### Option A: New `McpServersConfigured` condition + +Separate condition, set to `True` when all MCP server secrets are validated and config is injected. + +- **Pro:** Clear signal for MCP-specific issues +- **Pro:** Follows existing pattern (`CredentialsResolved`, `ProxyConfigured`) +- **Con:** More conditions to track + +### Option B: Fold into `CredentialsResolved` + +MCP secret validation runs alongside credential validation; failures show on `CredentialsResolved`. + +- **Pro:** Fewer conditions +- **Con:** Muddies the meaning of `CredentialsResolved` — MCP isn't a "credential" +- **Con:** Harder to diagnose MCP-specific issues + +### Option C: No new condition — only `Ready` + +MCP failures surface on the top-level `Ready` condition with a descriptive message. + +- **Pro:** Simplest +- **Con:** Less granular debugging +- **Con:** `Ready=False` doesn't indicate whether the issue is MCP, credentials, or something else + +**Recommendation:** Option A. A dedicated condition is cheap, follows the project pattern, and makes MCP issues immediately diagnosable. The operator already has 4 conditions — one more is fine. From 3e5a1a051bf666565a6c1e5b910bdf20196ed7a3 Mon Sep 17 00:00:00 2001 From: Alexey Kazakov Date: Fri, 8 May 2026 19:00:37 -0700 Subject: [PATCH 2/2] docs: finalize MCP support design with resolved decisions - Resolve all 7 design questions in the questions document - Add three-tier security model (HTTP proxy, stdio placeholder, envFrom escape hatch) - Detail CRD schema with CEL validation rules and Go type definitions - Map reconciliation flow to actual controller functions - Refactor implementation plan into 3 PR-aligned phases - Add full CR example demonstrating all three tiers Signed-off-by: Alexey Kazakov Co-authored-by: Cursor --- docs/proposals/mcp-support-design.md | 312 ++++++++++++++++-------- docs/proposals/mcp-support-questions.md | 257 +++++++------------ 2 files changed, 311 insertions(+), 258 deletions(-) diff --git a/docs/proposals/mcp-support-design.md b/docs/proposals/mcp-support-design.md index 43b95c4..98f1537 100644 --- a/docs/proposals/mcp-support-design.md +++ b/docs/proposals/mcp-support-design.md @@ -1,6 +1,6 @@ # MCP Server Support -**Status:** Draft — pending decisions from [mcp-support-questions.md](mcp-support-questions.md) +**Status:** Final — all decisions resolved in [mcp-support-questions.md](mcp-support-questions.md) ## Overview @@ -10,22 +10,36 @@ This feature enables users to declare MCP servers in the Claw CR with proper sec ## Design Principles -1. **Security first:** Real secrets must not reach the gateway container whenever possible. The MITM proxy pattern is the preferred path for HTTP-based MCP servers. For stdio MCP servers that require secrets as environment variables, the gateway container receives them directly — this is the "unless it's the only way" exception, documented explicitly. +1. **Security first:** Real secrets must not reach the gateway container whenever possible. The MITM proxy is the preferred credential injection path. HTTP/SSE MCP servers are recommended over stdio. For stdio, a proxy-placeholder pattern keeps secrets off the gateway in most cases. Direct gateway env var secrets (`envFrom`) are the last resort. 2. **Declarative and reconcilable:** MCP server configuration is part of the Claw CR spec, reconciled into `operator.json` the same way providers and channels are today. -3. **Minimal API surface:** Follow existing patterns (channels, providers) rather than inventing new ones. Reuse `secretRef` for credentials. +3. **Minimal API surface:** Follow existing patterns (channels, providers) rather than inventing new ones. Reuse `credentials` for domain allowlisting and auth injection. 4. **User config preserved:** In `merge` mode, operator-managed MCP servers merge alongside user-managed servers added via `openclaw config patch` or `openclaw mcp set`. Operator-managed servers win on collision (same server name). ## Architecture +### Three-Tier Security Model + +| Tier | Transport | Secret handling | Secrets on gateway? | +|---|---|---|---| +| **1. HTTP/SSE MCP** (preferred) | HTTP URL | Proxy `credentials` entry for the URL's domain | No | +| **2. Stdio + proxy placeholder** (recommended for stdio) | Subprocess | Placeholder env var + proxy `credentials` entry for known domains | No | +| **3. Stdio + real secret** (escape hatch) | Subprocess | `envFrom` with `secretKeyRef` on the gateway container | Yes | + +**Tier 1** — HTTP/SSE MCP servers are the recommended path. Traffic goes through the MITM proxy. The user adds a `credentials` entry for the MCP URL's domain. The proxy injects auth headers. No secrets on the gateway. + +**Tier 2** — Stdio MCP subprocesses inherit `HTTP_PROXY`/`HTTPS_PROXY`, so their outbound HTTPS calls go through the MITM proxy. The user sets the env var to a placeholder value and adds a `credentials` entry for the domain. The MCP server sends auth with the placeholder, the proxy strips and replaces it. The user needs to know which domain the MCP server calls and what auth type it uses. The platform skill documents this for well-known MCP servers. + +**Tier 3** — When the user doesn't know the MCP server's internals, or the MCP server uses the secret for non-HTTP purposes, `envFrom` mounts the real secret on the gateway container. This is an explicit opt-in — the user accepts the security tradeoff. + ### How MCP Servers Work in OpenClaw OpenClaw supports two MCP transport types: -- **Stdio**: A child process spawned by the gateway (`command` + `args`). Environment variables are passed to the child process. This is the most common type for local tool servers. -- **HTTP** (`streamable-http` or `sse`): An outbound HTTP connection to a remote URL. Headers can be passed. No subprocess. +- **Stdio**: A child process spawned by the gateway (`command` + `args`). Environment variables are passed to the child process. +- **HTTP** (`streamable-http` or `sse`): An outbound HTTP connection to a remote URL. OpenClaw configuration path: `mcp.servers` in `openclaw.json`: @@ -37,7 +51,7 @@ OpenClaw configuration path: `mcp.servers` in `openclaw.json`: "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"], "env": { - "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_real_token" + "GITHUB_PERSONAL_ACCESS_TOKEN": "placeholder" } }, "context7": { @@ -51,151 +65,259 @@ OpenClaw configuration path: `mcp.servers` in `openclaw.json`: ### Operator Flow +Mapped to the actual reconciliation order in `claw_resource_controller.go`: + ``` Claw CR spec.mcpServers │ ▼ - resolveAndApplyCredentials() ◄── validate secrets exist (stdio env) + resolveAndApplyCredentials() + ├─► validateMcpServerSecrets() ◄── validate envFrom secrets exist + │ set McpServersConfigured condition │ ▼ - enrichConfigAndNetworkPolicy() + applyProxyResources() + ├─► generateProxyConfig() ◄── auto-extract HTTP MCP URL domains + │ as passthrough routes (alongside + │ credential routes and builtins) │ - ├─► injectMcpServersIntoConfigMap() ──► operator.json { mcp.servers } + ▼ + buildKustomizedObjects() ◄── load embedded Kustomize manifests │ - └─► proxy domain handling (HTTP MCP URLs) + ▼ + configureDeployments() + ├─► configureGatewayForMcpServers() ◄── env vars on gateway container + │ (tier 3 envFrom secrets) + │ + ▼ + stampMcpSecretVersionAnnotation() ◄── stamp gateway deployment pod + │ template (rollout on secret change) │ ▼ - configureGatewayForMcpServers() ──► env vars on gateway container (stdio secrets) + enrichConfigAndNetworkPolicy() + ├─► injectMcpServersIntoConfigMap() ◄── operator.json { mcp.servers } │ ▼ - merge.js (init-config) ──► PVC openclaw.json (deep-merge preserves user MCP servers) + merge.js (init-config at pod start) ◄── PVC openclaw.json (deep-merge + preserves user MCP servers) ``` -### Security Model - -> **Open question:** How should stdio MCP server secrets be handled? — see [questions document](mcp-support-questions.md), Q1. - -**HTTP MCP servers** with credentials in `headers`: Can be handled by the MITM proxy (inject headers on matching domain), keeping secrets off the gateway. The proxy already supports this pattern via `defaultHeaders` on credential routes. - -**Stdio MCP servers** with `env` secrets: The subprocess inherits env from the gateway process. There is no proxy interception for env vars — they must be on the gateway container. This is architecturally unavoidable for stdio MCP. - -> **Open question:** Should HTTP MCP server auth go through the proxy or be injected as config? — see [questions document](mcp-support-questions.md), Q2. +Note: `stampSecretVersionAnnotation` (existing) stamps the **proxy** deployment for credential secrets. MCP `envFrom` secrets need a separate `stampMcpSecretVersionAnnotation` that stamps the **gateway** deployment, since that's where the env vars are mounted. ### Network Access -- **Stdio MCP servers** make their own outbound connections. Their traffic goes through `HTTP_PROXY`/`HTTPS_PROXY` (the MITM proxy). Domains they need must be in the proxy allowlist — either as builtin passthroughs, existing credentials, or new `type: none` domain entries. - -- **HTTP MCP servers** connect to a `url`. The gateway's HTTP client goes through the proxy. The MCP URL's domain must be allowed. - -> **Open question:** How should proxy domain allowlisting work for MCP servers? — see [questions document](mcp-support-questions.md), Q3. - -## Core Concepts +- **HTTP MCP servers**: Domain auto-extracted from `url` and added as a `type: none` passthrough route in the proxy config. If the user also has a `credentials` entry for the same domain (for auth), the credential takes precedence. -### CRD Schema +- **Stdio MCP servers**: Users add `credentials` entries for domains the MCP server needs. For tier 2, these credentials also handle auth injection. -> **Open question:** Where should MCP servers live in the CRD? — see [questions document](mcp-support-questions.md), Q4. +## CRD Schema -A new `spec.mcpServers` map on `ClawSpec`: +New `spec.mcpServers` map on `ClawSpec`: ```go type ClawSpec struct { - ConfigMode ConfigMode `json:"configMode,omitempty"` - Credentials []CredentialSpec `json:"credentials,omitempty"` - McpServers map[string]McpServerSpec `json:"mcpServers,omitempty"` + ConfigMode ConfigMode `json:"configMode,omitempty"` + Credentials []CredentialSpec `json:"credentials,omitempty"` + McpServers map[string]McpServerSpec `json:"mcpServers,omitempty"` } ``` -Each `McpServerSpec` supports both stdio and HTTP transports: +Each `McpServerSpec`: ```go +// McpServerSpec defines an MCP server the operator injects into OpenClaw's config. +// +kubebuilder:validation:XValidation:rule="has(self.command) || has(self.url)",message="either command (stdio) or url (HTTP) must be set" +// +kubebuilder:validation:XValidation:rule="!has(self.command) || !has(self.url)",message="command and url are mutually exclusive" type McpServerSpec struct { - // Stdio transport - Command string `json:"command,omitempty"` - Args []string `json:"args,omitempty"` - - // HTTP transport - URL string `json:"url,omitempty"` - Transport string `json:"transport,omitempty"` // "streamable-http" or "sse" - - // Env vars for stdio (plain values) + // Command is the executable for a stdio MCP server. + // +optional + Command string `json:"command,omitempty"` + + // Args are command-line arguments for the stdio server. + // +optional + Args []string `json:"args,omitempty"` + + // URL is the endpoint for an HTTP MCP server. + // +optional + URL string `json:"url,omitempty"` + + // Transport selects the HTTP transport type ("streamable-http" or "sse"). + // Only valid when url is set. + // +optional + Transport string `json:"transport,omitempty"` + + // Env are plain environment variables passed to the stdio server process + // and written into the MCP server config in operator.json. + // Use for non-secret values and tier-2 placeholder tokens. + // +optional Env map[string]string `json:"env,omitempty"` - // Secret-backed env vars for stdio + // EnvFrom are secret-backed environment variables mounted on the gateway + // container and inherited by the stdio server subprocess (tier 3). + // Use only when the proxy-placeholder pattern (tier 2) is not viable. + // +optional EnvFrom []McpEnvFromSecret `json:"envFrom,omitempty"` - - // Additional domains the MCP server needs (added to proxy allowlist) - AllowedDomains []string `json:"allowedDomains,omitempty"` } +// McpEnvFromSecret maps a Kubernetes Secret key to an environment variable. type McpEnvFromSecret struct { - Name string `json:"name"` // env var name - SecretRef SecretRefEntry `json:"secretRef"` // k8s secret reference + // Name is the environment variable name. + // +kubebuilder:validation:MinLength=1 + Name string `json:"name"` + + // SecretRef references a key in a Kubernetes Secret. + SecretRef SecretRefEntry `json:"secretRef"` } ``` -> **Open question:** Should `env` and `envFrom` be separate fields or unified? — see [questions document](mcp-support-questions.md), Q5. +### CEL Validation -### ConfigMap Injection +Only structurally certain rules: +- `command` and `url` are mutually exclusive +- At least one of `command` or `url` must be set -A new `injectMcpServersIntoConfigMap()` function (following the channels/providers pattern) that: +No validation of `transport` values or `env` key naming — those are OpenClaw's concern. -1. Builds `mcp.servers` map from `spec.mcpServers` -2. For `envFrom` entries, resolves to placeholder values (the real secrets go on the gateway container, not in the ConfigMap) -3. Merges into `operator.json` under `mcp.servers` +### Reconciler Validation -### Gateway Deployment Modification +- `envFrom` referenced Secrets must exist and contain the specified key (same pattern as credential validation) +- Failures set `McpServersConfigured=False` with a descriptive message -A new `configureGatewayForMcpServers()` function (following the Vertex/Kubernetes pattern in `claw_deployment.go`) that: +## ConfigMap Injection -1. For each `McpServerSpec` with `envFrom`, adds env vars to the gateway container sourced from `secretKeyRef` -2. The env var name matches what's in the MCP server's `env` config so the subprocess inherits it +New `injectMcpServersIntoConfigMap()` function (following the channels/providers pattern): -### Proxy Configuration +1. For each `McpServerSpec`, builds the server config object: + - Stdio: `{ "command": ..., "args": ..., "env": { ... } }` — `env` includes plain values from `env` field. For `envFrom` entries, the env var name is included in the config with a placeholder value (the real value comes from the container environment at runtime). + - HTTP: `{ "url": ..., "transport": ... }` +2. Sets `config["mcp"]["servers"][name] = serverConfig` +3. Marshals back into `operator.json` -For HTTP MCP servers, the URL's domain must be in the proxy allowlist. Options: +## Gateway Deployment Modification -- Automatic: parse the URL, add as a `type: none` passthrough route -- Manual: user adds a `credentials` entry for the domain +New `configureGatewayForMcpServers()` function in `claw_deployment.go`, called from `configureDeployments()` alongside the existing Vertex/Kubernetes gateway modifications: -> **Open question:** Should this be automatic or manual? — see [questions document](mcp-support-questions.md), Q3. - -## Implementation Plan +- For each `McpServerSpec` with `envFrom` entries, adds env vars to the gateway container: + ```json + { + "name": "GITHUB_PERSONAL_ACCESS_TOKEN", + "valueFrom": { + "secretKeyRef": { + "name": "github-pat-secret", + "key": "token" + } + } + } + ``` -### Phase 1: CRD and ConfigMap Injection +New `stampMcpSecretVersionAnnotation()` — separate from the existing `stampSecretVersionAnnotation` which targets the **proxy** deployment. MCP secrets are on the **gateway** deployment, so this function stamps the gateway pod template with MCP secret `ResourceVersion`s to trigger rollouts when secrets change. -1. Add `McpServerSpec` types to `api/v1alpha1/claw_types.go` -2. Run `make manifests && make generate` -3. Add `injectMcpServersIntoConfigMap()` in new file `internal/controller/claw_mcp.go` -4. Wire into `enrichConfigAndNetworkPolicy()` in `claw_resource_controller.go` -5. Add unit tests in `claw_mcp_test.go` +## Proxy Configuration -### Phase 2: Secret Handling (stdio env) +HTTP MCP URL domain auto-extraction happens inside `generateProxyConfig()` (in `claw_proxy.go`), which runs during `applyProxyResources()` — before ConfigMap injection. The function receives the Claw instance's `McpServers` map (or a pre-extracted list of domains) and: -1. Add `configureGatewayForMcpServers()` in `claw_deployment.go` -2. Add secret validation in `resolveAndApplyCredentials()` (or new MCP validation step) -3. Stamp secret versions for MCP secrets (rollout on secret change) -4. Add unit tests +1. Parses each HTTP MCP URL to extract the hostname +2. Checks if the domain is already covered by a `credentials` entry or builtin passthrough +3. If not covered, adds a `type: none` passthrough route to the proxy config -### Phase 3: Proxy Allowlisting (HTTP MCP) +This ensures unauthenticated HTTP MCP servers "just work" without requiring a separate `credentials` entry. -1. Parse HTTP MCP URLs, generate proxy routes -2. Add companion domain support via `allowedDomains` -3. Unit tests for proxy config generation with MCP domains +## Status Condition -### Phase 4: Documentation and Platform Skill +New `McpServersConfigured` condition type: -1. Update PLATFORM.md with MCP server guidance -2. Add MCP section to `docs/provider-setup.md` -3. Add `McpServersConfigured` status condition +- `True` — all MCP server secrets validated and config injected +- `False` — secret validation failed (with descriptive message) +- Not set when `spec.mcpServers` is empty -## Open Questions +Failures also set `Ready=False`. -Summary of unresolved decisions — see [mcp-support-questions.md](mcp-support-questions.md) for details: +## Implementation Plan -- **Q1:** How should stdio MCP server secrets be handled? -- **Q2:** Should HTTP MCP auth go through the proxy or be config-injected? -- **Q3:** How should proxy domain allowlisting work for MCP servers? -- **Q4:** Where should MCP config live in the CRD? -- **Q5:** Should `env` and `envFrom` be separate or unified? -- **Q6:** Should the operator validate MCP server configurations? -- **Q7:** How should the status condition work for MCP? +Each phase corresponds to a single PR. Phases must be merged in order. + +### Phase 1: HTTP/SSE and basic stdio MCP support (tiers 1 & 2) + +Delivers working MCP for HTTP servers (auto-allowlisted) and stdio servers with static/placeholder env vars. No `envFrom` field yet — the CRD only advertises capabilities the reconciler supports. + +1. Add `McpServerSpec` type to `api/v1alpha1/claw_types.go` (**without** `EnvFrom` field) +2. Add `McpServers map[string]McpServerSpec` field to `ClawSpec` +3. Add `ConditionTypeMcpServersConfigured` constant +4. Add CEL validation rules (`command` xor `url`) +5. Run `make manifests && make generate` +6. Add `injectMcpServersIntoConfigMap()` in new file `internal/controller/claw_mcp.go` +7. Add HTTP MCP URL domain extraction in `generateProxyConfig()` (in `claw_proxy.go`) +8. Wire `injectMcpServersIntoConfigMap` into `enrichConfigAndNetworkPolicy()` +9. Set `McpServersConfigured` condition (`True` on success, not set when empty) +10. Add unit tests in `claw_mcp_test.go` and `claw_proxy_test.go` + +### Phase 2: Stdio MCP secret injection (tier 3 — escape hatch) + +Adds the `envFrom` field and reconciler logic to mount real secrets into the gateway container for stdio MCP servers that manage their own credentials. + +1. Add `McpEnvFromSecret` type and `EnvFrom []McpEnvFromSecret` field to `McpServerSpec` +2. Add CEL validation for `envFrom` entries +3. Run `make manifests && make generate` +4. Add MCP `envFrom` secret validation in `resolveAndApplyCredentials()` (or adjacent) +5. Add `configureGatewayForMcpServers()` in `claw_deployment.go`, called from `configureDeployments()` +6. Add `stampMcpSecretVersionAnnotation()` targeting the gateway deployment +7. Enhance `McpServersConfigured` condition to report secret validation failures +8. Add unit tests + +### Phase 3: MCP documentation + +Required part of the feature — not follow-up work. + +1. Update PLATFORM.md with comprehensive MCP section (three-tier model, worked examples for well-known MCP servers, guidance on which tier to recommend) +2. Add MCP section to `docs/provider-setup.md` (step-by-step setup for all three tiers) + +## Example: Full CR + +```yaml +apiVersion: claw.sandbox.redhat.com/v1alpha1 +kind: Claw +metadata: + name: instance +spec: + credentials: + # LLM provider + - name: anthropic + type: apiKey + secretRef: + - name: anthropic-api-key + key: api-key + provider: anthropic + + # GitHub API — enables tier 2 proxy placeholder for the GitHub MCP server + - name: github + type: bearer + domain: api.github.com + secretRef: + - name: github-pat-secret + key: token + + mcpServers: + # Tier 1: HTTP MCP — domain auto-extracted, no secrets needed + context7: + url: https://mcp.context7.com/mcp + transport: streamable-http + + # Tier 2: Stdio MCP — placeholder env, proxy injects real PAT + github: + command: npx + args: ["-y", "@modelcontextprotocol/server-github"] + env: + GITHUB_PERSONAL_ACCESS_TOKEN: placeholder + + # Tier 3: Stdio MCP — real secret on gateway (escape hatch) + custom-db: + command: node + args: ["db-mcp-server.js"] + env: + DB_HOST: postgres.internal + envFrom: + - name: DB_PASSWORD + secretRef: + name: db-credentials + key: password +``` diff --git a/docs/proposals/mcp-support-questions.md b/docs/proposals/mcp-support-questions.md index c704a62..8a6f3fd 100644 --- a/docs/proposals/mcp-support-questions.md +++ b/docs/proposals/mcp-support-questions.md @@ -1,108 +1,112 @@ # MCP Support — Design Questions -**Status:** Open — decisions pending +**Status:** All decisions resolved **Related:** [Design document](mcp-support-design.md) Each question has options with trade-offs and a recommendation. Go through them one by one to form the design, then update the design document. -## Q1: How should stdio MCP server secrets be handled? +## Q1: How should MCP server secrets be handled? -Stdio MCP servers run as subprocesses of the gateway. They inherit the gateway process environment. There is no HTTP layer to intercept — the proxy cannot inject secrets for these. The fundamental question: do we accept putting secrets on the gateway container for stdio MCP, or do we limit support to avoid it? +MCP servers may need credentials (API tokens, passwords). The operator's core security principle is that real secrets stay on the proxy, not the gateway. The question: how does this apply to the two MCP transport types? -### Option A: Gateway env vars via `secretKeyRef` (accept the exception) +### Three-tier security model (chosen) -Secrets are mounted as env vars on the gateway container, referenced from Kubernetes Secrets. The MCP server config in `operator.json` uses placeholder values; the real values come from the process environment, which the subprocess inherits. +| Tier | Transport | Secret handling | Secrets on gateway? | +|---|---|---|---| +| **1. HTTP/SSE MCP** (preferred) | HTTP URL | Proxy `credentials` entry for the URL's domain | No | +| **2. Stdio + proxy placeholder** (recommended for stdio) | Subprocess | Placeholder env var + proxy `credentials` entry for known domains | No | +| **3. Stdio + real secret** (escape hatch) | Subprocess | `envFrom` with `secretKeyRef` on the gateway container | Yes | -- **Pro:** Full stdio MCP support — works with all MCP servers (GitHub, filesystem, database, etc.) -- **Pro:** Follows the same pattern as `OPENCLAW_GATEWAY_TOKEN` (already a `secretKeyRef` on the gateway) -- **Pro:** Matches how the upstream operator and installer handle MCP secrets -- **Con:** Breaks the "no real secrets on the gateway" principle for these specific env vars -- **Con:** A compromised gateway process can read these env vars +**Tier 1 — HTTP/SSE MCP servers:** Traffic goes through the MITM proxy. The user adds a `credentials` entry for the MCP URL's domain (e.g., `type: bearer` for `api.example.com`). The proxy injects auth headers. No secrets on the gateway. -### Option B: No secret support for stdio MCP — only plain env +**Tier 2 — Stdio with proxy placeholder:** Stdio MCP subprocesses inherit `HTTP_PROXY`/`HTTPS_PROXY`, so their outbound HTTPS calls go through the MITM proxy too. If the user knows which domain the MCP server talks to, they can set the env var to a **placeholder** value and add a `credentials` entry for the domain. The MCP server sends `Authorization: Bearer placeholder`, the proxy strips it and injects the real token. Example: -Only allow plain-text `env` values (non-secret). Users who need secrets must manually configure them via `openclaw config patch` inside the pod or use HTTP MCP servers instead. - -- **Pro:** Maintains strict secret isolation -- **Con:** Most useful stdio MCP servers need secrets (GitHub PAT, database passwords) -- **Con:** Poor UX — user has to manually inject secrets after every pod restart -- **Con:** The upstream operator doesn't have this limitation - -### Option C: Sidecar proxy for stdio MCP servers - -Run stdio MCP servers in a separate sidecar container with secrets, communicating with the gateway over a local socket or HTTP bridge. - -- **Pro:** Secrets stay off the gateway container -- **Con:** Massive complexity — MCP stdio protocol uses stdin/stdout pipes, not sockets -- **Con:** Would require a custom bridge component that doesn't exist -- **Con:** Fundamentally changes the MCP execution model - -**Recommendation:** Option A. The gateway already has `secretKeyRef`-based secrets (`OPENCLAW_GATEWAY_TOKEN`, proxy CA). Stdio MCP is architecturally incapable of proxy-based injection. This is the documented "unless it's the only way" exception. The blast radius is limited: these env vars are only visible to the gateway process and its children, not externally accessible. - -## Q2: Should HTTP MCP auth go through the proxy or be config-injected? +```yaml +spec: + credentials: + - name: github + type: bearer + domain: api.github.com + secretRef: + - name: github-pat-secret + key: token + mcpServers: + github: + command: npx + args: ["-y", "@modelcontextprotocol/server-github"] + env: + GITHUB_PERSONAL_ACCESS_TOKEN: placeholder +``` -HTTP MCP servers (`url`-based, using `streamable-http` or `sse` transport) can have auth headers. The question is whether those headers should be injected by the MITM proxy (keeping secrets off the gateway) or written into the MCP config in `operator.json`. +The user needs to know: which domain the MCP server calls, and what auth type it uses. The platform skill documents this for well-known MCP servers. -### Option A: Proxy-injected headers (reuse credential system) +**Tier 3 — Stdio with real secret (escape hatch):** When the user doesn't know the MCP server's internals, or the MCP server uses the secret for non-HTTP purposes, `envFrom` mounts the real secret as a gateway container env var. This breaks the "no secrets on gateway" principle but is the only option in these cases. Follows the same pattern as `OPENCLAW_GATEWAY_TOKEN` (already a `secretKeyRef` on the gateway). -User adds both an MCP server entry (for config) and a `credentials` entry (for auth). The credential's `domain` matches the MCP URL's host, and the proxy injects auth headers. +- **Pro:** Full coverage — every MCP server can be supported at some tier +- **Pro:** Tiers 1 and 2 preserve the security model (no secrets on gateway) +- **Pro:** Tier 3 is explicit opt-in — user consciously accepts the tradeoff +- **Pro:** Tier 2 works for the most common stdio MCP servers (GitHub, Slack, etc.) whose APIs are well-documented -- **Pro:** Consistent with the operator's security model — real secrets stay on the proxy -- **Pro:** Reuses existing credential infrastructure (no new secret plumbing) -- **Pro:** HTTP MCP servers behave like any other authenticated HTTP endpoint -- **Con:** User must configure two things (MCP server + credential) for one service -- **Con:** Slightly more complex UX for simple cases +_Considered and rejected: No secret support for stdio (impractical: most useful stdio MCP servers need secrets). Sidecar proxy for stdio (massive complexity: MCP stdio uses stdin/stdout pipes, not sockets)._ -### Option B: Config-injected headers (secrets in `operator.json`) +**Decision:** Three-tier model. HTTP/SSE is the recommended path (tier 1). Stdio with proxy placeholder is the recommended stdio path (tier 2) — platform skill documents the pattern for well-known MCP servers. `envFrom` with real secrets on the gateway (tier 3) is the escape hatch for cases where tiers 1–2 don't work. Users must also add `credentials` entries to allowlist domains their stdio MCP server needs (applies to both tiers 2 and 3). -Auth headers are written directly into the MCP server's `headers` field in `operator.json`, sourced from Kubernetes Secrets. +**Documentation requirement:** The three-tier model and the proxy-placeholder pattern are non-obvious. Implementation must include: +- **PLATFORM.md skill update**: Comprehensive section teaching OpenClaw how to guide users through each tier, with worked examples for well-known MCP servers (GitHub, etc.). The skill must explain which tier to recommend based on the user's situation, how to identify domains/auth types for tier 2, and when to fall back to tier 3. +- **`docs/provider-setup.md` update**: Step-by-step setup instructions for MCP servers, with examples covering all three tiers. -- **Pro:** Single configuration point — everything about the MCP server is in one place -- **Con:** Secrets end up in `operator.json` on the ConfigMap, readable by the gateway -- **Con:** Breaks the proxy-based secret isolation for HTTP traffic (where it IS possible) -- **Con:** ConfigMap data is not encrypted at rest (unlike Secrets) +These doc updates are a required part of the implementation, not a follow-up. -### Option C: Hybrid — proxy for auth, config for non-secret headers +## Q2: Should HTTP MCP auth go through the proxy or be config-injected? -MCP server config has non-secret headers (like `Content-Type`). Auth goes through a credential entry. The operator documents the pattern. +**Resolved by Q1.** The three-tier model from Q1 establishes that HTTP MCP auth uses the proxy via `credentials` entries (tier 1). No separate decision needed. -- **Pro:** Best of both worlds — secrets stay secure, non-secret config stays simple -- **Con:** Same two-thing UX concern as Option A, though documented patterns mitigate this +**Decision:** Proxy-injected headers (Option A). User adds both an MCP server entry and a `credentials` entry for the domain. The proxy injects auth. Secrets stay off the gateway. This is consistent with the operator's security model and reuses existing credential infrastructure. -**Recommendation:** Option A. HTTP traffic already goes through the MITM proxy. The proxy can handle auth injection for MCP URLs exactly like it does for LLM providers. Users already know the credential pattern. The PLATFORM.md skill can guide Claw to help users set up both pieces together. +_Considered and rejected: Option B — config-injected headers (secrets in `operator.json` on a ConfigMap, breaks proxy isolation). Option C — hybrid (same outcome as A, unnecessary distinction)._ ## Q3: How should proxy domain allowlisting work for MCP servers? MCP servers need network access. Stdio MCP servers make outbound HTTP calls (through the proxy). HTTP MCP servers connect to their URL (through the proxy). Domains must be in the proxy allowlist. -### Option A: Automatic from URL + explicit `allowedDomains` - -For HTTP MCP servers, auto-extract the domain from `url` and add it as a passthrough route. For stdio MCP servers (and any extra domains HTTP servers need), use an `allowedDomains` field on the MCP spec. - -- **Pro:** HTTP MCP "just works" — URL implies the domain -- **Pro:** `allowedDomains` covers stdio MCP servers that need specific endpoints -- **Con:** Auto-extraction adds implicit behavior (domain allowlisted without explicit credential entry) -- **Con:** `allowedDomains` is a new concept alongside `credentials` +### Option C: Automatic from URL only (no `allowedDomains` field) -### Option B: Manual via existing `credentials` entries +Auto-extract domain from HTTP MCP URLs and add as a `type: none` passthrough route. Stdio MCP server domains are allowlisted via `credentials` entries (which the user already adds for tier 2/3 auth anyway per Q1). -Users add `type: none` credential entries for any domains MCP servers need. No new fields on the MCP spec. +- **Pro:** HTTP MCP "just works" — unauthenticated HTTP MCP servers (like Context7) need only the MCP entry, no separate credential +- **Pro:** Authenticated HTTP MCP servers already have a `credentials` entry (Q1 tier 1), so auto-extraction is redundant but harmless +- **Pro:** No new API surface — no `allowedDomains` field +- **Pro:** Stdio domain handling is consistent with Q1 — users add `credentials` entries for both auth and allowlisting -- **Pro:** Single mechanism for all domain allowlisting (existing pattern) -- **Pro:** No new API surface -- **Con:** Verbose — user must add separate credential entries for every MCP domain -- **Con:** Poor discoverability — not obvious that MCP needs credential entries for domains +_Considered and rejected: Option A — auto from URL + `allowedDomains` field (adds a new domain-allowlisting concept parallel to `credentials`, unnecessary complexity). Option B — manual via `credentials` entries only (forces a `type: none` credential for every unauthenticated HTTP MCP server, verbose for the common case)._ -### Option C: Automatic from URL only (no `allowedDomains` field) +**Decision:** Option C. HTTP MCP URLs auto-extract the domain as a passthrough. Stdio MCP domains use existing `credentials` entries. No new API surface. -Auto-extract domain from HTTP MCP URLs. Stdio MCP server domains must be allowlisted via `credentials` entries. +Example — unauthenticated HTTP MCP (Context7) + authenticated stdio MCP (GitHub): -- **Pro:** Simpler API — no `allowedDomains` field -- **Pro:** HTTP MCP works automatically -- **Con:** Stdio MCP servers that need network access require separate credential entries -- **Con:** Inconsistent — HTTP auto-allowlisted, stdio not +```yaml +spec: + credentials: + # Tier 2: proxy injects real PAT on api.github.com (also allowlists the domain) + - name: github + type: bearer + domain: api.github.com + secretRef: + - name: github-pat-secret + key: token + mcpServers: + # HTTP MCP — domain auto-extracted from URL, no credential needed + context7: + url: https://mcp.context7.com/mcp + transport: streamable-http -**Recommendation:** Option C. Keep it simple for Phase 1. HTTP MCP URLs naturally imply a domain — auto-extracting it is pragmatic and low-risk (it's a `type: none` passthrough, not credential injection). Stdio MCP server domains are less predictable (they depend on what the subprocess does) and are better handled explicitly via `credentials`. This avoids a new `allowedDomains` concept. We can always add it later if the UX is painful. + # Stdio MCP — placeholder env, proxy handles real auth on api.github.com + github: + command: npx + args: ["-y", "@modelcontextprotocol/server-github"] + env: + GITHUB_PERSONAL_ACCESS_TOKEN: placeholder +``` ## Q4: Where should MCP config live in the CRD? @@ -129,49 +133,11 @@ spec: - **Pro:** Clean separation — MCP is not a "credential" or "channel" - **Pro:** Map keyed by server name (natural for MCP where names matter) - **Pro:** Room for MCP-specific fields without overloading `CredentialSpec` -- **Con:** Adds a new top-level spec field (API surface growth) - -### Option B: Extend `CredentialSpec` with an `mcp` field - -Add MCP config as another variant of `CredentialSpec`, similar to how `channel` works: +- **Pro:** Maps directly to OpenClaw's `mcp.servers` config shape -```yaml -spec: - credentials: - - name: github-mcp - type: none - mcp: - command: npx - args: ["-y", "@modelcontextprotocol/server-github"] -``` - -- **Pro:** Reuses existing credential infrastructure -- **Pro:** Secret handling via `secretRef` is already built in -- **Con:** `CredentialSpec` is already complex (7 types, channels, providers). MCP would overload it further -- **Con:** MCP servers don't fit the credential model — they're not about injecting auth on a domain -- **Con:** Forces MCP into credential validation logic that doesn't apply - -### Option C: Raw config pass-through (`spec.config.raw`) - -Follow the upstream operator pattern — users pass raw MCP JSON through an opaque config field: - -```yaml -spec: - config: - raw: - mcp: - servers: - github: { command: npx, args: [...] } -``` - -- **Pro:** Zero new types — just pass through to `openclaw.json` -- **Pro:** Matches upstream operator pattern exactly -- **Con:** No structured validation, no secret integration -- **Con:** Secrets would need to be plain text in the config (security problem) -- **Con:** No proxy domain auto-allowlisting -- **Con:** Doesn't leverage operator capabilities +_Considered and rejected: Option B — extend `CredentialSpec` (already complex with 7 types + channels + providers; MCP doesn't fit the credential model). Option C — raw config pass-through like the upstream operator (no structured validation, no secret integration, secrets as plain text in config)._ -**Recommendation:** Option A. MCP servers are conceptually distinct from credentials and channels. A dedicated `spec.mcpServers` map gives clean separation, structured validation, and room for proper secret handling. It's also the most natural mapping to OpenClaw's `mcp.servers` config shape. +**Decision:** Option A. `spec.mcpServers` as a new top-level map. ## Q5: Should `env` and `envFrom` be separate fields or unified? @@ -196,52 +162,32 @@ mcpServers: - **Pro:** Clear distinction between secret and non-secret values - **Pro:** Plain `env` values go directly into `operator.json`; `envFrom` become `secretKeyRef` on the container and placeholders in config - **Pro:** Follows Kubernetes naming conventions (`env` + `envFrom`) -- **Con:** Two fields for related concept - -### Option B: Unified `env` with inline secret references - -```yaml -mcpServers: - github: - command: npx - args: ["-y", "@modelcontextprotocol/server-github"] - env: - LOG_LEVEL: debug - GITHUB_PERSONAL_ACCESS_TOKEN: - secretRef: - name: github-pat-secret - key: token -``` +- **Pro:** Maps cleanly to the three-tier model: `env` for tier 2 placeholders, `envFrom` for tier 3 real secrets -- **Pro:** Single field, mixed plain and secret values -- **Con:** Polymorphic value type (string | object) is ugly in Go CRD types and CEL validation -- **Con:** CRD schema can't easily express "value is either a string or an object with secretRef" -- **Con:** Harder to distinguish which env vars need container-level `secretKeyRef` mounting +_Considered and rejected: Option B — unified `env` with polymorphic values (string | object is ugly in Go CRD types, CRD schema can't express it cleanly, harder to distinguish which env vars need container-level `secretKeyRef` mounting)._ -**Recommendation:** Option A. The Kubernetes naming convention is well-understood. The Go type system handles it cleanly. The separation makes the reconciler logic straightforward: `env` goes to ConfigMap, `envFrom` goes to both ConfigMap (as placeholder) and Deployment (as `secretKeyRef`). +**Decision:** Option A. Separate `env` and `envFrom` fields. ## Q6: Should the operator validate MCP server configurations? The operator could validate MCP server configs (e.g., stdio must have `command`, HTTP must have `url`, not both). Or it could pass them through and let OpenClaw validate at runtime. -### Option A: Validate with CEL rules on the CRD - -Add `XValidation` rules like: "command and url are mutually exclusive", "transport requires url", etc. +### Decision: Validate only what we're 100% sure about -- **Pro:** Fast feedback — `oc apply` fails immediately with a clear message -- **Pro:** Consistent with how `CredentialSpec` validates (CEL rules on type/provider/channel) -- **Con:** More CEL rules to maintain -- **Con:** If OpenClaw adds new transport types, CRD validation could reject valid configs +CEL rules for things that are structurally certain: +- `command` and `url` are mutually exclusive (a server can't be both stdio and HTTP) +- At least one of `command` or `url` must be set (an empty server entry is always wrong) +- `envFrom` secret references must be well-formed (name + key required) -### Option B: Minimal validation only +Skip validation for things that could change in OpenClaw: +- Don't enforce `transport` values (`sse`, `streamable-http`) — OpenClaw could add new ones +- Don't enforce that `args` requires `command` — OpenClaw might evolve +- Don't validate `env` key naming — that's the MCP server's concern -Only validate what the operator needs to function (e.g., `envFrom` secretRef exists). Let OpenClaw handle the rest. +Reconciler-time validation (not CEL): +- `envFrom` referenced Secrets must exist and contain the specified key (same pattern as credential validation) -- **Pro:** Forward-compatible — new MCP features in OpenClaw don't require CRD updates -- **Pro:** Less maintenance -- **Con:** Misconfigurations surface at runtime, not apply time - -**Recommendation:** Option A. CEL validation is the project convention. Basic structural rules (stdio xor HTTP, required fields) catch common mistakes early. The rules can be relaxed later if needed; tightening is harder. +_Rationale: Tight validation that blocks valid configs is worse than loose validation that lets misconfigs through to runtime. OpenClaw gives clear errors for bad MCP config. We only gate on things that are structurally impossible to be correct._ ## Q7: How should the status condition work for MCP? @@ -253,22 +199,7 @@ Separate condition, set to `True` when all MCP server secrets are validated and - **Pro:** Clear signal for MCP-specific issues - **Pro:** Follows existing pattern (`CredentialsResolved`, `ProxyConfigured`) -- **Con:** More conditions to track - -### Option B: Fold into `CredentialsResolved` - -MCP secret validation runs alongside credential validation; failures show on `CredentialsResolved`. - -- **Pro:** Fewer conditions -- **Con:** Muddies the meaning of `CredentialsResolved` — MCP isn't a "credential" -- **Con:** Harder to diagnose MCP-specific issues - -### Option C: No new condition — only `Ready` - -MCP failures surface on the top-level `Ready` condition with a descriptive message. -- **Pro:** Simplest -- **Con:** Less granular debugging -- **Con:** `Ready=False` doesn't indicate whether the issue is MCP, credentials, or something else +_Considered and rejected: Option B — fold into `CredentialsResolved` (muddies the meaning, MCP isn't a "credential"). Option C — no new condition, only `Ready` (less granular, can't distinguish MCP failures from other issues)._ -**Recommendation:** Option A. A dedicated condition is cheap, follows the project pattern, and makes MCP issues immediately diagnosable. The operator already has 4 conditions — one more is fine. +**Decision:** Option A. New `McpServersConfigured` condition.