Single-connector multi-cluster: tiered tools: config for tool exposure #136

@BorisTyshkevich

Goal

Today, reaching N ClickHouse clusters from claude.ai means registering N MCP connectors — one per cluster, because claude.ai binds one connector to one URL and runs one OAuth flow per connector. We want a single MCP connector that fronts several (2–5, mostly 2–3) clusters that already share OAuth.

Since claude.ai can't wildcard URLs, the cluster identity has to travel in-band: either as a tool argument or baked into the tool itself. URL-path routing (/mcp/{cluster}, #132/#134) puts the cluster in the connector URL, so it structurally can't serve several clusters from a single connector. This is therefore a new exposure mode alongside URL-path routing, not a replacement.

Scope for this issue:

  • One deployment fronting clusters that already share issuer / audience / signing_secret (no auth consolidation in scope).
  • Fixed, small cluster set (2–5).
  • No distributed / cross-cluster queries — one cluster per call.
  • Writes are already confirmed by the agent, so no extra write-gating is needed here.

What we already have

  • multicluster URL-path routing: Host: clickhouse-{cluster}.demo.svc.cluster.local template, cluster from URL, ClusterAllowlist.
  • Static tools: execute_query (read), write_query (write) — fixed input schema {query, settings, …}, cluster-independent.
  • Dynamic tools: discovered by view_regexp (reads) / table_regexp (writes); each name is prefix + the discovered view or table name. The input schema is derived from the matched object: the parameters of parameterized views ({id:UInt64}, via parseViewParams) for reads, and the table columns (from system.columns) for writes. So the same regexp can match different views with different schemas on different clusters.
  • Lazy discovery (EnsureDynamicTools) + catalog cache keyed by (bearer, cluster).
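
To make the last bullet concrete, here is a minimal Python sketch of lazy discovery behind a cache keyed by (bearer, cluster). It is not the server's code: `DiscoveryCache` and its `discover` callback are hypothetical stand-ins for EnsureDynamicTools and the catalog cache, and expiry is left out.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the catalog cache; the real server's types differ.
CacheKey = Tuple[str, str]  # (bearer token, cluster name)


class DiscoveryCache:
    """Runs discovery at most once per (bearer, cluster) pair."""

    def __init__(self, discover: Callable[[str, str], List[str]]) -> None:
        self._discover = discover
        self._entries: Dict[CacheKey, List[str]] = {}

    def ensure(self, bearer: str, cluster: str) -> List[str]:
        key = (bearer, cluster)
        if key not in self._entries:
            # Cache miss: discover lazily, on first use for this user and cluster.
            self._entries[key] = self._discover(bearer, cluster)
        return self._entries[key]
```

Because the key includes the bearer, two users on the same cluster can end up with different tool sets, which is what makes the final tool names per-user.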

Proposed design: tiered tools: placement

Reuse the existing ToolDefinition struct everywhere; where the tools: block lives determines the cluster binding:

| Placement | Cluster binding | cluster arg? | Best for |
| --- | --- | --- | --- |
| server.tools | the one configured CH | none | single-cluster / legacy (unchanged) |
| multicluster.tools | chosen at call time, enum = section names | added to all | generic execute_query / write_query |
| multicluster.clusters[].tools | fixed by the section | none | curated per-cluster tools |

clickhouse:
  host: clickhouse-{cluster}.demo.svc.cluster.local   # template; default for every section

multicluster:
  enabled: true

  tools:                          # tier 2 — cluster arg auto-added (enum: [otel, antalya])
    - type: read
      name: execute_query         # → one tool: {query, settings, cluster}
    - type: write
      name: write_query           # → {query, limit, settings, cluster}

  clusters:
    - name: otel                  # tier 3 — cluster baked in, no arg
      tools:
        - type: read
          view_regexp: "^mcp_.*"
          prefix: "otel_"         # admin-authored → otel_<view>
    - name: antalya
      host: clickhouse-antalya.demo.svc.cluster.local   # explicit override when template doesn't fit
      tools:
        - type: write
          table_regexp: "^events_.*"
          prefix: "antalya_"
          mode: insert
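
The cheap load-time checks for a config of this shape could look like the Python sketch below. Everything here is an assumption for illustration: `validate_multicluster` is a hypothetical helper, the config is passed as an already-parsed dict, and Python's `re` stands in for whatever regexp engine the server uses (their syntaxes may differ).

```python
import re
from typing import Any, Dict, List


def validate_multicluster(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable errors for static problems in a `multicluster` block."""
    errors: List[str] = []

    # Tier 2: generic tool names must be unique.
    seen_tools = set()
    for tool in cfg.get("tools", []):
        name = tool.get("name")
        if name in seen_tools:
            errors.append(f"duplicate tier-2 tool name: {name}")
        seen_tools.add(name)

    # Tier 3: section names must be unique and every regexp must compile.
    seen_sections = set()
    for section in cfg.get("clusters", []):
        cluster = section.get("name")
        if cluster in seen_sections:
            errors.append(f"duplicate cluster section: {cluster}")
        seen_sections.add(cluster)
        for tool in section.get("tools", []):
            for key in ("view_regexp", "table_regexp"):
                pattern = tool.get(key)
                if pattern is None:
                    continue
                try:
                    re.compile(pattern)
                except re.error as exc:
                    errors.append(f"{cluster}: bad {key} {pattern!r}: {exc}")
    return errors
```

A check like this cannot prove that final tool names are unique, since those only exist after discovery for a given bearer.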

Binding / placement rules

  • Generic, cluster-independent tools (execute_query/write_query, fixed schema) → tier 2. One def, cluster enum arg over the section names. Cluster count doesn't grow the tool list.
  • Regexp / dynamic tools (view-derived schema) → tier 3, bound to one cluster so the derived schema is unambiguous. They cannot be collapsed under a tier-2 cluster arg, because one regexp matches differently-shaped views per cluster.
  • Admin-authored prefix per section disambiguates discovered names (otel_, antalya_). This is the operator's choice, not the server auto-deriving prefixes from cluster names.
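
As an illustration of the tier-2 rule, the Python sketch below adds a `cluster` enum to a fixed input schema. `with_cluster_arg` and the sample schema are hypothetical, and making `cluster` a required argument is an assumption this issue does not settle.

```python
import copy
from typing import Any, Dict, List


def with_cluster_arg(schema: Dict[str, Any], sections: List[str]) -> Dict[str, Any]:
    """Return a copy of a fixed JSON Schema with a `cluster` enum argument added."""
    if not sections:
        raise ValueError("tier-2 tools need at least one cluster section")
    out = copy.deepcopy(schema)  # never mutate the shared tool definition
    out.setdefault("properties", {})["cluster"] = {
        "type": "string",
        "enum": list(sections),  # section names double as the allowlist
    }
    required = out.setdefault("required", [])
    if "cluster" not in required:
        required.append("cluster")  # assumption: no default cluster
    return out


# Illustrative schema only; the real execute_query schema has more fields.
EXECUTE_QUERY_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "settings": {"type": "object"}},
    "required": ["query"],
}
tier2_schema = with_cluster_arg(EXECUTE_QUERY_SCHEMA, ["otel", "antalya"])
```

Adding a third section only lengthens the enum; the tool list stays the same size.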

Host template

Keep {cluster} templating (clickhouse-{cluster}.demo.svc.cluster.local) as the default: it works for single-cluster deployments and as the multi-cluster default. A section may override host (plus other sparse CH settings: port, TLS, database) when the template doesn't fit.
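
A minimal Python sketch of that resolution order, assuming a hypothetical `resolve_host` helper (the other sparse overrides would follow the same pattern):

```python
from typing import Optional


def resolve_host(template: str, cluster: str, override: Optional[str] = None) -> str:
    """Pick the ClickHouse host for a section: explicit override first, else the template."""
    if override:
        return override
    # Plain substitution; str.format would trip over any other braces in the template.
    return template.replace("{cluster}", cluster)


TEMPLATE = "clickhouse-{cluster}.demo.svc.cluster.local"
otel_host = resolve_host(TEMPLATE, "otel")
antalya_host = resolve_host(TEMPLATE, "antalya", "clickhouse-antalya.demo.svc.cluster.local")
```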

Reuse vs new

  • Reuse: ToolDefinition, regexp discovery, lazy discovery + (bearer, cluster) cache, all handlers.
  • New: cluster sections in config; the cross-cluster union that assembles one connector's tools/list from the per-(bearer, cluster) discovered sets. The section names also become the allowlist + the tier-2 cluster enum (making ClusterAllowlist redundant for this mode).

Drift & collisions

  • Drift (configured/regexp tool whose view/table is missing for this user on this cluster) → silently omit from the list.
  • Name collisions are runtime and per-user — final names only materialize at discovery, since regexp matches whatever exists for that bearer on that cluster. They surface at the cross-cluster union step. On collision: expose no tool with that name (drop all contenders — never silently route to the wrong table/cluster); log once per cache miss (with tier/cluster/source for each contender) so the admin can fix the prefix manually. Load-time validation still catches cheap static issues (two tier-2 names equal, malformed prefix, bad regexp), but final-name uniqueness is a runtime concern.
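
The collision rule at the union step can be sketched in Python as follows. `Candidate` and `union_tools` are hypothetical names, and the sketch assumes the function runs once per cache miss, so logging inside it matches the "log once per cache miss" rule. Drifted tools never become candidates, so they are omitted without any extra handling.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

log = logging.getLogger("tools_union")


@dataclass(frozen=True)
class Candidate:
    name: str     # final tool name (prefix + discovered name, or a static name)
    tier: int     # 2 = multicluster.tools, 3 = multicluster.clusters[].tools
    cluster: str  # "" for tier-2 tools, which take the cluster as an argument
    source: str   # view or table the tool was derived from


def union_tools(candidates: Iterable[Candidate]) -> Dict[str, Candidate]:
    """Merge per-cluster tool sets; a name claimed more than once is exposed by nobody."""
    by_name: Dict[str, List[Candidate]] = defaultdict(list)
    for cand in candidates:
        by_name[cand.name].append(cand)

    exposed: Dict[str, Candidate] = {}
    for name, group in by_name.items():
        if len(group) == 1:
            exposed[name] = group[0]
            continue
        # Drop every contender rather than guess which table/cluster was meant.
        details = "; ".join(
            f"tier={c.tier} cluster={c.cluster or '-'} source={c.source}" for c in group
        )
        log.warning("tool name collision on %r, dropping all contenders: %s", name, details)
    return exposed
```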

Decisions already made in discussion

  • Single connector, shared-OAuth clusters only; no auth consolidation.
  • Tiered tools: placement (above) chosen over: per-call cluster arg on everything, session-pinning (select_cluster), and server auto-prefixing — all rejected.
  • Regexp discovery kept as-is, now usable inside a section.
  • Homogeneous repetition across sections is acceptable; no templating, no tier-2 subset scoping.
  • Collision → drop all colliding names + log per cache miss; admin resolves manually.

Open questions for broader discussion

  1. Should this single-connector mode and the existing /mcp/{cluster} URL routing coexist indefinitely, or is one a migration target?
  2. Any concern with tools/list fan-out across all configured clusters on first request per bearer (one-time, then cached)? Behavior when a cluster is unreachable at that moment — omit its tools and retry on next cache cycle?
  3. Tool-count / context budget in claude.ai & ChatGPT with the union of all sections' tools — practical ceiling for our 2–5 cluster target?
  4. Naming convention guidance for prefix to minimize accidental collisions.
  5. Anything that breaks the assumption that all fronted clusters share issuer/audience/signing_secret?
