Skip to content

Surface ClickHouse query stats (written_rows, read_rows, elapsed) in read/write tool results #137

@BorisTyshkevich

Description

@BorisTyshkevich

Goal

Today our tools give almost no feedback beyond errors. A write returns a hardcoded {"status":"OK"} / "Successfully executed X", so the agent can't tell the user how many rows were actually written. We want post-query statistics surfaced in both write and read tool results — written_rows first and foremost, plus read_rows/read_bytes/elapsed.

Scope:

  • Writes (INSERT) — primary. written_rows / written_bytes.
  • Reads (SELECT) — covered too. read_rows, read_bytes, result_rows, elapsed.
  • Out of scope: mutations / DDL (ALTER … UPDATE/DELETE, lightweight DELETE, CREATE/DROP/TRUNCATE) — ClickHouse executes these asynchronously, so there is no synchronous affected-row count to report. We will not fake one.

Research: the driver does not surface stats over HTTP

ClickHouse exposes per-query stats via the X-ClickHouse-Summary HTTP header (and progress/profile packets over native). We use Altinity/clickhouse-go/v2 (fork v2.45.1-0.20260424…). Findings from the fork source:

Path Evidence Result
HTTP exec (writes) conn_http_exec.go:18defer discardAndClose(res.Body); returns nil body discarded, no stats
HTTP query (reads) conn_http_query.go:31-135 — decodes only Native-format data blocks no progress/profile
HTTP response headers conn_http.go:703-722 executeRequest returns resp, never reads X-ClickHouse-Summary header present, never read
Native (TCP) conn_process.go:12-18onProcess{ progress, profileInfo } dispatched in packet loop works, but N/A — see below

So clickhouse.WithProgress / WithProfileInfo (which exist; proto.Progress has WroteRows, WroteBytes, Rows, Bytes, ElapsedNs) only fire on the native protocol. Our production auth (OAuth Bearer / Basic via the ch-jwt-verify sidecar) is HTTP-bound, and the HTTP transport reads none of it.

Solution: capture X-ClickHouse-Summary via Options.TransportFunc (no fork patch)

The fork honors a custom round-tripper — conn_http.go:261if opt.TransportFunc != nil { return opt.TransportFunc(rt) }. We install a RoundTripper that reads X-ClickHouse-Summary off resp.Header and writes it into a per-request *summaryHolder stashed in the ctx (reachable in RoundTrip via req.Context(), since the driver threads our ctx into http.NewRequestWithContext, conn_http.go:612). Composes cleanly with OAuth/Basic, which is applied separately in applyOptionsToRequest.

Proven end-to-end (spike branch spike-query-stats, cmd/stats-spike/) against CH 26.3 over HTTP:

INSERT            -> source="header" written_rows=1234 written_bytes=9872 read_rows=1234   ✅
INS..SEL          -> source="header" written_rows=2500 read_rows=5000                       ✅
SELECT(stream)    -> source="header" read_rows=1234 result_rows=0                           ❌ premature header
SELECT(wait_eoq)  -> source="header" read_rows=7468 result_rows=7468                         ✅ accurate
  • The summary arrives as a normal response header (source="header") — no trailer handling needed.
  • INSERT is immediate and accurate (no streaming body).
  • Streaming SELECT emits a premature header (result_rows=0). Fix: wait_end_of_query=1 makes it fully accurate. Confirmed acceptable for our workloads (we already cap result size, so server-side buffering is bounded). We'll set it on both read and write paths for one accurate code path.

Implementation plan

  1. Install TransportFunc (statsRoundTripper) when building the HTTP client in pkg/clickhouse/client.go.
  2. Per-request *summaryHolder in ctx; parse X-ClickHouse-Summary JSON (read_rows, read_bytes, written_rows, written_bytes, result_rows, elapsed_ns, memory_usage).
  3. Add Stats *QueryStats \json:"stats,omitempty"`toQueryResult; populate in executeSelectandexecuteNonSelect. Set wait_end_of_query=1`.
  4. Replace the write tool's "Successfully executed X" with a summary line, e.g. "Inserted 1,234 rows (9.9 KB) in 7 ms." — keep the structured stats alongside for programmatic use.
  5. Graceful degradation: omit stats if the header is absent (native transport, summary-stripping proxy, older server). Never error on missing stats.
  6. Keep the payload compact (flat object / one line) — it rides every tool result the model processes.

Open questions

  1. Verbosity/token budget: include full stats on reads, or only read_rows + elapsed + a truncation note? Writes clearly want written_rows.
  2. Config toggle (clickhouse.query_stats, default on) for token-sensitive deployments?
  3. wait_end_of_query=1 memory implications for the largest uncapped internal queries (e.g. dynamic-tool discovery, schema introspection) — apply only to user-facing read/write, not internal queries?
  4. Should we still attempt native-protocol stats (via WithProgress/WithProfileInfo) for completeness, or keep HTTP-summary as the single source given prod is HTTP?

Spike: branch spike-query-stats, cmd/stats-spike/main.go.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions