Background
Dynamic write tools currently skip columns of type Array(...), Tuple(...), Dynamic, and JSON with a Warn log:
dynamic write tool: skipping column with unsupported type (no JSON Schema equivalent)
Dynamic and bare JSON are genuinely unrepresentable (schema changes per row). But Array and Tuple have fully static, introspectable schemas and can be mapped to JSON Schema.
Proposed approach: use DESCRIBE TABLE with subcolumns
Instead of writing a recursive parser for CH type strings, delegate decomposition to ClickHouse:
DESCRIBE TABLE db.my_table
SETTINGS describe_include_subcolumns = 1
This returns one row per column and one row per subcolumn, with an is_subcolumn column (0 or 1):
| name |
type |
default_kind |
comment |
is_subcolumn |
| id |
UInt64 |
|
|
0 |
| user |
Tuple(name String, age UInt8) |
|
|
0 |
| user.name |
String |
|
|
1 |
| user.age |
UInt8 |
|
|
1 |
| tags |
Array(String) |
|
|
0 |
For named Tuples, the subcolumn rows give us the leaf field names and simple types — no recursive parsing needed. For Arrays of named Tuples, CH surfaces subcolumns as Array(leaf_type) (e.g., items.id → Array(UInt32)), giving us the element schema directly.
Implementation sketch
- Replace the current
system.columns query in getTableColumnsForMode with DESCRIBE TABLE db.table SETTINGS describe_include_subcolumns=1.
- Group rows by top-level column name (split on first
.).
- For each top-level column (
is_subcolumn=0):
- Skip
MATERIALIZED/ALIAS as before.
- Keep skipping
Dynamic and bare JSON.
- For
Array(primitive) or other simple types: existing mapCHType path.
- For
Tuple(...) with named fields (subcolumn rows present): build {"type": "object", "properties": {...}, "required": [...]} from subcolumn rows.
- For
Tuple(...) with unnamed fields (no subcolumn rows): {"type": "array", "prefixItems": [...], "minItems": N, "maxItems": N} — requires one level of type-string parsing for the inner primitive types only.
- For
Array(Tuple(...)) with named fields: {"type": "array", "items": <object schema from subcolumns>}.
JSON Schema mapping
| CH type |
JSON Schema |
Array(String) |
{type: array, items: {type: string}} |
Array(UInt32) |
{type: array, items: {type: integer, format: int64}} |
Tuple(name String, age UInt8) (named) |
{type: object, properties: {name: {type: string}, age: {type: integer}}, required: [name, age]} |
Tuple(UInt32, String) (unnamed) |
{type: array, prefixItems: [{type: integer}, {type: string}], minItems: 2, maxItems: 2} |
Array(Tuple(id UInt32, val String)) |
{type: array, items: {type: object, properties: {id: ..., val: ...}}} |
SQL literal encoding for INSERT
The LLM sends values as JSON; we need to encode them into SQL literals in buildInsertQuery:
| JSON Schema type |
LLM input example |
SQL literal |
array of primitives |
["a","b","c"] |
['a','b','c'] |
object (named Tuple) |
{"name":"Alice","age":30} |
('Alice',30) — fields ordered by Tuple definition |
array of items (unnamed Tuple) |
[1,"hello"] |
(1,'hello') |
array of objects (Array(Tuple)) |
[{"id":1,"val":"x"},{"id":2,"val":"y"}] |
[('x',1),('y',2)] |
sqlLiteralChecked needs new cases: array (renders [...]) and object (renders (...) with values ordered by field list from dynamicToolParam).
What stays unsupported
Dynamic — schema is truly per-row, no static representation
- Bare
JSON / JSON(...) — same reason
Nested(...) — syntactic sugar for Array(Tuple(...)), could follow once Array(Tuple) works
Caveats
describe_include_subcolumns was added in ClickHouse 22.4. Antalya 26.1.x supports it; verify presence in any older Altinity builds still under test.
- Unnamed Tuple subcolumn names in CH are positional (
1, 2, 3…) — DESCRIBE does not emit dotted subcolumn rows for them, so the unnamed-Tuple path still requires one level of type-string parsing (inner types are always primitives in the common case).
Nullable(Array(...)) and LowCardinality(...) wrappers around complex types: strip outer wrapper, apply inner type mapping.
Acceptance criteria
Background
Dynamic write tools currently skip columns of type
Array(...),Tuple(...),Dynamic, andJSONwith aWarnlog:Dynamicand bareJSONare genuinely unrepresentable (schema changes per row). ButArrayandTuplehave fully static, introspectable schemas and can be mapped to JSON Schema.Proposed approach: use
DESCRIBE TABLEwith subcolumnsInstead of writing a recursive parser for CH type strings, delegate decomposition to ClickHouse:
This returns one row per column and one row per subcolumn, with an
is_subcolumncolumn (0 or 1):For named Tuples, the subcolumn rows give us the leaf field names and simple types — no recursive parsing needed. For Arrays of named Tuples, CH surfaces subcolumns as
Array(leaf_type)(e.g.,items.id → Array(UInt32)), giving us the element schema directly.Implementation sketch
system.columnsquery ingetTableColumnsForModewithDESCRIBE TABLE db.table SETTINGS describe_include_subcolumns=1..).is_subcolumn=0):MATERIALIZED/ALIASas before.Dynamicand bareJSON.Array(primitive)or other simple types: existingmapCHTypepath.Tuple(...)with named fields (subcolumn rows present): build{"type": "object", "properties": {...}, "required": [...]}from subcolumn rows.Tuple(...)with unnamed fields (no subcolumn rows):{"type": "array", "prefixItems": [...], "minItems": N, "maxItems": N}— requires one level of type-string parsing for the inner primitive types only.Array(Tuple(...))with named fields:{"type": "array", "items": <object schema from subcolumns>}.JSON Schema mapping
Array(String){type: array, items: {type: string}}Array(UInt32){type: array, items: {type: integer, format: int64}}Tuple(name String, age UInt8)(named){type: object, properties: {name: {type: string}, age: {type: integer}}, required: [name, age]}Tuple(UInt32, String)(unnamed){type: array, prefixItems: [{type: integer}, {type: string}], minItems: 2, maxItems: 2}Array(Tuple(id UInt32, val String)){type: array, items: {type: object, properties: {id: ..., val: ...}}}SQL literal encoding for INSERT
The LLM sends values as JSON; we need to encode them into SQL literals in
buildInsertQuery:arrayof primitives["a","b","c"]['a','b','c']object(named Tuple){"name":"Alice","age":30}('Alice',30)— fields ordered by Tuple definitionarrayof items (unnamed Tuple)[1,"hello"](1,'hello')arrayof objects (Array(Tuple))[{"id":1,"val":"x"},{"id":2,"val":"y"}][('x',1),('y',2)]sqlLiteralCheckedneeds new cases:array(renders[...]) andobject(renders(...)with values ordered by field list fromdynamicToolParam).What stays unsupported
Dynamic— schema is truly per-row, no static representationJSON/JSON(...)— same reasonNested(...)— syntactic sugar forArray(Tuple(...)), could follow onceArray(Tuple)worksCaveats
describe_include_subcolumnswas added in ClickHouse 22.4. Antalya 26.1.x supports it; verify presence in any older Altinity builds still under test.1,2,3…) —DESCRIBEdoes not emit dotted subcolumn rows for them, so the unnamed-Tuple path still requires one level of type-string parsing (inner types are always primitives in the common case).Nullable(Array(...))andLowCardinality(...)wrappers around complex types: strip outer wrapper, apply inner type mapping.Acceptance criteria
Array(primitive)appear in the dynamic write tool schema as JSONarrayTuplecolumns appear as JSONobjectwith correct property names and typesTuplecolumns appear as JSONarraywithprefixItemsArray(Tuple(...))with named fields produces{type: array, items: {type: object, ...}}Dynamicand bareJSONcolumns continue to be skipped with aWarnlogbuildInsertQuerycorrectly encodes array and tuple values as SQL literalsArray(String), namedTuple,Array(named Tuple)