Add full_table mode for single-relation Iceberg loading #253

Open: hbarthels wants to merge 2 commits into main from hb-iceberg-full-table

hbarthels commented Jun 1, 2026

Summary

  • Adds a new (full_table :name [T1 T2 ...]) syntax to iceberg_data, enabling all Iceberg columns to be loaded into a single relation keyed by the row ID (UInt128).
  • Adds IcebergTarget proto message and optional IcebergTarget target = 7 field to IcebergData; existing columns field is unchanged.
  • Grammar: adds a full_table nonterminal and changes iceberg_data to accept gnf_columns? full_table?. Exactly one of the two must be present, but the grammar itself does not enforce this; they are mutually exclusive by convention, following the same pattern as csv_locator_paths? csv_locator_inline_data?.
  • All SDK parsers and printers (Python, Julia, Go) regenerated; protobuf stubs regenerated for all three SDKs.

New syntax

; per-column mode (unchanged)
(iceberg_data ... (columns (column "src" :col_src [UINT128 INT]) ...) ...)

; new single-relation mode — row_id as key, columns as values
(iceberg_data ... (full_table :edges [UINT128 INT INT]) ...)
;                                     ^^^^^^  ^^^ ^^^
;                                     row_id  src dst
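
To make the difference between the two modes concrete, here is a minimal Python sketch of the relation shapes the examples above suggest. It assumes, beyond what this PR states, that each per-column relation pairs a row ID with a single value and that rows missing from any column are dropped. The name `to_full_table` and the sample data are hypothetical and not part of any SDK.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory shapes; these are not the SDK's actual types.
# Per-column mode: one relation per column, each keyed by row_id.
col_src: Dict[int, int] = {1: 10, 2: 20}
col_dst: Dict[int, int] = {1: 11, 2: 21}


def to_full_table(*columns: Dict[int, int]) -> List[Tuple[int, ...]]:
    """Combine per-column relations into one relation keyed by row_id."""
    if not columns:
        return []
    # Assumption: only row IDs present in every column yield a full row.
    row_ids = set(columns[0]).intersection(*columns[1:])
    return [(rid, *(col[rid] for col in columns)) for rid in sorted(row_ids)]


# Single-relation mode: each tuple is (row_id, src, dst).
edges = to_full_table(col_src, col_dst)
```

In this sketch `edges` has the same arity as the `[UINT128 INT INT]` signature: the key first, then one position per column.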

Test plan

  • make test passes (720+ Python, 32k Julia, Go tests — no regressions)
  • New round-trip test: tests/lqp/iceberg_data_full_table.lqp parses and pretty-prints back to the same form
  • Binary snapshot tests/bin/iceberg_data_full_table.bin generated and committed
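
The round-trip property checked by the new test can be illustrated with a small Python sketch. The regex-based `parse_full_table` and `print_full_table` below are hypothetical stand-ins for the generated parser and printer; they handle only the `(full_table :name [T1 T2 ...])` form and say nothing about how the real implementation works.

```python
import re
from typing import List, Tuple

# Hypothetical stand-ins for the generated parser/printer. They cover
# only the (full_table :name [T1 T2 ...]) form.
_FULL_TABLE = re.compile(r"\(\s*full_table\s+:(\w+)\s+\[([^\]]*)\]\s*\)")


def parse_full_table(text: str) -> Tuple[str, List[str]]:
    """Extract the relation name and type list from a full_table form."""
    match = _FULL_TABLE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a full_table form: {text!r}")
    return match.group(1), match.group(2).split()


def print_full_table(name: str, types: List[str]) -> str:
    """Print the form with canonical single-space separation."""
    return f"(full_table :{name} [{' '.join(types)}])"


def round_trips(text: str) -> bool:
    """True if printing the parsed form reproduces the input exactly."""
    return print_full_table(*parse_full_table(text)) == text
```

A canonical input such as `(full_table :edges [UINT128 INT INT])` round-trips under this sketch, while an input with irregular spacing parses but prints back in canonical form, so it does not compare equal.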

Notes for reviewer

  • The grammar uses gnf_columns? full_table? rather than a strict (gnf_columns | full_table) alternation. A proper XOR would require an intermediate nonterminal with a union type, which the grammar type system doesn't support natively. Semantic enforcement (exactly one present) is left to downstream validation.
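
As a rough illustration of the downstream check described above, the following Python sketch rejects messages where both or neither of the two options is set. The dataclasses are hypothetical mirrors of the proto messages, reduced to the two relevant fields, and `validate_iceberg_data` is an assumed name; the PR does not specify where or how this validation is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IcebergTarget:  # hypothetical mirror of the proto message
    name: str
    types: List[str]


@dataclass
class IcebergData:  # hypothetical; only the two fields relevant here
    columns: List[str] = field(default_factory=list)
    target: Optional[IcebergTarget] = None


def validate_iceberg_data(data: IcebergData) -> None:
    """Raise ValueError unless exactly one of columns / target is set."""
    # Assumption: an empty columns list counts as "not present".
    has_columns = len(data.columns) > 0
    has_target = data.target is not None
    if has_columns == has_target:
        which = "both" if has_columns else "neither"
        raise ValueError(
            f"iceberg_data needs exactly one of columns or full_table, got {which}"
        )
```

Comparing the two booleans for equality covers both failure cases in one branch, which is the XOR the grammar cannot express directly.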

🤖 Generated with Claude Code

hbarthels and others added 2 commits June 1, 2026 23:30
Adds an alternative `(full_table :name [T1 T2 ...])` syntax to
`iceberg_data`, enabling all columns to be loaded into a single relation
keyed by the row ID (UInt128). Previously only the per-column
`(columns (column ...) ...)` mode was supported.

Proto: adds `IcebergTarget` message and optional `target` field (field 7)
to `IcebergData`. Grammar: adds `full_table` nonterminal and relaxes
`iceberg_data` to accept `gnf_columns? full_table?` (exactly one must be
present at runtime). All SDK parsers/printers regenerated; new round-trip
test added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hbarthels hbarthels marked this pull request as ready for review June 2, 2026 08:48
@hbarthels hbarthels requested a review from comnik June 2, 2026 08:48
(catalog_uri "https://catalog.example")
(properties (prop "type" "rest"))
(auth_properties))
(full_table :edges [UINT128 INT INT])

I'll read up on this tomorrow, but one immediate question is how we'll handle the incremental loading and delta outputs. I guess in that case we'd still get separate outputs for inserts and deletes?
