Add wide table CSV data load #255

Draft
hbarthels wants to merge 1 commit into main from hb-csv-wide-table

Contributor

@hbarthels hbarthels commented Jun 3, 2026

What

Adds a table form to csv_data for loading tabular CSV data into a single output relation with schema key=[T1, T2, ..., TN], value=[] — no synthesized row ID in the key.

The new S-expression syntax:

; per-column mode (unchanged)
(columns
  (column "src" :col_src [UINT128 INT])
  (column "dst" :col_dst [UINT128 INT]))

; new table mode — no row ID in output
(table :edges ["src" "dst"] [INT INT])
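To make the difference in output shape concrete, here is a minimal Python sketch. It is not the loader from this PR: `load_per_column` and `load_table` are hypothetical names, the 1-based integer row IDs are an assumption (the PR does not say how row IDs are synthesized), and modeling a relation as a Python dict or set is only an approximation of the real schema.

```python
import csv
import io


def load_per_column(text, names):
    """Per-column shape: one relation per column, keyed by a synthesized row ID.

    Assumption: row IDs are 1-based integers; the real derivation is not
    described in the PR.
    """
    reader = csv.DictReader(io.StringIO(text))
    relations = {name: {} for name in names}
    for row_id, row in enumerate(reader, start=1):
        for name in names:
            relations[name][row_id] = int(row[name])
    return relations


def load_table(text, names):
    """Table shape: one relation whose key is the whole row and whose value is empty.

    Assumption: the relation is modeled as a set of tuples, so there is no
    row ID anywhere in the output.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {tuple(int(row[name]) for name in names) for row in reader}


SAMPLE = "src,dst\n1,2\n2,3\n"
per_column = load_per_column(SAMPLE, ["src", "dst"])
edges = load_table(SAMPLE, ["src", "dst"])
```

In this sketch `per_column` holds two relations that can only be joined back together through the row ID, while `edges` holds the rows directly as `(src, dst)` tuples.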

Changes

Proto (logic.proto)

  • New CSVTarget message: target_id (RelationId), column_names (repeated string), types (repeated Type)
  • optional CSVTarget target = 5 on CSVData; mutually exclusive with the existing columns field
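As a rough illustration of the message shape and the mutual-exclusion rule, the following Python sketch mirrors the fields with plain dataclasses. It does not use the generated proto stubs: the class names `CSVTargetSketch` and `CSVDataSketch` are hypothetical, field types are simplified to strings, and the check that `column_names` and `types` have equal length is an assumption not stated in the PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CSVTargetSketch:
    """Hypothetical stand-in for the CSVTarget message."""
    target_id: str            # RelationId in the proto; a plain string here
    column_names: List[str]
    types: List[str]          # Type in the proto; plain strings here

    def __post_init__(self):
        # Assumption: one type per selected column.
        if len(self.column_names) != len(self.types):
            raise ValueError("column_names and types must have the same length")


@dataclass
class CSVDataSketch:
    """Hypothetical stand-in for CSVData, reduced to the two mode fields."""
    columns: List[str] = field(default_factory=list)
    target: Optional[CSVTargetSketch] = None

    def mode(self):
        # The PR states that target and columns are mutually exclusive.
        if self.target is not None and self.columns:
            raise ValueError("columns and target are mutually exclusive")
        return "table" if self.target is not None else "columns"


edges_target = CSVTargetSketch("edges", ["src", "dst"], ["INT", "INT"])
table_mode = CSVDataSketch(target=edges_target).mode()
column_mode = CSVDataSketch(columns=["src", "dst"]).mode()
```

Setting both `columns` and `target` makes `mode()` raise, which is one way a consumer could enforce the exclusivity that the proto schema alone does not express.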

Grammar (grammar.y)

  • New csv_table rule: "(" "table" relation_id "[" STRING* "]" "[" type* "]" ")"
  • csv_data rewritten to a single alternative containing gnf_columns? csv_table?. Both parts are optional and are distinguished by LL(2) lookahead ("columns" vs "table"). This avoids the LL(3) conflict that would arise from two alternatives sharing the long csvlocator csv_config prefix.
  • Construct/deconstruct helper functions for the two modes
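The lookahead decision can be sketched in a few lines of Python. This is a hand-written illustration, not the generated parser: `tokenize` and `choose_mode` are hypothetical helpers, and the tokenizer is deliberately naive (it would split string literals that contain spaces).

```python
def tokenize(text):
    """Naive tokenizer: pads brackets with spaces, then splits on whitespace."""
    for bracket in "()[]":
        text = text.replace(bracket, f" {bracket} ")
    return text.split()


def choose_mode(tokens, pos):
    """Pick the optional part that starts at tokens[pos] using two tokens of lookahead.

    The first token must be "(" and the second must be the keyword
    "columns" or "table". Returns None when neither part is present.
    """
    if pos + 1 < len(tokens) and tokens[pos] == "(":
        keyword = tokens[pos + 1]
        if keyword in ("columns", "table"):
            return keyword
    return None


table_tokens = tokenize('(table :edges ["src" "dst"] [INT INT])')
columns_tokens = tokenize('(columns (column "src" :col_src [UINT128 INT]))')
table_choice = choose_mode(table_tokens, 0)
columns_choice = choose_mode(columns_tokens, 0)
```

Because the shared prefix is parsed once before this decision point, the parser only ever needs to inspect the opening parenthesis and the keyword after it.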

Generated files

  • Parsers, pretty-printers, and proto stubs regenerated for Python, Julia, and Go

Tests

  • tests/lqp/csv_table.lqp — four table-mode variants (int/int edges, string/float scores, single-column, custom delimiter) with binary and snapshot artifacts

Reviewer notes

  • The __slots__ changes in transactions_pb2.pyi and other .pyi stubs are a pre-existing version drift between the committed stubs and the current buf pyi plugin (1.56.0); they are not caused by the proto changes here.

@hbarthels hbarthels requested a review from comnik June 3, 2026 12:56
@hbarthels hbarthels changed the title from "Add CSV table mode: single-relation loading without row ID" to "Add wide table CSV data load" Jun 3, 2026
Introduces a `table` form for `csv_data` that loads all selected columns
into one output relation with schema `key=[T1..TN], value=[]`, dropping
the per-column row-ID keying of the existing `columns` form.

Proto:
- Add `CsvTarget` message (target_id, column_names, types)
- Add `optional CsvTarget target = 5` to `CSVData`

Grammar:
- Add `csv_table` rule: `(table <rel_id> ["col"...] [Type...])`
- Rewrite `csv_data` to accept either `gnf_columns?` or `csv_table?`
  in a single alternative (avoids LL(3) conflict on the shared prefix)
- Add construct/deconstruct helpers for the two modes

Regenerate parsers, pretty-printers, and proto stubs for Python,
Julia, and Go. Add `tests/lqp/csv_table.lqp` with four table-mode
variants and corresponding binary/snapshot artifacts.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@hbarthels hbarthels force-pushed the hb-csv-wide-table branch from f865983 to 6304a9b Compare June 3, 2026 12:59