Conversation
Add TPC-H table definitions (8 tables), config with zstd compression, and generated golden examples. Parameterize test to cover both TPC-DS and TPC-H configs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 423c5be311
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| WITH ( | ||
| format = 'PARQUET', | ||
| partitioned_by = array['l_shipdate'], | ||
| external_location = 's3a://presto-workload-v2/tpch-sf10-parquet-partitioned-iceberg/lineitem/data' |
There was a problem hiding this comment.
Point zstd Hive external paths to zstd partitioned data
This zstd workflow creates and populates partitioned Iceberg data under tpch-sf10-parquet-partitioned-iceberg-zstd (see steps 3/4), but the Hive table here reads from tpch-sf10-parquet-partitioned-iceberg without the compression suffix. In environments where only the new zstd workflow is run, Hive will read the wrong location (or no data), which can silently invalidate benchmark results for the zstd variant.
Useful? React with 👍 / 👎.
| USE iceberg.tpch_sf10_parquet_iceberg_zstd; | ||
|
|
||
| INSERT INTO customer | ||
| SELECT * FROM iceberg.tpch_sf10_parquet_iceberg.customer; |
There was a problem hiding this comment.
Bootstrap zstd inserts from a schema this workflow creates
The generated zstd insert step reads from iceberg.tpch_sf10_parquet_iceberg.*, but this example set only creates ..._iceberg_zstd objects and does not include a step that creates/populates the uncompressed source schema. On a clean setup that follows this folder alone, the insert step fails with missing-table errors, so the new TPC-H flow is not self-contained.
Useful? React with 👍 / 👎.
xpengahana
left a comment
There was a problem hiding this comment.
do we need to address those comments?
| }, | ||
| { | ||
| "name": "l_returnflag", | ||
| "type": "VARCHAR(1)" |
| }, | ||
| { | ||
| "name": "l_linestatus", | ||
| "type": "VARCHAR(1)" |
There was a problem hiding this comment.
Same as the previous one. CHAR(1)?
| }, | ||
| { | ||
| "name": "o_orderstatus", | ||
| "type": "VARCHAR(1)" |

Add TPC-H table definitions (8 tables), config with zstd compression, and generated golden examples. Parameterize test to cover both TPC-DS and TPC-H configs.