
feat: add TPC-H support to genddl #75

Open

ethanyzhang wants to merge 3 commits into main from add-tpch-genddl

Conversation

@ethanyzhang (Collaborator)

Add TPC-H table definitions (8 tables), config with zstd compression, and generated golden examples. Parameterize test to cover both TPC-DS and TPC-H configs.


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 423c5be311


WITH (
format = 'PARQUET',
partitioned_by = array['l_shipdate'],
external_location = 's3a://presto-workload-v2/tpch-sf10-parquet-partitioned-iceberg/lineitem/data'


P1: Point zstd Hive external paths to zstd partitioned data

This zstd workflow creates and populates partitioned Iceberg data under tpch-sf10-parquet-partitioned-iceberg-zstd (see steps 3/4), but the Hive table here reads from tpch-sf10-parquet-partitioned-iceberg without the compression suffix. In environments where only the new zstd workflow is run, Hive will read the wrong location (or no data), which can silently invalidate benchmark results for the zstd variant.
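A sketch of the corrected Hive clause, pointing the external location at the zstd path the workflow actually populates. The `-zstd` suffix follows the path named in the review comment; the exact bucket layout under it is an assumption:

```sql
WITH (
    format = 'PARQUET',
    partitioned_by = array['l_shipdate'],
    -- Hypothetical corrected path: note the -zstd suffix, matching the
    -- partitioned data created and populated in steps 3/4.
    external_location = 's3a://presto-workload-v2/tpch-sf10-parquet-partitioned-iceberg-zstd/lineitem/data'
)
```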


USE iceberg.tpch_sf10_parquet_iceberg_zstd;

INSERT INTO customer
SELECT * FROM iceberg.tpch_sf10_parquet_iceberg.customer;


P2: Bootstrap zstd inserts from a schema this workflow creates

The generated zstd insert step reads from iceberg.tpch_sf10_parquet_iceberg.*, but this example set only creates ..._iceberg_zstd objects and does not include a step that creates/populates the uncompressed source schema. On a clean setup that follows this folder alone, the insert step fails with missing-table errors, so the new TPC-H flow is not self-contained.
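One way to make the flow self-contained would be to add a bootstrap step that creates and populates the uncompressed source schema first. This is a sketch, not the PR's actual fix: the schema and table names come from the review comment, while the `tpch.sf10` connector source is an assumption:

```sql
-- Hypothetical bootstrap step: create and populate the uncompressed
-- source schema before the generated zstd insert step runs.
-- (tpch.sf10 as the CTAS source is an assumption.)
CREATE SCHEMA IF NOT EXISTS iceberg.tpch_sf10_parquet_iceberg;

CREATE TABLE IF NOT EXISTS iceberg.tpch_sf10_parquet_iceberg.customer AS
SELECT * FROM tpch.sf10.customer;

-- The generated zstd insert can then run as written:
USE iceberg.tpch_sf10_parquet_iceberg_zstd;

INSERT INTO customer
SELECT * FROM iceberg.tpch_sf10_parquet_iceberg.customer;
```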



@xpengahana (Contributor) left a comment


Do we need to address those comments?

},
{
"name": "l_returnflag",
"type": "VARCHAR(1)"


Should this be CHAR(1)?

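If the reviewer's suggestion is adopted, the column entry would change as follows. A sketch only; the surrounding JSON structure is inferred from the excerpts above:

```json
{
    "name": "l_returnflag",
    "type": "CHAR(1)"
}
```

The same change would apply to `l_linestatus` and `o_orderstatus` below.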

},
{
"name": "l_linestatus",
"type": "VARCHAR(1)"


Same as the previous one. CHAR(1)?

},
{
"name": "o_orderstatus",
"type": "VARCHAR(1)"


Same here.
