test: add unhex dictionary coverage by yuboxx · Pull Request #4222 · apache/datafusion-comet

yuboxx · 2026-05-05T05:40:25Z

What changes were proposed in this pull request?

This PR adds coverage for the native Spark-compatible unhex expression with Parquet dictionary encoding enabled and disabled.
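For readers unfamiliar with the expression under test, here is a std-only Rust model of Spark-style `unhex` on a single string. It assumes Spark's non-ANSI behavior: an odd-length input is decoded as if it had a leading `0`, and any non-hex character makes the result NULL. The names `hex_digit` and `spark_unhex` are hypothetical. This is a sketch for illustration, not the code in Comet's `unhex.rs`.

```rust
/// Decode one ASCII hex digit; `None` for anything outside [0-9a-fA-F].
fn hex_digit(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical model of Spark-style `unhex` for one row (non-ANSI mode assumed):
/// - odd-length input is decoded as if it had a leading '0';
/// - any non-hex character makes the whole result NULL (`None`).
fn spark_unhex(s: &str) -> Option<Vec<u8>> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity((bytes.len() + 1) / 2);
    let mut i = 0;
    if bytes.len() % 2 == 1 {
        // Odd length: the first digit forms a byte on its own (implicit leading '0').
        out.push(hex_digit(bytes[0])?);
        i = 1;
    }
    // The remaining length is even, so `i + 1` is always in bounds here.
    while i < bytes.len() {
        let hi = hex_digit(bytes[i])?;
        let lo = hex_digit(bytes[i + 1])?;
        out.push((hi << 4) | lo);
        i += 2;
    }
    Some(out)
}

fn main() {
    assert_eq!(spark_unhex("537061726B"), Some(b"Spark".to_vec()));
    assert_eq!(spark_unhex("ABC"), Some(vec![0x0A, 0xBC]));
    assert_eq!(spark_unhex(""), Some(vec![]));
    assert_eq!(spark_unhex("GG"), None);
    println!("ok");
}
```

Because the decoding is purely per-row, the result for a given string cannot depend on whether that string came from a dictionary page or a plain page, which is what the dictionary on/off tests check end to end.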

The original issue suggested that dictionary-backed inputs might need expression-level support. The added tests confirm that Parquet dictionary string values are already unpacked before expression evaluation, so this PR does not change the native unhex implementation.

The new coverage includes:

Spark expression coverage with parquet.enable.dictionary=false,true
Generated primitive Parquet coverage via makeParquetFileAllPrimitiveTypes, using the _8 UTF8 string column

Closes #477.

How was this patch tested?

cargo test -p datafusion-comet-spark-expr math_funcs::unhex::test
./mvnw -pl spark -Dsuites=org.apache.comet.CometExpressionSuite -Dtest=none -Pspark-4.0 -Pscala-2.13 test

andygrove · 2026-05-05T16:07:16Z

Thanks for the contribution @yuboxx. Could you tell me more about the motivation for this PR? The changes look reasonable, but as far as I can tell they do not address any bugs. Dictionary-encoded arrays are unpacked in parquet_convert_array before any expressions can be evaluated.
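Conceptually, unpacking a dictionary-encoded column means replacing each key with the value it points to, so a downstream expression only ever receives plain strings. The sketch below models that with a hypothetical `DictColumn` type and `unpack` function using only the Rust standard library. It is not Arrow's `DictionaryArray` API and not the actual `parquet_convert_array` code, only an illustration of the idea under those assumptions.

```rust
/// Hypothetical, simplified stand-in for a dictionary-encoded string column:
/// `keys[i]` indexes into `values`, and a `None` key is a NULL row.
struct DictColumn {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

/// Materialize ("unpack") the dictionary into a plain column.
/// Assumes every non-null key is a valid index into `values`
/// (an out-of-range key would panic here).
fn unpack(dict: &DictColumn) -> Vec<Option<String>> {
    dict.keys
        .iter()
        .map(|k| k.map(|idx| dict.values[idx].clone()))
        .collect()
}

fn main() {
    // Four rows, two distinct dictionary values, one NULL row.
    let dict = DictColumn {
        keys: vec![Some(0), Some(1), None, Some(0)],
        values: vec!["4142".to_string(), "7A".to_string()],
    };
    let plain = vec![
        Some("4142".to_string()),
        Some("7A".to_string()),
        None,
        Some("4142".to_string()),
    ];
    // After unpacking, an expression such as unhex sees the same input
    // whether or not the Parquet column was dictionary-encoded.
    assert_eq!(unpack(&dict), plain);
    println!("ok");
}
```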

yuboxx · 2026-05-06T03:50:30Z

> Could you tell me more about the motivation for this PR?

Thanks for the pointer! I was looking for issues to work on while onboarding to the codebase and came across issue #477. It raised the question of dictionary type support in unhex, but dictionary arrays are apparently already unpacked in advance. I'm going to remove my change to unhex.rs and keep only the unit test change, which shows that dictionary-encoded input is supported. Let me know if this makes sense.

andygrove · 2026-05-06T04:26:07Z


Sounds good! Thanks

andygrove

LGTM pending CI

andygrove mentioned this pull request May 5, 2026

Preserve dictionary encoding through native expressions where possible #4228 (open)

yuboxx force-pushed the fix-unhex-dictionary-477 branch 2 times, most recently from 5f831e7 to cf6552a, May 6, 2026 03:49

yuboxx changed the title from "fix: support unhex on dictionary strings" to "test: add unhex dictionary coverage" May 6, 2026

test: add unhex dictionary coverage

f474012

yuboxx force-pushed the fix-unhex-dictionary-477 branch from cf6552a to f474012, May 6, 2026 03:56

andygrove approved these changes May 6, 2026


yuboxx wants to merge 1 commit into apache:main from yuboxx:fix-unhex-dictionary-477