Skip to content

test: add unhex dictionary coverage#4222

Open
yuboxx wants to merge 1 commit intoapache:mainfrom
yuboxx:fix-unhex-dictionary-477
Open

test: add unhex dictionary coverage#4222
yuboxx wants to merge 1 commit intoapache:mainfrom
yuboxx:fix-unhex-dictionary-477

Conversation

@yuboxx
Copy link
Copy Markdown
Contributor

@yuboxx yuboxx commented May 5, 2026

What changes were proposed in this pull request?

This PR adds coverage for the native Spark-compatible unhex expression with Parquet dictionary encoding enabled and disabled.

The original issue suggested that dictionary-backed inputs might need expression-level support. The added tests confirm that Parquet dictionary string values are already unpacked before expression evaluation, so this PR does not change the native unhex implementation.

The new coverage includes:

  • Spark expression coverage with parquet.enable.dictionary=false,true
  • Generated primitive Parquet coverage via makeParquetFileAllPrimitiveTypes, using the _8 UTF8 string column

Closes #477.

How was this patch tested?

  • cargo test -p datafusion-comet-spark-expr math_funcs::unhex::test
  • ./mvnw -pl spark -Dsuites=org.apache.comet.CometExpressionSuite -Dtest=none -Pspark-4.0 -Pscala-2.13 test

@andygrove
Copy link
Copy Markdown
Member

Thanks for the contribution @yuboxx. Could you tell me more about the motivation for this PR? The changes looks reasonable, but do not address any bugs as far as I can tell. Dictionary-encoded arrays are unpacked in parquet_convert_array before any expressions can be evaluated.

@yuboxx
Copy link
Copy Markdown
Contributor Author

yuboxx commented May 6, 2026

Could you tell me more about the motivation for this PR?

Thanks for the pointer! I was looking for some issues to work with as I'm onboarding to the codebase and stumbled on issue #477. It raised a point about dictionary type support in unhex, but apparently dictionary array is already being unpacked in advance. I'm going to remove my change in unhex.rs and only keep the unit test change to show dictionary is supported. Let me know if this makes sense.

@yuboxx yuboxx changed the title fix: support unhex on dictionary strings test: add unhex dictionary coverage May 6, 2026
@yuboxx yuboxx force-pushed the fix-unhex-dictionary-477 branch from cf6552a to f474012 Compare May 6, 2026 03:56
@andygrove
Copy link
Copy Markdown
Member

Could you tell me more about the motivation for this PR?

Thanks for the pointer! I was looking for some issues to work with as I'm onboarding to the codebase and stumbled on issue #477. It raised a point about dictionary type support in unhex, but apparently dictionary array is already being unpacked in advance. I'm going to remove my change in unhex.rs and only keep the unit test change to show dictionary is supported. Let me know if this makes sense.

Sounds good! Thanks

Copy link
Copy Markdown
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add dictionary support to unhex and test dictionary and scalar cases

2 participants