test: add unhex dictionary coverage #4222
Conversation
Thanks for the contribution @yuboxx. Could you tell me more about the motivation for this PR? The changes look reasonable, but do not address any bugs as far as I can tell. Dictionary-encoded arrays are unpacked in
Thanks for the pointer! I was looking for some issues to work on as I'm onboarding to the codebase and stumbled on issue #477. It raised a point about dictionary type support in unhex, but apparently dictionary arrays are already unpacked in advance. I'm going to remove my change in unhex.rs and keep only the unit test change to show that dictionary encoding is supported. Let me know if this makes sense.
Sounds good! Thanks
What changes were proposed in this pull request?
This PR adds test coverage for the native Spark-compatible `unhex` expression with Parquet dictionary encoding enabled and disabled. The original issue suggested that dictionary-backed inputs might need expression-level support. The added tests confirm that Parquet dictionary string values are already unpacked before expression evaluation, so this PR does not change the native `unhex` implementation.

The new coverage includes:
- running with `parquet.enable.dictionary` set to both `false` and `true`
- generating test data with `makeParquetFileAllPrimitiveTypes`, using the `_8` UTF8 string column

Closes #477.
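For reference, the `unhex` semantics the tests exercise can be sketched in standalone Rust. This is a hypothetical illustration, not Comet's actual implementation: it decodes a hex string to bytes, returns `None` on any non-hex digit, and (as Spark does, to my understanding) treats an odd-length input as if it had a leading zero nibble.

```rust
// Sketch of Spark-style unhex: hex string -> bytes, None on invalid input.
// Not the Comet implementation; for illustration only.
fn unhex(s: &str) -> Option<Vec<u8>> {
    // Map one ASCII hex digit to its 4-bit value.
    fn digit(b: u8) -> Option<u8> {
        match b {
            b'0'..=b'9' => Some(b - b'0'),
            b'a'..=b'f' => Some(b - b'a' + 10),
            b'A'..=b'F' => Some(b - b'A' + 10),
            _ => None,
        }
    }
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity((bytes.len() + 1) / 2);
    let mut iter = bytes.iter();
    // Odd-length input: the first digit stands alone (implicit leading zero).
    if bytes.len() % 2 != 0 {
        out.push(digit(*iter.next()?)?);
    }
    // Remaining digits come in high/low pairs.
    while let Some(&hi) = iter.next() {
        let &lo = iter.next()?;
        out.push((digit(hi)? << 4) | digit(lo)?);
    }
    Some(out)
}

fn main() {
    assert_eq!(unhex("4D7953514C"), Some(b"MySQL".to_vec()));
    assert_eq!(unhex("ZZ"), None);
    println!("ok");
}
```

Because dictionary arrays are unpacked before the native expression runs, the same byte-level logic applies whether or not the Parquet column was dictionary-encoded.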
How was this patch tested?
- `cargo test -p datafusion-comet-spark-expr math_funcs::unhex::test`
- `./mvnw -pl spark -Dsuites=org.apache.comet.CometExpressionSuite -Dtest=none -Pspark-4.0 -Pscala-2.13 test`