feat(datafusion): Add Binary scalar value conversion for predicate pushdown #2048
…shdown

Add support for converting Binary and LargeBinary DataFusion ScalarValue types to Iceberg Datum, enabling binary predicates to be pushed down to the Iceberg storage layer.

This conversion allows SQL queries with binary hex literals (X'...') to push predicates down to Iceberg, improving query performance by filtering data at the storage level rather than in DataFusion.

The integration test verifies that binary predicates are successfully pushed down end-to-end:
- Without conversion: predicate stays in FilterExec with predicate:[]
- With conversion: predicate pushed to IcebergTableScan

Other scalar types (Boolean, Timestamp, Decimal) were investigated but excluded because they are not reachable through practical usage:
- Boolean: DataFusion aggressively optimizes comparisons (e.g., x=true becomes just x) before reaching the converter
- Timestamp/Decimal: SQL literals are converted to strings/other types before reaching the converter

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
aeee2e8 to d8bd5bc
liurenjie1024 left a comment
Thanks @viirya for this, generally LGTM!
    }

    #[tokio::test]
    async fn test_binary_predicate_pushdown() -> Result<()> {
We have added sqllogictests support: https://github.com/liurenjie1024/iceberg-rust/blob/666a9fe1aaf1692583d6f44e4f7a1d52a688b217/crates/sqllogictest/testdata/schedules/df_test.toml#L19
Please move these tests there.
Okay.
Move the binary predicate pushdown integration test from Rust integration tests to sqllogictest framework for better test organization and coverage.

Changes:
- Add binary_predicate_pushdown.slt test file
- Create test_binary_table in DataFusion engine setup
- Update show_tables.slt to include test_binary_table
- Add test to df_test.toml schedule
- Remove test_binary_predicate_pushdown from integration_datafusion_test.rs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
liurenjie1024 left a comment
Thanks @viirya for this pr!
Thanks @liurenjie1024
Which issue does this PR close?
What changes are included in this PR?
Add support for converting Binary and LargeBinary DataFusion ScalarValue types to Iceberg Datum, enabling binary predicates to be pushed down to the Iceberg storage layer.
This conversion allows SQL queries with binary hex literals (X'...') to push predicates down to Iceberg, improving query performance by filtering data at the storage level rather than in DataFusion.
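The conversion described above can be sketched roughly as follows. This is a simplified model, not the PR's actual code: the `ScalarValue` and `Datum` enums here are small stand-ins defined locally for illustration (the real types live in the `datafusion` and `iceberg` crates and have many more variants), and the function name `scalar_value_to_datum` is an assumption. The point it illustrates is that a literal with no conversion yields `None`, which means the predicate cannot be pushed down and stays in DataFusion.

```rust
// Stand-in for DataFusion's ScalarValue (hypothetical, heavily reduced).
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int64(Option<i64>),
    Utf8(Option<String>),
    Binary(Option<Vec<u8>>),
    LargeBinary(Option<Vec<u8>>),
    Boolean(Option<bool>),
}

// Stand-in for Iceberg's Datum (hypothetical, heavily reduced).
#[derive(Debug, Clone, PartialEq)]
enum Datum {
    Long(i64),
    Text(String),
    Binary(Vec<u8>),
}

// Returns Some(datum) when the literal can be expressed as an Iceberg value,
// and None otherwise. A None result means the predicate is not pushed down.
fn scalar_value_to_datum(value: &ScalarValue) -> Option<Datum> {
    match value {
        ScalarValue::Int64(Some(v)) => Some(Datum::Long(*v)),
        ScalarValue::Utf8(Some(v)) => Some(Datum::Text(v.clone())),
        // The arms this PR is about: both binary variants map to one
        // binary datum carrying a copy of the bytes.
        ScalarValue::Binary(Some(v)) | ScalarValue::LargeBinary(Some(v)) => {
            Some(Datum::Binary(v.clone()))
        }
        // NULL literals and unsupported types (Boolean here) are not converted.
        _ => None,
    }
}
```

Under this model, a hex literal such as X'DEAD' would arrive as `ScalarValue::Binary(Some(vec![0xDE, 0xAD]))` and convert to `Datum::Binary(vec![0xDE, 0xAD])`, while a NULL binary or a Boolean literal falls through to `None`.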
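The integration test verifies that binary predicates are successfully pushed down end-to-end:
- Without conversion: predicate stays in FilterExec with predicate:[]
- With conversion: predicate pushed to IcebergTableScan

Other scalar types (Boolean, Timestamp, Decimal) were investigated but excluded because they are not reachable through practical usage:
- Boolean: DataFusion aggressively optimizes comparisons (e.g., x=true becomes just x) before reaching the converter
- Timestamp/Decimal: SQL literals are converted to strings/other types before reaching the converter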
Are these changes tested?