
@viirya
Member

@viirya viirya commented Jan 19, 2026

Which issue does this PR close?

  • Closes #.

What changes are included in this PR?

Add support for converting Binary and LargeBinary DataFusion ScalarValue types to Iceberg Datum, enabling binary predicates to be pushed down to the Iceberg storage layer.

This conversion allows SQL queries with binary hex literals (X'...') to push predicates down to Iceberg, improving query performance by filtering data at the storage level rather than in DataFusion.
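
The path from a hex literal to an Iceberg value can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ScalarValue` and `Datum` enums below are simplified stand-ins for the DataFusion and Iceberg types, and `parse_hex_literal` and `scalar_value_to_datum` are hypothetical names chosen for the example.

```rust
// Stand-in types (assumption): simplified mirrors of DataFusion's `ScalarValue`
// and Iceberg's `Datum`, reduced to the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int64(Option<i64>),
    Utf8(Option<String>),
    Binary(Option<Vec<u8>>),
    LargeBinary(Option<Vec<u8>>),
    Boolean(Option<bool>),
}

#[derive(Debug, Clone, PartialEq)]
enum Datum {
    Long(i64),
    String(String),
    Binary(Vec<u8>),
}

/// Hypothetical helper: decodes a SQL hex literal such as X'0AFF' into bytes.
/// Returns `None` for a missing prefix/suffix, odd length, or non-hex digits.
fn parse_hex_literal(literal: &str) -> Option<Vec<u8>> {
    let body = literal
        .strip_prefix("X'")
        .or_else(|| literal.strip_prefix("x'"))?
        .strip_suffix('\'')?;
    if body.len() % 2 != 0 || !body.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Every character is an ASCII hex digit, so byte-index slicing is safe.
    (0..body.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&body[i..i + 2], 16).ok())
        .collect()
}

/// Hypothetical converter: `Some(datum)` when the literal can be pushed down,
/// `None` when it must stay in the query engine.
fn scalar_value_to_datum(value: &ScalarValue) -> Option<Datum> {
    match value {
        ScalarValue::Int64(Some(v)) => Some(Datum::Long(*v)),
        ScalarValue::Utf8(Some(s)) => Some(Datum::String(s.clone())),
        // Both binary variants carry the same payload and map to one Datum kind.
        ScalarValue::Binary(Some(bytes)) | ScalarValue::LargeBinary(Some(bytes)) => {
            Some(Datum::Binary(bytes.clone()))
        }
        // NULL literals and unsupported types are not converted.
        _ => None,
    }
}
```

In this sketch, returning `None` is what keeps a predicate out of the scan, so adding the two binary arms is the whole change being illustrated.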

The integration test verifies that binary predicates are successfully pushed down end-to-end:

  • Without conversion: the predicate stays in FilterExec, and IcebergTableScan reports predicate:[]
  • With conversion: the predicate is pushed down to IcebergTableScan
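
The before/after behavior in the two bullets above can be modeled with a toy planner. This is a deliberately simplified sketch: `Literal`, `Filter`, `ToyPlan`, and `plan_filters` are hypothetical names, and the real DataFusion planner and Iceberg scan are considerably more involved.

```rust
// Toy model (assumption): a filter is `column = literal`.
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Int64(i64),
    Binary(Vec<u8>),
}

#[derive(Debug, Clone, PartialEq)]
struct Filter {
    column: String,
    value: Literal,
}

// Hypothetical plan shape: predicates handed to the scan versus
// filters that stay in the engine's filter operator.
#[derive(Debug, PartialEq)]
struct ToyPlan {
    scan_predicates: Vec<Filter>,
    residual_filters: Vec<Filter>,
}

/// Splits filters by whether their literal can be converted for the storage layer.
fn plan_filters(filters: Vec<Filter>, can_convert: impl Fn(&Literal) -> bool) -> ToyPlan {
    let (scan_predicates, residual_filters): (Vec<Filter>, Vec<Filter>) =
        filters.into_iter().partition(|f| can_convert(&f.value));
    ToyPlan {
        scan_predicates,
        residual_filters,
    }
}
```

With a converter that rejects `Literal::Binary`, a binary filter lands in `residual_filters` and `scan_predicates` stays empty; once the converter accepts it, the same filter moves to `scan_predicates`.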

Other scalar types (Boolean, Timestamp, Decimal) were investigated but excluded because they are not reachable through practical usage:

  • Boolean: DataFusion aggressively optimizes comparisons (e.g., x=true becomes just x) before reaching the converter
  • Timestamp/Decimal: SQL literals are converted to strings/other types before reaching the converter
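
To illustrate the Boolean case, here is a toy rewrite of the kind described above, in which `x = true` collapses to `x` so no Boolean literal is left for a converter to see. The `Expr` type and `simplify` function are hypothetical and far simpler than DataFusion's actual optimizer; the sketch also assumes the compared column is Boolean-typed.

```rust
// Toy expression tree (assumption): only what the rewrite needs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    BoolLiteral(bool),
    Eq(Box<Expr>, Box<Expr>),
}

/// Rewrites `e = true` and `true = e` to `e`; everything else is kept as-is.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(left, right) => match (simplify(*left), simplify(*right)) {
            // The comparison against `true` disappears, and with it the literal.
            (e, Expr::BoolLiteral(true)) | (Expr::BoolLiteral(true), e) => e,
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}
```

After such a rewrite the predicate is a bare column reference, which is why a Boolean literal conversion would not be exercised in this scenario.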

Are these changes tested?

…shdown

Add support for converting Binary and LargeBinary DataFusion ScalarValue
types to Iceberg Datum, enabling binary predicates to be pushed down to
the Iceberg storage layer.

This conversion allows SQL queries with binary hex literals (X'...')
to push predicates down to Iceberg, improving query performance by
filtering data at the storage level rather than in DataFusion.

The integration test verifies that binary predicates are successfully
pushed down end-to-end:
- Without conversion: predicate stays in FilterExec with predicate:[]
- With conversion: predicate pushed to IcebergTableScan

Other scalar types (Boolean, Timestamp, Decimal) were investigated but
excluded because they are not reachable through practical usage:
- Boolean: DataFusion aggressively optimizes comparisons (e.g., x=true
  becomes just x) before reaching the converter
- Timestamp/Decimal: SQL literals are converted to strings/other types
  before reaching the converter

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@viirya viirya force-pushed the feat/datafusion-expand-scalar-conversions branch from aeee2e8 to d8bd5bc Compare January 19, 2026 06:53
Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @viirya for this, generally LGTM!

}

#[tokio::test]
async fn test_binary_predicate_pushdown() -> Result<()> {

Member Author

Okay.

Move the binary predicate pushdown integration test from Rust integration
tests to sqllogictest framework for better test organization and coverage.

Changes:
- Add binary_predicate_pushdown.slt test file
- Create test_binary_table in DataFusion engine setup
- Update show_tables.slt to include test_binary_table
- Add test to df_test.toml schedule
- Remove test_binary_predicate_pushdown from integration_datafusion_test.rs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @viirya for this PR!

@liurenjie1024 liurenjie1024 merged commit 20ce7a5 into apache:main Jan 21, 2026
17 checks passed
@viirya
Member Author

viirya commented Jan 21, 2026

Thanks @liurenjie1024

@viirya viirya deleted the feat/datafusion-expand-scalar-conversions branch January 21, 2026 01:33