feat: comet native scan improvements - Dynamic Partition Pruning #3546
Shekharrajak wants to merge 14 commits into apache:main
: : : +- BroadcastHashJoin
: : : :- Filter
: : : : +- ColumnarToRow
: : : : +- Scan parquet spark_catalog.default.store_returns [COMET: Native DataFusion scan does not support subqueries/dynamic pruning]
CI checks were failing, so these needed to be updated.
I'd suggest not changing the fallback message in this PR and instead improving it in a follow-on PR, so that this PR is smaller and focuses only on the functionality.
Another option is to add a new config to feature-gate the DPP support and disable it for now in the stability suite.
Thanks @andygrove for the review. Unit tests were actually failing, so I had to make the update in this PR itself to get all checks green.
val scanImpl = COMET_NATIVE_SCAN_IMPL.get()

// native_datafusion + DPP requires AQE. Without AQE, DPP subqueries aren't prepared
// before the scan tries to use their results, causing "has not finished" errors.
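The guard described in the comment above could be modeled roughly as follows. This is a minimal, self-contained sketch rather than the PR's actual code: `DppGuard`, `fallbackReason`, and its parameters are hypothetical names, and the only behavior taken from the diff is that `native_datafusion` combined with DPP requires AQE.

```scala
// Hypothetical model of the AQE guard; names are illustrative, not from the PR.
object DppGuard {
  // Scan implementation name as it appears in the diff comment.
  val NativeDataFusion = "native_datafusion"

  /** Returns Some(reason) when the scan should fall back to Spark, None otherwise. */
  def fallbackReason(
      scanImpl: String,
      hasDynamicPruning: Boolean,
      aqeEnabled: Boolean): Option[String] = {
    // Without AQE the DPP subquery results are not ready when the scan needs them,
    // so the native scan is only allowed when AQE is on (or no DPP is present).
    if (scanImpl == NativeDataFusion && hasDynamicPruning && !aqeEnabled) {
      Some("native_datafusion scan with DPP requires AQE")
    } else {
      None
    }
  }
}
```

Under this sketch, a plan with DPP and AQE disabled yields a fallback reason, while the same plan with AQE enabled stays on the native scan.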
Please trigger the CI checks.
The CI check failure is due to a network issue. How can we re-trigger those 2 failing CI workflows?
Thanks @Shekharrajak! Closing in favor of #4112.
Which issue does this PR close?
Ref #3510
Rationale for this change
CometNativeScanExec currently falls back to Spark when Dynamic Partition Pruning (DPP) is present. This limits performance for star-schema queries that rely on DPP to prune partitions at runtime based on dimension table filters.
What changes are included in this PR?
Added DPP support to CometNativeScanExec for V1 native scans
Implemented partition filter evaluation from DPP subqueries
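To illustrate what evaluating a partition filter from a DPP subquery amounts to, here is a minimal sketch in plain Scala, independent of Spark and Comet. `PartitionFile`, `DynamicPruningFilter`, and `prune` are hypothetical names introduced only for this example; the assumption is that the DPP subquery has already produced the set of partition values that survive the dimension-table filter.

```scala
// Hypothetical, simplified model of DPP partition filtering; not the PR's implementation.
object DppPruningSketch {
  // A data file together with the partition column values of its directory.
  final case class PartitionFile(path: String, partitionValues: Map[String, String])

  // Result of a DPP subquery: the partition values that can still match the join.
  final case class DynamicPruningFilter(column: String, allowedValues: Set[String])

  /** Keeps only files whose partition values satisfy every dynamic filter. */
  def prune(
      files: Seq[PartitionFile],
      filters: Seq[DynamicPruningFilter]): Seq[PartitionFile] =
    files.filter { file =>
      filters.forall { filter =>
        // A file that lacks the column is kept, which is the conservative choice.
        file.partitionValues.get(filter.column).forall(filter.allowedValues.contains)
      }
    }
}
```

For example, with files partitioned by a date key and a subquery result containing a single key, only the files in that one partition are passed on to the scan.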
How are these changes tested?
Added DPP benchmark comparing Spark vs Comet native scan performance
Unit tests