
Optimized PartitionedOutput staging hub #1799

Open
yingsu00 wants to merge 13 commits into oss-main from optimized_partitionedoutput

Conversation

@yingsu00
Collaborator

@yingsu00 yingsu00 commented Mar 11, 2026

This PR tracks the commits for the optimized PartitionedOutput operator. Related new PRs should be submitted to the optimized_partitionedoutput branch and then merged into oss-main with alchemy merge.

The review history of the first two commits can be found in:
feat: Introducing PartitionedVector #1596
feat: Add PartitionedRowVector implementation #1770

@yingsu00 yingsu00 requested a review from majetideepak as a code owner March 11, 2026 14:56
@yingsu00 yingsu00 removed the request for review from majetideepak March 11, 2026 14:56
@yingsu00
Collaborator Author

alchemy link 86db93b

@prestodb-ci
Collaborator

Added new rebase item:

@yingsu00
Collaborator Author

alchemy link 6dd3661

@prestodb-ci
Collaborator

The following unexpired item was removed at 2026-03-10T12:01:55Z by @yingsu00 via #1799 (comment):

Added new rebase item:

@yingsu00
Collaborator Author

alchemy link 2706c1e

@prestodb-ci
Collaborator

The following unexpired item was removed at 2026-03-12T13:19:10Z by @yingsu00 via #1799 (comment):

Added new rebase item:

@unidevel
Collaborator

alchemy link 86db93b,6dd3661c7afef52c42ed1c5ca83c9e57e21ec2b3,2706c1e80f9463bd4fdd805e296839f964437ca3

@prestodb-ci
Collaborator

Failed to add new rebase item:

The new rebase item overlaps with the following existing item:

Please double check your input and retry.

@unidevel
Collaborator

alchemy link 86db93b,6dd3661c7afef52c42ed1c5ca83c9e57e21ec2b3,2706c1e80f9463bd4fdd805e296839f964437ca3 @2026-03-10T12:01:55Z

@prestodb-ci
Collaborator

Failed to add new rebase item:

The new rebase item overlaps with the following existing item:

Please double check your input and retry.

@unidevel
Collaborator

alchemy link 86db93b,6dd3661c7afef52c42ed1c5ca83c9e57e21ec2b3,2706c1e80f9463bd4fdd805e296839f964437ca3 @2026-03-12T13:19:10Z

@prestodb-ci
Collaborator

The following unexpired item was removed at 2026-03-12T13:19:10Z by @unidevel via #1799 (comment):

Added new rebase item:

yingsu00 and others added 10 commits April 17, 2026 13:12
Signed-off-by: Xin Zhang <xin-zhang2@ibm.com>

Alchemy-item: (ID = 1167) Add PartitionedRowVector commit 1/1 - f2af427

PartitionedFlatVector::partition() and PartitionedRowVector::partition()
called mutableRawNulls() unconditionally. mutableRawNulls() allocates a
null buffer if one does not exist, causing mayHaveNulls() to return true
for every vector after partitioning, even when the original had no nulls.

Fix both sites to check rawNulls() first and only call mutableRawNulls()
when a null buffer already exists.

Add noNullBufferAllocatedForNullFreeFlat and
noNullBufferAllocatedForNullFreeRow tests to PartitionedVectorTest to
cover this case.

# Conflicts:
#	velox/vector/PartitionedVector.cpp
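The null-buffer fix described in this commit message can be illustrated with a minimal sketch. `MockVector` is a hypothetical stand-in written for this example, not the Velox vector API; only the accessor names `rawNulls()`, `mutableRawNulls()`, and `mayHaveNulls()` come from the commit message, and their behavior here is assumed from that description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a vector with a lazily allocated null buffer.
// This is NOT the Velox API; it only mimics the behavior the commit describes.
class MockVector {
 public:
  explicit MockVector(std::size_t size) : size_(size) {
    assert(size_ > 0);
  }

  // Returns nullptr when no null buffer has been allocated.
  const uint64_t* rawNulls() const {
    return nulls_.empty() ? nullptr : nulls_.data();
  }

  // Allocates a buffer with every bit set on first use, then returns it.
  uint64_t* mutableRawNulls() {
    if (nulls_.empty()) {
      nulls_.assign((size_ + 63) / 64, ~0ULL);
    }
    return nulls_.data();
  }

  // In this sketch, the mere presence of a buffer means "may have nulls".
  bool mayHaveNulls() const {
    return rawNulls() != nullptr;
  }

 private:
  std::size_t size_;
  std::vector<uint64_t> nulls_;
};

// Pattern before the fix: always materializes a null buffer.
void permuteNullsUnconditionally(MockVector& vector) {
  uint64_t* nulls = vector.mutableRawNulls();
  (void)nulls; // A real implementation would rearrange null bits here.
}

// Pattern after the fix: touch the null bits only if a buffer already exists.
void permuteNullsIfPresent(MockVector& vector) {
  if (vector.rawNulls() != nullptr) {
    uint64_t* nulls = vector.mutableRawNulls();
    (void)nulls; // A real implementation would rearrange null bits here.
  }
}
```

With the guarded version, a vector that had no null buffer before partitioning still reports `mayHaveNulls() == false` afterwards.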

This commit introduces PrestoIterativePartitioningSerializer, which
buffers RowVectors across multiple append() calls, partitions rows
in-place using PartitionedVector, and on flush() serializes each
non-empty partition into a Presto wire-format IOBuf. The serializer has
no dependency on velox_exec: it returns raw folly::IOBuf objects,
leaving SerializedPage creation to the caller.
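The append/flush flow described above might look roughly like the following toy model. Everything here is an assumption made for illustration: `ToyPartitioningSerializer` is a hypothetical class, rows are plain `int64_t` values partitioned by modulo, and the payload is a comma-separated string rather than a Presto wire-format `folly::IOBuf`. Unlike the real serializer, this sketch does not partition in place.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of "buffer on append(), partition and serialize on
// flush()". It does not use RowVector, PartitionedVector, or folly::IOBuf.
class ToyPartitioningSerializer {
 public:
  explicit ToyPartitioningSerializer(uint32_t numPartitions)
      : numPartitions_(numPartitions) {
    assert(numPartitions_ > 0);
  }

  // Buffers a batch of rows; nothing is serialized yet.
  void append(const std::vector<int64_t>& batch) {
    buffered_.insert(buffered_.end(), batch.begin(), batch.end());
  }

  // Groups all buffered rows by partition and returns one payload per
  // non-empty partition. The buffer is empty afterwards.
  std::map<uint32_t, std::string> flush() {
    std::vector<std::vector<int64_t>> groups(numPartitions_);
    for (int64_t row : buffered_) {
      groups[partitionOf(row)].push_back(row);
    }
    buffered_.clear();

    std::map<uint32_t, std::string> payloads;
    for (uint32_t partition = 0; partition < numPartitions_; ++partition) {
      if (groups[partition].empty()) {
        continue; // Empty partitions produce no output.
      }
      std::string payload;
      for (int64_t row : groups[partition]) {
        payload += std::to_string(row);
        payload += ',';
      }
      payloads.emplace(partition, std::move(payload));
    }
    return payloads;
  }

 private:
  // Assumed partition function for the sketch: value modulo partition count.
  uint32_t partitionOf(int64_t row) const {
    return static_cast<uint32_t>(static_cast<uint64_t>(row) % numPartitions_);
  }

  uint32_t numPartitions_;
  std::vector<int64_t> buffered_;
};
```

Returning raw payloads keyed by partition mirrors the stated design choice of leaving page creation to the caller, so the serializer itself needs no dependency on the execution layer.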

This commit introduces OptimizedPartitionedOutput, a PartitionedOutput
operator backed by PrestoIterativePartitioningSerializer. Enabled via query
config key "optimized_repartitioning" (default off). LocalPlanner
selects it over the standard PartitionedOutput when the flag is set.

TODO: replicateNullsAndAny is not yet supported and raises a user error.
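The flag-driven selection described above can be sketched as follows. Only the config key "optimized_repartitioning", its default of off, and the unsupported replicateNullsAndAny case come from the commit message. The `QueryConfig` alias, the "true" string encoding, the enum, both function names, and the exception type are hypothetical, and where the real code raises the user error is not stated in the source.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a query config: a plain string-to-string map.
using QueryConfig = std::unordered_map<std::string, std::string>;

enum class OutputOperatorKind { kStandard, kOptimized };

// Reads the flag; a missing key means "off", matching the stated default.
bool optimizedRepartitioningEnabled(const QueryConfig& config) {
  auto it = config.find("optimized_repartitioning");
  return it != config.end() && it->second == "true";
}

// Picks the operator kind. In this sketch the optimized path rejects
// replicateNullsAndAny with an exception standing in for a user error.
OutputOperatorKind chooseOutputOperator(
    const QueryConfig& config,
    bool replicateNullsAndAny) {
  if (!optimizedRepartitioningEnabled(config)) {
    return OutputOperatorKind::kStandard;
  }
  if (replicateNullsAndAny) {
    throw std::invalid_argument(
        "replicateNullsAndAny is not supported by the optimized operator");
  }
  return OutputOperatorKind::kOptimized;
}
```

Because the flag defaults to off, existing queries keep the standard operator unless they opt in explicitly.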

…geBenchmark

- Added normal vs optimized PartitionedOutput comparison by running each
  exchange case twice with kOptimizedPartitionedOutputEnabled=false/true.
- Added per-mode benchmark names in ExchangeBenchmark.cpp:
  - exchange<Case>_normalPartitionedOutput
  - exchange<Case>_optimizedPartitionedOutput
- Refactored result printing into shared helpers and fixed output
  consistency in ExchangeBenchmark.cpp.

@xin-zhang2 xin-zhang2 force-pushed the optimized_partitionedoutput branch from 70ad998 to 211901c April 17, 2026 12:12
@xin-zhang2
Member

alchemy merge @2026-04-17T05:56:57Z

@prestodb-ci
Collaborator

alchemy link 76dc41a,3853bf648ce8f361a0b3245aa469c63e8d0f7f8f,ff2e34b3b35311e72377ac4446cea592a86f44af,875c92c715df8a5a617430690471a662e91597ef,281a365ff3bdd025602e1d40614a1e7c431d625a,6519a8f1dbc2c19e332642333db0999eacd1ffe0,d8f34b40b751bb54307193475380ca52e3611ec9,9eafc9d8904079ea44c4401a90bb7912a8be1bf4,6f09ea9e45dc7095a0fa4dd247ea83bddc16fcaf,c114147d54c993712e4a25e2cb6f2f5123661391,211901c141f1b6b828e116eada52589eb40f3d09 @2026-04-17T05:56:57Z

@prestodb-ci
Collaborator

The following unexpired item was removed at 2026-04-17T05:56:57Z by @prestodb-ci via #1799 (comment):

Added new rebase item:

…mark

Split the local partition exchange benchmark out of ExchangeBenchmark
into its own executable and CMake target, while keeping the local
benchmark logic and statistics reporting available in a dedicated binary.
4 participants