Add benchmarks for different execution plans and Arrow IPC serde #1

yashmayya · 2026-01-22T00:38:59Z

Equivalent of https://github.com/startreedata/pinot/pull/383 but in Rust for DataFusion / Arrow.
Benchmarks can be run with, for example: cargo bench --bench filter_bench -p datafusion-physical-plan

gortiz · 2026-01-22T07:32:04Z

datafusion/physical-plan/benches/bench_utils.rs

+pub fn deserialize_from_ipc(data: &[u8]) -> (SchemaRef, Vec<RecordBatch>) {
+    let cursor = Cursor::new(data);
+    let reader = StreamReader::try_new(cursor, None).unwrap();
+    let schema = reader.schema();
+    let batches: Vec<RecordBatch> = reader.map(|r| r.expect("Failed to read batch")).collect();
+    (schema, batches)
+}


Can we try zero-copy IPC as well? See https://github.com/apache/arrow-rs/blob/main/arrow/examples/zero_copy_ipc.rs

Although that example uses mmapped files, it should work the same way with any buffer of bytes.

The reason for this is that if we change the protocol to send Arrow blocks directly, we are going to buffer them in memory, and once we have them we should be able to use them directly without copying into new RecordBatches.

We should also be able to do the same when writing, although I didn't find an example on how to do it.

Note that a real implementation would need to send and receive these bytes over the network, but we are also ignoring that cost in the Java version.

I've been investigating, and it looks like we cannot serialize in a zero-copy fashion, so I created #2, which uses zero-copy deserialization.

Serialization is less important, given that we will have to copy the structure anyway to send partitions when using that kind of exchange. If we end up committing to DataFusion, we will need to decide whether to use the streaming or the file IPC format.
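To make the copy versus zero-copy distinction above concrete, here is a minimal std-only sketch. It does not use the Arrow IPC API: `Int32View`, `decode_copy`, and `encode` are hypothetical names, and the layout (a run of little-endian i32 values at a byte offset) is an assumption chosen only for illustration. The idea is that a zero-copy reader keeps a shared handle to the received buffer and records offsets into it, instead of allocating new arrays.

```rust
use std::sync::Arc;

/// Copying decode: allocates a new Vec and copies every value out of the buffer.
fn decode_copy(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Zero-copy "column": shares ownership of the received buffer and only
/// records where its values live. Hypothetical type, not an Arrow API.
#[derive(Clone)]
struct Int32View {
    buf: Arc<[u8]>,
    offset: usize, // byte offset of the first value
    len: usize,    // number of i32 values
}

impl Int32View {
    /// Returns None if the requested range does not fit in the buffer.
    fn new(buf: Arc<[u8]>, offset: usize, len: usize) -> Option<Self> {
        let end = offset.checked_add(len.checked_mul(4)?)?;
        if end <= buf.len() {
            Some(Int32View { buf, offset, len })
        } else {
            None
        }
    }

    /// Reads value `i` straight from the shared buffer.
    fn get(&self, i: usize) -> Option<i32> {
        if i >= self.len {
            return None;
        }
        let start = self.offset + i * 4;
        let b = &self.buf[start..start + 4];
        Some(i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// True when the view points at the same allocation as `other`.
    fn shares_buffer_with(&self, other: &Arc<[u8]>) -> bool {
        Arc::ptr_eq(&self.buf, other)
    }
}

/// Test helper: stands in for "bytes received and buffered in memory".
fn encode(values: &[i32]) -> Arc<[u8]> {
    let bytes: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();
    Arc::from(bytes)
}
```

Building a view costs a reference-count increment and a bounds check regardless of how many values it covers, whereas `decode_copy` is linear in the payload size. That difference is what the benchmark comparison would otherwise attribute to the engine.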

gortiz · 2026-01-22T07:40:20Z

datafusion/physical-plan/benches/filter_bench.rs

+/// 4. **full_pipeline**: Complete deser + filter + output serialization
+///    - Real-world end-to-end latency including result serialization
+///    - Relevant for scenarios where results are sent over network


Unless we implement zero-copy reads in bench_utils, this is an unfair comparison to Java, because the Java version already uses zero-copy (SerializedDataBlock is a wrapper around a DataBlock that just points to the original ByteBuffer). It is true that later in the Java pipeline we always convert SerializedDataBlock into heap blocks, but that is one of the inefficiencies of the current MSE that we can remove with DataFusion.

We should either remove the deserialization time from this benchmark or implement zero-copy reads.
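The first option, removing deserialization time from the measurement, can be sketched with the standard library alone. This is an illustration under assumptions, not the benchmark code from this PR: `deserialize`, `filter_even`, and `time_op_only` are hypothetical stand-ins. Criterion's `iter_batched` offers a similar split between an untimed setup and a timed routine.

```rust
use std::time::{Duration, Instant};

/// Stand-in for deserialization (hypothetical; not Arrow IPC).
fn deserialize(bytes: &[u8]) -> Vec<i64> {
    bytes
        .chunks_exact(8)
        .map(|c| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(c);
            i64::from_le_bytes(raw)
        })
        .collect()
}

/// Stand-in for the filter operator: keeps even values.
fn filter_even(values: &[i64]) -> Vec<i64> {
    values.iter().copied().filter(|v| v % 2 == 0).collect()
}

/// Runs `setup` outside the timed region and accumulates only the time
/// spent in `op`. Returns the total measured time and the last output.
fn time_op_only<I, O>(
    iters: u32,
    mut setup: impl FnMut() -> I,
    mut op: impl FnMut(&I) -> O,
) -> (Duration, Option<O>) {
    let mut total = Duration::ZERO;
    let mut last = None;
    for _ in 0..iters {
        let input = setup(); // not timed: deserialization happens here
        let start = Instant::now();
        let out = op(&input); // timed: only the operator under test
        total += start.elapsed();
        last = Some(out);
    }
    (total, last)
}
```

Calling `time_op_only(n, || deserialize(&bytes), |v| filter_even(v))` reports only the time spent in the filter, so a copying reader on one side and a zero-copy reader on the other no longer skew the operator comparison.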

gortiz · 2026-01-22T07:44:50Z

datafusion/physical-plan/benches/filter_bench.rs

+///    - Relevant for scenarios where results are sent over network
+fn bench_filter(c: &mut Criterion) {
+    // Create a Tokio runtime for async execution
+    let rt = Runtime::new().unwrap();


This may be unfair because this runtime can use multiple threads, whereas Java uses only one.

This probably doesn't matter in practice, because the pipeline we create uses a single partition and therefore probably a single thread, but I think it would be better to use only the original carrier thread.
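As an illustration of what "only the original carrier thread" means, the following std-only sketch drives a future to completion on the calling thread without spawning any workers. It is not Tokio and not the code in this PR: `block_on`, `ThreadWaker`, `YieldOnce`, and `current_thread_id` are hypothetical names. With Tokio, the analogous setup would be a runtime built with `tokio::runtime::Builder::new_current_thread()`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread, ThreadId};

/// Wakes the thread that is blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives `fut` to completion on the calling thread; no worker threads exist.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Stand-in for a single-partition plan: yields once, then reports the
/// thread it was polled on. Hypothetical, for illustration only.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ThreadId;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<ThreadId> {
        if self.yielded {
            Poll::Ready(thread::current().id())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // request another poll
            Poll::Pending
        }
    }
}

/// Helper so callers can compare against the thread they are running on.
fn current_thread_id() -> ThreadId {
    thread::current().id()
}
```

`block_on(YieldOnce { yielded: false })` returns the same id as `current_thread_id()`, so all of the work stays on the caller's thread, which matches the single-threaded Java setup being compared against.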

Add benchmarks for different execution plans and Arrow IPC serde

c7f6c86

gortiz self-requested a review January 22, 2026 07:15

gortiz reviewed Jan 22, 2026

yashmayya and others added 15 commits January 22, 2026 13:19

Update filter condition; use single-threaded Tokio runtime

579d8c4

Move serde/deser benchmarks to their own file

dece129

Add count_group_by_bench

a17dfed

Try to use zero copy deser (#2)

3222eb2

* Try to use zero copy deser
* Try string and byte views
* Actual zero copy deser/serde
* Use zero copy on benchmarks
* Remove incorrect docs and one unused fun

use zero copy read in count_group_by_bench.rs

8abdad0

change the filter benchmark to filter out 50% of the rows

4c1e95c

Add a benchmark for distinct count

f478fa3

Add hash_join_bench

e3d02b7

Improve distinct_group_by_bench.rs

eb09232

Merge branch 'serde-and-physical-plan-exec-benchmarks' of github.com:…

09c7a3b

…startreedata/datafusion into serde-and-physical-plan-exec-benchmarks

Reduce num_batches to 1

fe6753d

New benchmarks

1a6f0dd

change columns to lower case

5a11da7

Fix filter_jni_benchmark

3846e08

Add different batche sizes for serde and deser

d399216

3 participants
