[SPARK-56718][PYTHON] Refactor SQL_COGROUPED_MAP_PANDAS_UDF by Yicong-Huang · Pull Request #55674 · apache/spark

Yicong-Huang · 2026-05-04T20:21:26Z

What changes were proposed in this pull request?

Refactor SQL_COGROUPED_MAP_PANDAS_UDF to be self-contained in read_udfs(), moving logic from CogroupPandasUDFSerializer into worker.py, using ArrowStreamCoGroupSerializer as pure I/O.

Why are the changes needed?

Part of SPARK-55388 (Refactor PythonEvalType processing logic). Making each eval type self-contained in read_udfs() improves readability and makes it easier to reason about the data flow for each eval type independently.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests. No behavior change.

ASV benchmark (CogroupedMapPandasUDFTimeBench), 3 runs each, averaged:

master: ebfdf8b6b4  vs  PR: 87d96ebe40

Time (ms, lower = better)
scenario          udf                  master      PR        diff
---------------------------------------------------------------------
few_groups_sm     identity_udf          160.3     159.0   -0.81%
few_groups_sm     concat_udf            181.0     181.0    0.00%
few_groups_sm     left_semi_udf         182.7     181.7   -0.55%
few_groups_sm     key_identity_udf      160.3     162.0   +1.04%
few_groups_lg     identity_udf          413.3     411.3   -0.48%
few_groups_lg     concat_udf            545.7     537.3   -1.54%
few_groups_lg     left_semi_udf         508.3     500.7   -1.51%
few_groups_lg     key_identity_udf      412.3     412.3    0.00%
many_groups_sm    identity_udf         1356.7    1356.7    0.00%
many_groups_sm    concat_udf           1480.0    1473.3   -0.45%
many_groups_sm    left_semi_udf        1496.7    1503.3   +0.44%
many_groups_sm    key_identity_udf     1370.0    1363.3   -0.49%
many_groups_lg    identity_udf          765.3     760.7   -0.61%
many_groups_lg    concat_udf            922.7     908.7   -1.52%
many_groups_lg    left_semi_udf         887.0     886.0   -0.11%
many_groups_lg    key_identity_udf      764.7     767.0   +0.30%
wide_values       identity_udf         1073.3    1073.3    0.00%
wide_values       concat_udf           1276.7    1266.7   -0.78%
wide_values       left_semi_udf        1130.0    1130.0    0.00%
wide_values       key_identity_udf     1080.0    1073.3   -0.62%
multi_key         identity_udf          425.3     423.7   -0.39%
multi_key         concat_udf            473.7     473.0   -0.14%
multi_key         left_semi_udf         463.0     451.3   -2.52%
multi_key         key_identity_udf      430.3     428.3   -0.46%
---------------------------------------------------------------------
SUM                                  17995.0   17834.0   -0.90%

Aggregate essentially flat (-0.90%); per-scenario variation within run-to-run noise.

Was this patch authored or co-authored using generative AI tooling?

No.

devin-petersohn

LGTM

devin-petersohn approved these changes May 5, 2026

View reviewed changes

refactor: extract SQL_COGROUPED_MAP_PANDAS_UDF into read_udfs

9960e00

Yicong-Huang force-pushed the refactor/cogrouped-map-pandas-udf branch from 87d96eb to 9960e00 Compare May 20, 2026 23:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-56718][PYTHON] Refactor SQL_COGROUPED_MAP_PANDAS_UDF#55674

[SPARK-56718][PYTHON] Refactor SQL_COGROUPED_MAP_PANDAS_UDF#55674
Yicong-Huang wants to merge 1 commit into
apache:masterfrom
Yicong-Huang:refactor/cogrouped-map-pandas-udf

Yicong-Huang commented May 4, 2026

Uh oh!

devin-petersohn left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Yicong-Huang commented May 4, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

devin-petersohn left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants