[SPARK-55897][SQL][4.0] Handle UserDefinedType in ColumnarRow, ColumnarBatchRow, and ColumnarArray get() by james-willis · Pull Request #55990 · apache/spark

james-willis · 2026-05-19T16:58:56Z

Backport of #54701 to branch-4.0.

What changes were proposed in this pull request?

ColumnarRow.get(), ColumnarBatchRow.get(), and ColumnarArray.get() throw SparkUnsupportedOperationException when called with a UserDefinedType because they have no branch to handle UDTs.

This PR adds UDT handling to all three methods:

- ColumnarRow and ColumnarBatchRow: Add an instanceof UserDefinedType branch that recurses with udt.sqlType(), matching the pattern already used in SpecializedGettersReader.read().
- ColumnarArray: Change the handleUserDefinedType flag from false to true in the existing call to SpecializedGettersReader.read().
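
The two approaches above can be sketched outside Spark as follows. This is a minimal illustration, not the actual diff: `DataType`, `IntegerType`, `UserDefinedType`, `IntBackedUDT`, `rowGet`, `read`, and `arrayGet` are simplified, hypothetical stand-ins, and a plain `List<Object>` stands in for the column vectors.

```java
import java.util.List;

// Simplified, hypothetical stand-ins for Spark's type classes; not the real API.
public class UdtGetSketch {
    interface DataType {}
    enum IntegerType implements DataType { INSTANCE }
    // A UDT exposes the built-in type that physically stores its values.
    interface UserDefinedType extends DataType {
        DataType sqlType();
    }
    static final class IntBackedUDT implements UserDefinedType {
        @Override public DataType sqlType() { return IntegerType.INSTANCE; }
    }

    // Row-style getter: the added branch recurses with the UDT's underlying type.
    static Object rowGet(List<Object> values, int ordinal, DataType dataType) {
        if (dataType instanceof IntegerType) {
            return values.get(ordinal);
        } else if (dataType instanceof UserDefinedType) {
            return rowGet(values, ordinal, ((UserDefinedType) dataType).sqlType());
        }
        throw new UnsupportedOperationException("Datatype not supported: " + dataType);
    }

    // Shared reader with a flag that controls whether UDTs are unwrapped.
    static Object read(List<Object> values, int ordinal, DataType dataType,
                       boolean handleUserDefinedType) {
        if (dataType instanceof IntegerType) {
            return values.get(ordinal);
        }
        if (handleUserDefinedType && dataType instanceof UserDefinedType) {
            return read(values, ordinal, ((UserDefinedType) dataType).sqlType(), true);
        }
        throw new UnsupportedOperationException("Datatype not supported: " + dataType);
    }

    // Array-style getter: delegates to the shared reader with the flag set to true
    // (passing false here would reproduce the unsupported-type failure).
    static Object arrayGet(List<Object> values, int ordinal, DataType dataType) {
        return read(values, ordinal, dataType, true);
    }

    public static void main(String[] args) {
        List<Object> values = List.of(10, 20, 30);
        System.out.println(rowGet(values, 1, new IntBackedUDT()));   // prints 20
        System.out.println(arrayGet(values, 2, new IntBackedUDT())); // prints 30
    }
}
```

Recursing on `sqlType()` rather than special-casing each UDT means a UDT is read exactly like the built-in type that backs it.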

Why are the changes needed?

The codegen path (CodeGenerator.getValue()) unwraps udt.sqlType() before generating accessor calls, so UDT columns work when whole-stage codegen is active. However, on the interpreted eval path — when codegen is disabled, falls back, or the number of fields exceeds spark.sql.codegen.maxFields — GetStructField.nullSafeEval calls ColumnarRow.get(ordinal, udtType) directly, which hits the unhandled branch and throws.
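
To make the difference between the two paths concrete, here is a rough model of the pre-fix behavior. It assumes a getter with no UDT branch and two hypothetical callers: `codegenLike` unwraps the UDT before calling the getter, while `interpretedLike` passes the declared type straight through. None of these names come from Spark; they only mirror the description above.

```java
// Hypothetical model of the pre-fix behavior; names are illustrative, not Spark's.
public class EvalPathSketch {
    interface DataType {}
    enum IntegerType implements DataType { INSTANCE }
    interface UserDefinedType extends DataType {
        DataType sqlType();
    }
    static final class IntBackedUDT implements UserDefinedType {
        @Override public DataType sqlType() { return IntegerType.INSTANCE; }
    }

    // Getter as it behaved before the fix: no branch for UDTs.
    static Object getBeforeFix(int[] column, int ordinal, DataType dataType) {
        if (dataType instanceof IntegerType) {
            return column[ordinal];
        }
        throw new UnsupportedOperationException(
            "Datatype not supported: " + dataType.getClass().getSimpleName());
    }

    // Codegen-like caller: resolves the UDT to its underlying type first,
    // so the getter only ever sees a built-in type.
    static Object codegenLike(int[] column, int ordinal, DataType dataType) {
        DataType physical = dataType;
        while (physical instanceof UserDefinedType) {
            physical = ((UserDefinedType) physical).sqlType();
        }
        return getBeforeFix(column, ordinal, physical);
    }

    // Interpreted-like caller: hands the declared (UDT) type to the getter as-is.
    static Object interpretedLike(int[] column, int ordinal, DataType dataType) {
        return getBeforeFix(column, ordinal, dataType);
    }

    public static void main(String[] args) {
        int[] column = {7, 8};
        System.out.println(codegenLike(column, 1, new IntBackedUDT())); // prints 8
        try {
            interpretedLike(column, 1, new IntBackedUDT());
        } catch (UnsupportedOperationException e) {
            System.out.println("interpreted path failed: " + e.getMessage());
        }
    }
}
```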

Does this PR introduce any user-facing change?

Yes. UDT columns in columnar data sources (e.g., Parquet) now work correctly on the interpreted evaluation path. Previously they would throw SparkUnsupportedOperationException.

How was this patch tested?

Added 6 new tests in ColumnarBatchSuite covering all 3 methods x 2 UDT backing types (primitive IntegerType and complex StructType). Each test creates columnar vectors with UDT data and verifies that get() returns the correct value. Two helper UDT classes (TestIntUDT, TestStructWrapperUDT) are defined for the tests.
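
The two backing types can be illustrated with a small standalone sketch. It is not the test code from ColumnarBatchSuite: `IntBackedUDT` and `StructBackedUDT` are hypothetical stand-ins for the helper UDTs, and nested `List` values stand in for struct data.

```java
import java.util.List;

// Hypothetical stand-ins showing a primitive-backed and a struct-backed UDT.
public class UdtBackingTypesSketch {
    interface DataType {}
    enum IntegerType implements DataType { INSTANCE }
    static final class StructType implements DataType {
        final List<DataType> fieldTypes;
        StructType(List<DataType> fieldTypes) { this.fieldTypes = fieldTypes; }
    }
    interface UserDefinedType extends DataType {
        DataType sqlType();
    }
    static final class IntBackedUDT implements UserDefinedType {
        @Override public DataType sqlType() { return IntegerType.INSTANCE; }
    }
    static final class StructBackedUDT implements UserDefinedType {
        @Override public DataType sqlType() {
            return new StructType(List.of(IntegerType.INSTANCE, IntegerType.INSTANCE));
        }
    }

    // Getter with a UDT branch: the same recursion serves both backing types.
    static Object get(List<Object> row, int ordinal, DataType dataType) {
        if (dataType instanceof IntegerType) {
            return (Integer) row.get(ordinal);
        } else if (dataType instanceof StructType) {
            return (List<?>) row.get(ordinal); // nested values stand in for a struct
        } else if (dataType instanceof UserDefinedType) {
            return get(row, ordinal, ((UserDefinedType) dataType).sqlType());
        }
        throw new UnsupportedOperationException("Datatype not supported: " + dataType);
    }

    public static void main(String[] args) {
        List<Object> row = List.of(5, List.of(1, 2));
        System.out.println(get(row, 0, new IntBackedUDT()));    // prints 5
        System.out.println(get(row, 1, new StructBackedUDT())); // prints [1, 2]
    }
}
```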

Cherry-picked from 472735c on master. The cherry-pick had a trivial conflict in ColumnarBatchSuite.scala: the neighboring [SPARK-55552] Variant test exists on branch-4.1+ but not on branch-4.0, so its insertion point was contested. Resolved by keeping only the SPARK-55897 tests (the Variant test is unrelated).

Was this patch authored or co-authored using generative AI tooling?

Yes. Opus 4.6

…chRow, and ColumnarArray get()

Closes apache#54701 from james-willis/columnar-row-udt-test.

Authored-by: jameswillis <james@wherobots.com>
Signed-off-by: Huaxin Gao <huaxin.gao11@gmail.com>
(cherry picked from commit 472735c)

james-willis · 2026-05-19T17:00:41Z

@huaxingao here is the 4.0 port.

huaxingao

LGTM

huaxingao · 2026-05-19T23:14:16Z

@james-willis Could you check why the CI failed?

james-willis · 2026-05-20T18:31:42Z

@huaxingao It was the flaky Protobuf breaking-change action. A retry fixed it.

huaxingao approved these changes May 19, 2026

james-willis wants to merge 1 commit into apache:branch-4.0 from james-willis:backport-SPARK-55897-4.0
