[SPARK-55791][PYTHON] Fix pandas-on-Spark equality comparisons under ANSI mode #55987

Open
fuxi611 wants to merge 1 commit into apache:master from fuxi611:SPARK-55791-ansi-pandas-behavior

fuxi611 commented May 19, 2026

What changes were proposed in this pull request?

This PR fixes pandas-on-Spark equality and inequality comparisons between incompatible dtypes under ANSI mode.

The change makes pandas-on-Spark return pandas-compatible boolean results for incompatible dtype comparisons instead of delegating them to Spark SQL casting behavior:

  • eq returns all False
  • ne returns all True

This covers comparisons such as numeric Series/Index against string Series/Index or string scalar values.
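
For reviewers who want the intended semantics in one place, here is a minimal plain-Python sketch of the rule described above. It is not the code in this PR and does not use pyspark; the helper names (`compatible`, `sketch_eq`, `sketch_ne`) are hypothetical, and "dtype" is approximated by the Python type of each element.

```python
from typing import Any, List, Sequence


def compatible(left: Any, right: Any) -> bool:
    """Hypothetical compatibility check: numeric with numeric, or text with text."""
    both_numeric = isinstance(left, (int, float)) and isinstance(right, (int, float))
    both_text = isinstance(left, str) and isinstance(right, str)
    return both_numeric or both_text


def sketch_eq(left: Sequence[Any], right: Any) -> List[bool]:
    """Element-wise eq: incompatible pairs are False, with no cast and no error."""
    # Broadcast a scalar right-hand side to the length of the left-hand side.
    rights = list(right) if isinstance(right, (list, tuple)) else [right] * len(left)
    return [compatible(l, r) and l == r for l, r in zip(left, rights)]


def sketch_ne(left: Sequence[Any], right: Any) -> List[bool]:
    """Element-wise ne: the negation of sketch_eq, so incompatible pairs are True."""
    return [not equal for equal in sketch_eq(left, right)]


numeric_values = [1, 2, 3]
eq_result = sketch_eq(numeric_values, "a")  # numeric values against a string scalar
ne_result = sketch_ne(numeric_values, "a")
mixed_result = sketch_eq(numeric_values, [1, "2", 3])  # only compatible pairs can match
```

With these inputs, `eq_result` is all False and `ne_result` is all True, matching the two bullets above; `mixed_result` is True only where both the type and the value agree.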

Why are the changes needed?

ANSI mode should not change pandas API on Spark behavior. Without this fix, Spark SQL may try to cast incompatible operands under ANSI mode, which can produce behavior that differs from pandas or raise errors for comparisons where pandas would simply return boolean results.
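
The difference between the two paths can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: `strict_cast_eq` is a hypothetical stand-in for a cast-then-compare path (it uses Python's `float()` as the strict cast, not Spark SQL), and `dtype_aware_eq` is a hypothetical stand-in for the pandas-style result.

```python
from typing import List, Sequence


def strict_cast_eq(numbers: Sequence[float], text: str) -> List[bool]:
    """Toy cast-then-compare path: cast the string first, then compare values."""
    as_number = float(text)  # raises ValueError for non-numeric text such as "a"
    return [n == as_number for n in numbers]


def dtype_aware_eq(numbers: Sequence[float], text: str) -> List[bool]:
    """Toy pandas-style path: a number never equals a string, so no cast is attempted."""
    return [False for _ in numbers]


values = [1.0, 2.0]

# A numeric-looking string: the cast path reports a match that pandas would not.
cast_result = strict_cast_eq(values, "1")
aware_result = dtype_aware_eq(values, "1")

# A non-numeric string: the cast path fails instead of returning booleans.
try:
    strict_cast_eq(values, "a")
    cast_failed = False
except ValueError:
    cast_failed = True
```

In this model the cast path either disagrees with the pandas-style result (`cast_result` versus `aware_result`) or raises, which are the two failure modes the paragraph above describes.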

Does this PR introduce any user-facing change?

Yes. pandas-on-Spark comparison behavior becomes more consistent with pandas under ANSI mode for incompatible dtype equality and inequality comparisons.

How was this patch tested?

Ran:

python3 python/run-tests.py --testnames pyspark.pandas.tests.data_type_ops.test_num_ops
python3 python/run-tests.py --testnames pyspark.pandas.tests.data_type_ops.test_boolean_ops

…ison behavior

Co-authored-by: Le Nguyen Gia Bao <22120023@student.hcmus.edu.vn>