Implement `DataFrame.count(axis=1)` on the GPU by galipremsagar · Pull Request #23016 · rapidsai/cudf

galipremsagar · 2026-06-27T00:37:41Z

Description

DataFrame.count(axis=1) previously raised NotImplementedError ("Only axis=0 is currently supported"). Under cudf.pandas this triggered a CPU fallback that copied the entire DataFrame from device to host before running the reduction in pandas, which is very expensive for wide or string-heavy frames.

This PR implements count(axis=1) directly on the GPU as the row-wise sum of each column's validity (Σ col.notnull()), returning an int64 Series indexed by the frame's row index. numeric_only is now also honored for both axes (it was previously silently ignored).
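A minimal sketch of the same reduction (row-wise sum of per-column validity), written against pandas purely for illustration. It is not the cuDF implementation: `count_rows` is a hypothetical helper, and the `numeric_only` filter via `select_dtypes(include=["number", "bool"])` is an assumed approximation of pandas' own numeric-column selection.

```python
import numpy as np
import pandas as pd


def count_rows(df: pd.DataFrame, numeric_only: bool = False) -> pd.Series:
    """Hypothetical helper: per-row non-null count as the sum of column validity masks."""
    if numeric_only:
        # Assumption: approximate numeric_only by keeping numeric and boolean columns.
        df = df.select_dtypes(include=["number", "bool"])
    # Start from int64 zeros so a frame with no columns still yields one 0 per row.
    result = pd.Series(np.zeros(len(df), dtype="int64"), index=df.index)
    for i in range(df.shape[1]):
        # notna() treats NaN, None, <NA> and NaT as missing; True/False becomes 1/0.
        result += df.iloc[:, i].notna().to_numpy(dtype="int64")
    return result


df = pd.DataFrame({"a": [1, None, 3], "b": ["x", None, None], "c": [1.0, 2.0, np.nan]})
print(count_rows(df).tolist(), df.count(axis=1).tolist())
```

Iterating by position (`iloc`) rather than by label keeps the sketch correct for frames with duplicate column names.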

Behavior

Matches pandas across numpy, pandas nullable (Int64 / boolean / Float64 / StringDtype), Arrow-backed (*[pyarrow]), and mixed-dtype frames, including NaN / <NA> / NaT (counted as missing) and numeric_only=True.
Result dtype is int64, matching pandas.
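For reference, this is the pandas behavior the GPU path is expected to reproduce on a small mixed frame with nullable Int64, string, datetime and float columns, where NaN, <NA> and NaT each count as missing. The frame is made up for illustration, and Arrow-backed columns are left out so the snippet runs without pyarrow.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "i": pd.array([1, None, 3], dtype="Int64"),         # <NA> in row 1
        "s": pd.array(["a", "b", None], dtype="string"),    # <NA> in row 2
        "t": pd.to_datetime(["2022-01-01", None, "2022-01-03"]),  # NaT in row 1
        "f": [np.nan, 2.0, 3.0],                            # NaN in row 0
    }
)

counts = df.count(axis=1)                            # all four columns
numeric_counts = df.count(axis=1, numeric_only=True)  # only "i" and "f"
print(counts.tolist(), numeric_counts.tolist(), counts.dtype)
```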

Benchmark

NYC parking violations 2022 (~15.4M rows; 7 string + 3 int + 1 datetime columns), df.count(axis=1):

| Configuration | Time | Speedup |
| --- | --- | --- |
| `cudf.pandas` before (CPU fallback) | 3.93 s | 1× |
| pure pandas | 1.40 s | n/a |
| `cudf.pandas` after (this PR, on GPU) | 0.023 s | ~170× vs fallback, ~60× vs pandas |

The whole-frame device→host copy is eliminated — the reduction now stays on the GPU.
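The speedup column can be re-derived from the measured times; this only divides the benchmark's own numbers and introduces no new measurements.

```python
# Wall-clock times reported in the benchmark (seconds).
fallback_s = 3.93  # cudf.pandas before, CPU fallback
pandas_s = 1.40    # pure pandas
gpu_s = 0.023      # cudf.pandas after, on GPU

vs_fallback = fallback_s / gpu_s
vs_pandas = pandas_s / gpu_s
pandas_vs_fallback = fallback_s / pandas_s
print(f"{vs_fallback:.1f}x vs fallback, {vs_pandas:.1f}x vs pandas")
```

Both ratios (about 171 and 61) round to the reported ~170× and ~60×. Pure pandas is also about 2.8× faster than the old fallback, which is consistent with the description's point that the device-to-host copy accounts for much of the fallback cost.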

Reproducer

import time

import pandas as pd

# run with:  python -m cudf.pandas bench.py
# data:      https://data.rapids.ai/datasets/nyc_parking/nyc_parking_violations_2022.parquet
df = pd.read_parquet(
    "nyc_parking_violations_2022.parquet",
    columns=[
        "Registration State", "Violation Code", "Vehicle Body Type", "Vehicle Make",
        "Violation Time", "Violation County", "Vehicle Year", "Violation Description",
        "Issue Date", "Summons Number",
    ],
)
df["Issue Date"] = df["Issue Date"].astype("datetime64[ms]")
df["issue_weekday"] = df["Issue Date"].dt.weekday

df.count(axis=1)  # warm-up run, excluded from the timing
start = time.perf_counter()  # monotonic, high-resolution clock
df.count(axis=1)
print(f"{time.perf_counter() - start:.4f} s")
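A single timed call on an operation that takes ~20 ms is noisy. A minimal timing sketch that reports the best of several runs after one warm-up call; `best_of` is a hypothetical helper, and it assumes the timed call returns only once its result is ready, so wall-clock time covers the GPU work.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeat: int = 5) -> float:
    """Hypothetical helper: fastest wall-clock time in seconds over `repeat` calls."""
    fn()  # warm-up call, excluded from the measurements
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)


# Usage with the reproducer's frame (assumed to be loaded as `df`):
#   print(f"{best_of(lambda: df.count(axis=1)):.4f} s")
```

Taking the minimum rather than the mean filters out runs slowed by unrelated system activity.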

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2026-06-27T00:37:45Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-27T00:45:32Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- DataFrame.count() now supports row-wise counting with axis=1.
- Added support for numeric_only when counting, matching expected results on numeric-only data.
Bug Fixes
- count(axis=1) no longer raises an error and now returns per-row non-null counts.
- Improved behavior for empty dataframes with no columns.
Tests
- Added coverage to validate count(axis=1) against pandas across mixed data and missing values.

Walkthrough

DataFrame.count now supports axis=1 row-wise counts, applies numeric_only column filtering, updates the docstring example, and adds tests that compare the new behavior with pandas.

Changes

Row-wise count behavior

| Layer | File(s) | Summary |
| --- | --- | --- |
| Count implementation and docs | `python/cudf/cudf/core/dataframe.py` | `DataFrame.count` now handles `axis=1`, selects numeric columns when `numeric_only=True`, sums non-null masks across columns, handles empty-column frames, and updates the docstring example. |
| Row-wise count validation | `python/cudf/cudf/tests/dataframe/methods/test_reductions.py` | The axis=1 unsupported-op list no longer includes `count`, and new parametrized cases compare `count(axis=1, numeric_only=...)` against pandas across mixed inputs. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding DataFrame.count(axis=1) support on the GPU. |
| Description check | ✅ Passed | The description accurately matches the implemented row-wise count support, numeric_only behavior, and benchmark motivation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

🧹 Nitpick comments (1)

python/cudf/cudf/tests/dataframe/methods/test_reductions.py (1)
252-268: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Extend count(axis=1) coverage to the dtypes the PR claims to support.

The PR states count(axis=1) matches pandas across datetime NaT, pandas nullable, and Arrow-backed dtypes, but the parametrization only exercises numpy columns with None/np.nan. Consider adding cases for a datetime column containing NaT, a pandas nullable/Arrow-backed column with <NA>, and an empty (0-row) frame to guard the row-wise path and the numeric_only filtering against regressions.

As per path instructions: "Ensure test files provide comprehensive edge case coverage (empty, all-null, single-element, mixed types)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cudf/cudf/tests/dataframe/methods/test_reductions.py` around lines 252-268,
extend test_dataframe_count_axis1 to cover the dtypes and edge cases claimed by the PR:
add parametrized inputs for a datetime column with NaT, a pandas nullable/Arrow-backed
column with <NA>, and an empty 0-row DataFrame. Update the expected comparisons against
pandas DataFrame.count(axis=1, numeric_only=...) so the row-wise path and numeric_only
filtering are validated for these cases without changing the existing assert_eq flow.
Source: Path instructions
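A sketch of the extra inputs the review asks for, built with pandas only. The helper name `count_axis1_edge_cases` and the case names are hypothetical, and the commented `cudf.from_pandas` / `assert_eq` lines only indicate how the existing test might consume them; they are not code from the PR.

```python
import numpy as np
import pandas as pd


def count_axis1_edge_cases() -> dict:
    """Hypothetical extra inputs for test_dataframe_count_axis1."""
    return {
        # datetime column with NaT next to an int column
        "datetime_nat": pd.DataFrame(
            {"t": pd.to_datetime(["2022-01-01", None]), "x": [1, 2]}
        ),
        # pandas nullable Int64 and string columns holding <NA>
        "nullable_na": pd.DataFrame(
            {
                "i": pd.array([1, None], dtype="Int64"),
                "s": pd.array([None, "x"], dtype="string"),
            }
        ),
        # first row is entirely missing
        "all_null_row": pd.DataFrame({"a": [np.nan, 1.0], "b": [None, "y"]}),
        # zero rows but two columns
        "zero_rows": pd.DataFrame(
            {"a": pd.Series([], dtype="int64"), "b": pd.Series([], dtype="object")}
        ),
    }


# In the cuDF test, each case would be compared roughly as:
#   gdf = cudf.from_pandas(pdf)
#   assert_eq(gdf.count(axis=1, numeric_only=numeric_only),
#             pdf.count(axis=1, numeric_only=numeric_only))
for name, pdf in count_axis1_edge_cases().items():
    for numeric_only in (False, True):
        expected = pdf.count(axis=1, numeric_only=numeric_only)
        print(name, numeric_only, expected.tolist())
```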


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: abfa9332-f852-4f8b-ad9c-9d5619f9eaee

📥 Commits

Reviewing files that changed from the base of the PR and between c979f58 and 1099327.

📒 Files selected for processing (2)

python/cudf/cudf/core/dataframe.py
python/cudf/cudf/tests/dataframe/methods/test_reductions.py

Implement count=1

1099327

github-actions Bot assigned galipremsagar Jun 27, 2026

github-actions Bot added the Python Affects Python cuDF API. label Jun 27, 2026

github-project-automation Bot added this to cuDF Python Jun 27, 2026

GPUtester moved this to In Progress in cuDF Python Jun 27, 2026

galipremsagar added bug Something isn't working non-breaking Non-breaking change labels Jun 27, 2026

galipremsagar changed the title ~~Implement count=1~~ Implement DataFrame.count(axis=1) on the GPU Jun 27, 2026

galipremsagar marked this pull request as ready for review June 27, 2026 00:41

galipremsagar requested a review from a team as a code owner June 27, 2026 00:41

galipremsagar requested review from Matt711 and bdice June 27, 2026 00:41

galipremsagar added the 3 - Ready for Review Ready for review by team label Jun 27, 2026

coderabbitai Bot reviewed Jun 27, 2026

galipremsagar wants to merge 1 commit into rapidsai:main from galipremsagar:count


2 participants
