perf: scatter groupby-sum terms directly instead of unstacking by FBumann · Pull Request #25 · FBumann/linopy

FBumann · 2026-06-03T12:46:25Z

What

The fast path of LinearExpression.groupby(...).sum() used ds.unstack(group_dim, fill_value=...) followed by a stack, which materializes 2–3 intermediate copies of the padded result (n_groups × max_group_size × nterm) and runs pandas MultiIndex machinery sized by the number of elements.

This PR replaces it with a scatter kernel: factorize the groups, compute within-group positions, and scatter coeffs/vars directly into the preallocated padded result arrays; constants are group-summed with np.add.at. Peak memory drops to input + result — the minimum possible for the padded layout — and grouping gets much faster.

Benchmarks

bench.time / bench.memory (PR PyPSA#735 tooling) on expr.groupby(groups).sum(), zipf-skewed groups (incidence/balance-constraint pattern):

case	time before → after	peak RAM before → after
uniform 100k → 5k groups	6.6 ms → 3.8 ms	9.7 → 5.7 MiB
skewed 100k → 5k groups	72 ms → 18 ms (4×)	2,182 → 833 MiB (2.6×)
skewed 300k → 10k groups	2.86 s → 90 ms (32×)	12,254 → 4,674 MiB (2.6×)

The remaining peak is exactly the padded result itself — that part is inherent to the dense rectangular _term representation, not to this implementation.

Robustness

xarray owns the result structure: dims/coords are assembled with drop_dims / expand_dims / assign; numpy only fills values.
Conservative dispatch (_can_sum_by_scatter): chunked (dask) data or exotic coordinates on the grouped dim fall back to the previous unstack implementation, which is kept as _sum_by_unstack.
Kernel-lock test: test_linear_expression_groupby_scatter_equals_unstack asserts both kernels stay bit-exactly interchangeable (dims, coords, vars, coeffs, padding positions) across 7 edge cases — an xarray update that shifts unstack behavior fails CI instead of silently diverging.
Behavioral parity: result dims, coords, term ordering and padding are unchanged; NaN group labels now raise an informative ValueError instead of a cryptic unstack error.
PyPSA end-to-end validation: AC-DC-meshed and SciGRID-DE networks solve to identical objectives with identical variable/constraint counts and byte-identical normalized constraint matrices vs master.

Notes

Kept on the fork for now (not upstream).
Side-finding (pre-existing, unrelated): pypsa+linopy model builds are not run-to-run reproducible — term counts vary with PYTHONHASHSEED (set-ordering somewhere in the build path, linopy.common.get_index_map is a candidate). Worth a separate issue.

🤖 Generated with Claude Code

Summary by CodeRabbit

Performance
- Enhanced grouped sum operations with optimized computation for compatible data layouts.
Tests
- Expanded test coverage for grouped operations, including edge cases, chunked data scenarios, and error handling validation.

The fast path of LinearExpression.groupby(...).sum() used ds.unstack(group_dim, fill_value=...) followed by a stack, which materializes 2-3 intermediate copies of the padded result (n_groups x max_group_size x nterm) and goes through pandas MultiIndex machinery sized by the number of elements. Instead, factorize the groups and scatter coeffs/vars directly into the preallocated padded result arrays; constants are group-summed with np.add.at. Peak memory drops to input + result (the minimum for the padded layout) and the grouping itself gets considerably faster. The result is unchanged: same dims, coords, term ordering and padding. The unstack-based implementation is kept as _sum_by_unstack and still used for chunked (dask-backed) data, which cannot be scattered into numpy arrays. NaN group labels now raise an informative ValueError instead of failing inside unstack. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-03T12:46:38Z

📝 Walkthrough

Walkthrough

This PR optimizes LinearExpressionGroupby.sum() by introducing a conditional dispatch mechanism that routes to a new fast scatter-based kernel when data layout is compatible, otherwise falling back to the existing unstack-based implementation. The scatter kernel uses numpy factorization and direct array scattering for speed, while preserving the unstack path for complex or chunked data scenarios.

Changes

Groupby Sum Optimization

Layer / File(s)	Summary
Dispatch and eligibility logic `linopy/expressions.py`	`sum()` method now conditionally routes to scatter or unstack paths based on `_can_sum_by_scatter` eligibility check, which validates numpy-backed arrays and coordinate layout compatibility.
Scatter-based summation kernel `linopy/expressions.py`	`_sum_by_scatter` implements fast group aggregation by factorizing group labels, computing within-group positions, scattering `coeffs` and `vars` into padded arrays, and summing `const` per group with NaN-as-zero semantics.
Unstack-based fallback kernel `linopy/expressions.py`	`_sum_by_unstack` encapsulates the original cumcount/multiindex/unstack summation approach, supporting chunked/dask-backed data and complex coordinate setups via xarray operations.
Groupby sum test validation `test/test_linear_expression.py`	Comprehensive test suite validates skewed/unsorted group labels, chunked dask expressions, NaN error handling, and bit-exact equivalence between scatter and unstack implementations across multiple configurations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A scatter kernel hops with speed,
While unstack paths meet every need—
Fast numpy routes when data's aligned,
Else xarray's embrace, always kind.
Tests lock the dance, bit-exact and true!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The description covers the changes, benchmarks, robustness measures, and validation, but the author-provided checklist is empty and some template sections are not fully addressed.	Complete the checklist by confirming docstrings/documentation are added, unit tests exist, and release notes are included; or clarify which items are not applicable.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely describes the main performance optimization: replacing unstack with a scatter kernel for groupby-sum operations.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/groupby-sum-scatter

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

test/test_linear_expression.py (1)

1655-1669: ⚡ Quick win

Assert the chunked dispatch gate explicitly.

This test checks result parity, but it doesn’t verify the dispatch contract itself. Add an assertion that chunked input is ineligible for scatter to prevent silent regressions in path selection.

Suggested patch

 def test_linear_expression_groupby_chunked(v: Variable) -> None:
     """Chunked (dask-backed) expressions group via xarray's unstack machinery."""
     pytest.importorskip("dask")
     expr = 2 * v + 5
     groups = pd.Series([1] * 12 + [2] * 8, index=v.indexes["dim_2"], name="group")

     chunked = LinearExpression(expr.data.chunk({"dim_2": 5}), expr.model)
-    grouped_chunked = chunked.groupby(groups).sum()
+    gb_chunked = chunked.groupby(groups)
+    assert not gb_chunked._can_sum_by_scatter(groups)
+    grouped_chunked = gb_chunked.sum()
     grouped = expr.groupby(groups).sum()

     assert grouped_chunked.nterm == grouped.nterm
     assert_linequal(
         LinearExpression(grouped_chunked.data.compute(), expr.model), grouped
     )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_linear_expression.py` around lines 1655 - 1669, Add an explicit
assertion that the chunked input is ineligible for scatter before calling
groupby: assert not LinearExpression._dispatch_scatter_eligible(chunked) (or the
project’s equivalent private dispatch gate) so the test not only compares
results but also verifies that chunked (the LinearExpression created from
expr.data.chunk(...)) does not take the scatter dispatch path; place this
assertion in test_linear_expression_groupby_chunked right after chunked is
created and before grouped_chunked is computed.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/test_linear_expression.py`:
- Around line 1655-1669: Add an explicit assertion that the chunked input is
ineligible for scatter before calling groupby: assert not
LinearExpression._dispatch_scatter_eligible(chunked) (or the project’s
equivalent private dispatch gate) so the test not only compares results but also
verifies that chunked (the LinearExpression created from expr.data.chunk(...))
does not take the scatter dispatch path; place this assertion in
test_linear_expression_groupby_chunked right after chunked is created and before
grouped_chunked is computed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d68a1bcb-30df-4d7d-a5c9-c160fb51d460

📥 Commits

Reviewing files that changed from the base of the PR and between d834810 and 63c30f8.

📒 Files selected for processing (2)

linopy/expressions.py
test/test_linear_expression.py

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

FBumann mentioned this pull request Jun 3, 2026

groupby-sum pads every group to the largest group size PyPSA/linopy#745

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: scatter groupby-sum terms directly instead of unstacking#25

perf: scatter groupby-sum terms directly instead of unstacking#25
FBumann wants to merge 1 commit into
masterfrom
perf/groupby-sum-scatter

FBumann commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

FBumann commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Benchmarks

Robustness

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

FBumann commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading