Skip to content

perf: scatter groupby-sum terms directly instead of unstacking#25

Open
FBumann wants to merge 1 commit into
masterfrom
perf/groupby-sum-scatter
Open

perf: scatter groupby-sum terms directly instead of unstacking#25
FBumann wants to merge 1 commit into
masterfrom
perf/groupby-sum-scatter

Conversation

@FBumann
Copy link
Copy Markdown
Owner

@FBumann FBumann commented Jun 3, 2026

What

The fast path of LinearExpression.groupby(...).sum() used ds.unstack(group_dim, fill_value=...) followed by a stack, which materializes 2–3 intermediate copies of the padded result (n_groups × max_group_size × nterm) and runs pandas MultiIndex machinery sized by the number of elements.

This PR replaces it with a scatter kernel: factorize the groups, compute within-group positions, and scatter coeffs/vars directly into the preallocated padded result arrays; constants are group-summed with np.add.at. Peak memory drops to input + result — the minimum possible for the padded layout — and grouping gets much faster.

Benchmarks

bench.time / bench.memory (PR PyPSA#735 tooling) on expr.groupby(groups).sum(), zipf-skewed groups (incidence/balance-constraint pattern):

case time before → after peak RAM before → after
uniform 100k → 5k groups 6.6 ms → 3.8 ms 9.7 → 5.7 MiB
skewed 100k → 5k groups 72 ms → 18 ms (4×) 2,182 → 833 MiB (2.6×)
skewed 300k → 10k groups 2.86 s → 90 ms (32×) 12,254 → 4,674 MiB (2.6×)

The remaining peak is exactly the padded result itself — that part is inherent to the dense rectangular _term representation, not to this implementation.

Robustness

  • xarray owns the result structure: dims/coords are assembled with drop_dims / expand_dims / assign; numpy only fills values.
  • Conservative dispatch (_can_sum_by_scatter): chunked (dask) data or exotic coordinates on the grouped dim fall back to the previous unstack implementation, which is kept as _sum_by_unstack.
  • Kernel-lock test: test_linear_expression_groupby_scatter_equals_unstack asserts both kernels stay bit-exactly interchangeable (dims, coords, vars, coeffs, padding positions) across 7 edge cases — an xarray update that shifts unstack behavior fails CI instead of silently diverging.
  • Behavioral parity: result dims, coords, term ordering and padding are unchanged; NaN group labels now raise an informative ValueError instead of a cryptic unstack error.
  • PyPSA end-to-end validation: AC-DC-meshed and SciGRID-DE networks solve to identical objectives with identical variable/constraint counts and byte-identical normalized constraint matrices vs master.

Notes

  • Kept on the fork for now (not upstream).
  • Side-finding (pre-existing, unrelated): pypsa+linopy model builds are not run-to-run reproducible — term counts vary with PYTHONHASHSEED (set-ordering somewhere in the build path, linopy.common.get_index_map is a candidate). Worth a separate issue.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Performance
    • Enhanced grouped sum operations with optimized computation for compatible data layouts.
  • Tests
    • Expanded test coverage for grouped operations, including edge cases, chunked data scenarios, and error handling validation.

The fast path of LinearExpression.groupby(...).sum() used
ds.unstack(group_dim, fill_value=...) followed by a stack, which
materializes 2-3 intermediate copies of the padded result
(n_groups x max_group_size x nterm) and goes through pandas MultiIndex
machinery sized by the number of elements.

Instead, factorize the groups and scatter coeffs/vars directly into the
preallocated padded result arrays; constants are group-summed with
np.add.at. Peak memory drops to input + result (the minimum for the
padded layout) and the grouping itself gets considerably faster.

The result is unchanged: same dims, coords, term ordering and padding.
The unstack-based implementation is kept as _sum_by_unstack and still
used for chunked (dask-backed) data, which cannot be scattered into
numpy arrays. NaN group labels now raise an informative ValueError
instead of failing inside unstack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR optimizes LinearExpressionGroupby.sum() by introducing a conditional dispatch mechanism that routes to a new fast scatter-based kernel when data layout is compatible, otherwise falling back to the existing unstack-based implementation. The scatter kernel uses numpy factorization and direct array scattering for speed, while preserving the unstack path for complex or chunked data scenarios.

Changes

Groupby Sum Optimization

Layer / File(s) Summary
Dispatch and eligibility logic
linopy/expressions.py
sum() method now conditionally routes to scatter or unstack paths based on _can_sum_by_scatter eligibility check, which validates numpy-backed arrays and coordinate layout compatibility.
Scatter-based summation kernel
linopy/expressions.py
_sum_by_scatter implements fast group aggregation by factorizing group labels, computing within-group positions, scattering coeffs and vars into padded arrays, and summing const per group with NaN-as-zero semantics.
Unstack-based fallback kernel
linopy/expressions.py
_sum_by_unstack encapsulates the original cumcount/multiindex/unstack summation approach, supporting chunked/dask-backed data and complex coordinate setups via xarray operations.
Groupby sum test validation
test/test_linear_expression.py
Comprehensive test suite validates skewed/unsorted group labels, chunked dask expressions, NaN error handling, and bit-exact equivalence between scatter and unstack implementations across multiple configurations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A scatter kernel hops with speed,
While unstack paths meet every need—
Fast numpy routes when data's aligned,
Else xarray's embrace, always kind.
Tests lock the dance, bit-exact and true!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description covers the changes, benchmarks, robustness measures, and validation, but the author-provided checklist is empty and some template sections are not fully addressed. Complete the checklist by confirming docstrings/documentation are added, unit tests exist, and release notes are included; or clarify which items are not applicable.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main performance optimization: replacing unstack with a scatter kernel for groupby-sum operations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/groupby-sum-scatter

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/test_linear_expression.py (1)

1655-1669: ⚡ Quick win

Assert the chunked dispatch gate explicitly.

This test checks result parity, but it doesn’t verify the dispatch contract itself. Add an assertion that chunked input is ineligible for scatter to prevent silent regressions in path selection.

Suggested patch
 def test_linear_expression_groupby_chunked(v: Variable) -> None:
     """Chunked (dask-backed) expressions group via xarray's unstack machinery."""
     pytest.importorskip("dask")
     expr = 2 * v + 5
     groups = pd.Series([1] * 12 + [2] * 8, index=v.indexes["dim_2"], name="group")

     chunked = LinearExpression(expr.data.chunk({"dim_2": 5}), expr.model)
-    grouped_chunked = chunked.groupby(groups).sum()
+    gb_chunked = chunked.groupby(groups)
+    assert not gb_chunked._can_sum_by_scatter(groups)
+    grouped_chunked = gb_chunked.sum()
     grouped = expr.groupby(groups).sum()

     assert grouped_chunked.nterm == grouped.nterm
     assert_linequal(
         LinearExpression(grouped_chunked.data.compute(), expr.model), grouped
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_linear_expression.py` around lines 1655 - 1669, Add an explicit
assertion that the chunked input is ineligible for scatter before calling
groupby: assert not LinearExpression._dispatch_scatter_eligible(chunked) (or the
project’s equivalent private dispatch gate) so the test not only compares
results but also verifies that chunked (the LinearExpression created from
expr.data.chunk(...)) does not take the scatter dispatch path; place this
assertion in test_linear_expression_groupby_chunked right after chunked is
created and before grouped_chunked is computed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/test_linear_expression.py`:
- Around line 1655-1669: Add an explicit assertion that the chunked input is
ineligible for scatter before calling groupby: assert not
LinearExpression._dispatch_scatter_eligible(chunked) (or the project’s
equivalent private dispatch gate) so the test not only compares results but also
verifies that chunked (the LinearExpression created from expr.data.chunk(...))
does not take the scatter dispatch path; place this assertion in
test_linear_expression_groupby_chunked right after chunked is created and before
grouped_chunked is computed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d68a1bcb-30df-4d7d-a5c9-c160fb51d460

📥 Commits

Reviewing files that changed from the base of the PR and between d834810 and 63c30f8.

📒 Files selected for processing (2)
  • linopy/expressions.py
  • test/test_linear_expression.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant