perf: scatter groupby-sum terms directly instead of unstacking#25
perf: scatter groupby-sum terms directly instead of unstacking#25FBumann wants to merge 1 commit into
Conversation
The fast path of LinearExpression.groupby(...).sum() used ds.unstack(group_dim, fill_value=...) followed by a stack, which materializes 2-3 intermediate copies of the padded result (n_groups x max_group_size x nterm) and goes through pandas MultiIndex machinery sized by the number of elements. Instead, factorize the groups and scatter coeffs/vars directly into the preallocated padded result arrays; constants are group-summed with np.add.at. Peak memory drops to input + result (the minimum for the padded layout) and the grouping itself gets considerably faster. The result is unchanged: same dims, coords, term ordering and padding. The unstack-based implementation is kept as _sum_by_unstack and still used for chunked (dask-backed) data, which cannot be scattered into numpy arrays. NaN group labels now raise an informative ValueError instead of failing inside unstack. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR optimizes ChangesGroupby Sum Optimization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/test_linear_expression.py (1)
1655-1669: ⚡ Quick winAssert the chunked dispatch gate explicitly.
This test checks result parity, but it doesn’t verify the dispatch contract itself. Add an assertion that chunked input is ineligible for scatter to prevent silent regressions in path selection.
Suggested patch
def test_linear_expression_groupby_chunked(v: Variable) -> None: """Chunked (dask-backed) expressions group via xarray's unstack machinery.""" pytest.importorskip("dask") expr = 2 * v + 5 groups = pd.Series([1] * 12 + [2] * 8, index=v.indexes["dim_2"], name="group") chunked = LinearExpression(expr.data.chunk({"dim_2": 5}), expr.model) - grouped_chunked = chunked.groupby(groups).sum() + gb_chunked = chunked.groupby(groups) + assert not gb_chunked._can_sum_by_scatter(groups) + grouped_chunked = gb_chunked.sum() grouped = expr.groupby(groups).sum() assert grouped_chunked.nterm == grouped.nterm assert_linequal( LinearExpression(grouped_chunked.data.compute(), expr.model), grouped )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/test_linear_expression.py` around lines 1655 - 1669, Add an explicit assertion that the chunked input is ineligible for scatter before calling groupby: assert not LinearExpression._dispatch_scatter_eligible(chunked) (or the project’s equivalent private dispatch gate) so the test not only compares results but also verifies that chunked (the LinearExpression created from expr.data.chunk(...)) does not take the scatter dispatch path; place this assertion in test_linear_expression_groupby_chunked right after chunked is created and before grouped_chunked is computed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/test_linear_expression.py`:
- Around line 1655-1669: Add an explicit assertion that the chunked input is
ineligible for scatter before calling groupby: assert not
LinearExpression._dispatch_scatter_eligible(chunked) (or the project’s
equivalent private dispatch gate) so the test not only compares results but also
verifies that chunked (the LinearExpression created from expr.data.chunk(...))
does not take the scatter dispatch path; place this assertion in
test_linear_expression_groupby_chunked right after chunked is created and before
grouped_chunked is computed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d68a1bcb-30df-4d7d-a5c9-c160fb51d460
📒 Files selected for processing (2)
linopy/expressions.pytest/test_linear_expression.py
What
The fast path of
LinearExpression.groupby(...).sum()usedds.unstack(group_dim, fill_value=...)followed by astack, which materializes 2–3 intermediate copies of the padded result (n_groups × max_group_size × nterm) and runs pandas MultiIndex machinery sized by the number of elements.This PR replaces it with a scatter kernel: factorize the groups, compute within-group positions, and scatter coeffs/vars directly into the preallocated padded result arrays; constants are group-summed with
np.add.at. Peak memory drops to input + result — the minimum possible for the padded layout — and grouping gets much faster.Benchmarks
bench.time/bench.memory(PR PyPSA#735 tooling) onexpr.groupby(groups).sum(), zipf-skewed groups (incidence/balance-constraint pattern):The remaining peak is exactly the padded result itself — that part is inherent to the dense rectangular
_termrepresentation, not to this implementation.Robustness
drop_dims/expand_dims/assign; numpy only fills values._can_sum_by_scatter): chunked (dask) data or exotic coordinates on the grouped dim fall back to the previous unstack implementation, which is kept as_sum_by_unstack.test_linear_expression_groupby_scatter_equals_unstackasserts both kernels stay bit-exactly interchangeable (dims, coords, vars, coeffs, padding positions) across 7 edge cases — an xarray update that shifts unstack behavior fails CI instead of silently diverging.ValueErrorinstead of a cryptic unstack error.Notes
PYTHONHASHSEED(set-ordering somewhere in the build path,linopy.common.get_index_mapis a candidate). Worth a separate issue.🤖 Generated with Claude Code
Summary by CodeRabbit