Note: This issue was drafted by an AI agent (Claude Code) during a profiling/investigation session with @FBumann. The measurements (memray C-level profiling, benchmarks, the 284× / 132 MB figures) and the git-history archaeology are real and reproducible, but the framing and proposed directions are a starting point for discussion — please sanity-check before acting on them.
LinearExpression.groupby(...).sum() pads every group to the largest group size along _term (via unstack(group_dim, fill_value=...) in expressions.py). For skewed groups — e.g. incidence/balance constraints where a few buses host most units — the result blows up: a 300k-element expression over zipf-distributed groups produces a ~12 GiB padded result that is >99% fill.
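To make the padding arithmetic concrete, here is a minimal sketch in plain Python. It does not call linopy: `padding_stats` and the toy labels are hypothetical, and the only assumption is the dense layout described above, where every group is padded to the largest group's term count.

```python
from collections import Counter


def padding_stats(group_labels):
    """Return (real_terms, padded_cells, fill_fraction) for a dense layout
    in which every group is padded to the size of the largest group."""
    sizes = Counter(group_labels)              # terms per group
    real_terms = sum(sizes.values())           # cells that carry actual terms
    padded_cells = len(sizes) * max(sizes.values())  # groups x largest group
    fill_fraction = 1 - real_terms / padded_cells    # share of cells that are fill
    return real_terms, padded_cells, fill_fraction


# Hypothetical skewed grouping: one hub with 1000 terms, 999 buses with 1 term each.
labels = ["hub"] * 1000 + [f"bus{i}" for i in range(999)]
real, padded, fill = padding_stats(labels)
print(real, padded, round(fill, 3))  # 1999 1000000 0.998
```

The toy numbers are illustrative only and are not the measurements quoted in this issue; the point is that the padded size scales with (number of groups) × (largest group), not with the number of real terms.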
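Two separable costs:
- [...]
- [...] _term layout; only a long-format/sparse kernel removes it. See #756 (Umbrella: long-format / sparse _term kernel, dense-_term memory cluster).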
Real-world evidence: PyPSA already works around this
PyPSA splits the nodal-balance constraint into two separate constraints — strongly-meshed vs weakly-meshed buses — purely to contain this padding (pypsa/optimization/optimize.py, with its own comment "This reduces memory usage for large networks"):
```python
meshed_buses = get_strongly_meshed_buses(n, threshold=45)  # buses with >45 attached components
define_nodal_balance_constraints(n, sns, buses=weakly_meshed_buses)  # small groups
define_nodal_balance_constraints(n, sns, buses=meshed_buses, suffix="-meshed")  # large groups
```
A single grouped balance over all buses would pad every bus to the largest hub's term count; bucketing by meshedness keeps each rectangle padded only to its own bucket max. This is the "eventually do a separation of short and long linear expressions" noted in the original groupby commit (PyPSA #557, 2023) — and it is actively maintained: PyPSA #1591 (2026) promoted meshed_thresholds to a tunable user parameter.
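The effect of bucketing can be sketched with the same kind of cell counting. This is a rough illustration in plain Python, not PyPSA or linopy code: the helper names and the group sizes are hypothetical, and the threshold of 45 is borrowed from the snippet above.

```python
def padded_cells(sizes):
    """Cells in one dense rectangle: every group padded to the largest group."""
    return len(sizes) * max(sizes) if sizes else 0


def bucketed_padded_cells(sizes, threshold):
    """Cells when groups are split into two rectangles at `threshold`,
    each padded only to its own bucket maximum."""
    small = [s for s in sizes if s <= threshold]
    large = [s for s in sizes if s > threshold]
    return padded_cells(small) + padded_cells(large)


# Hypothetical network: 995 buses with 3 terms each, 5 hubs with 200 terms each.
sizes = [3] * 995 + [200] * 5
single = padded_cells(sizes)                        # 1000 * 200
split = bucketed_padded_cells(sizes, threshold=45)  # 995 * 3 + 5 * 200
print(single, split)  # 200000 3985
```

With two buckets the small groups no longer pay for the hubs' term count. Each bucket is still padded to its own maximum, so residual skew inside a bucket remains.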
Note: on typical PyPSA networks the realistic group skew is small (max ≈ 8 generators/bus, ~2.7× padding), so groupby is not the build's peak allocation there — merge is (#749). The pathological skew matters for detailed unit-commitment / rooftop-PV aggregation and for the meshed hubs the split above exists to handle.
Related
- #748 (@/dot against a sparse matrix densifies the result to full _term): @/dot against a sparse matrix densifies the same way (KVL).
- #749 (merge of ragged expressions is the peak allocation in PyPSA model builds): merge of ragged expressions is the actual build peak.
- #756 (Umbrella: long-format / sparse _term kernel, dense-_term memory cluster): long-format/sparse _term kernel that subsumes all three.

Filing as a tracking note so this doesn't get lost.