You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue collects information about how to handle sparsity in linopy, while keeping the convenience of xarray broadcasting and backwards compat.
Note
Generated with AI
Root cause
A LinearExpression stores coeffs/vars over a dense coord_dims × _term rectangle. Every op that combines terms re-pads that axis — which is the peak allocation in real PyPSA builds. The same root cause surfaces at four entry points:
Context: #740 (polar-high benchmark — the motivation), #712/#713 (absent/NaN semantics a sparse store must keep), #744 (flat dims — prerequisite for a tabular core), #630 (CSRConstraint — existing compressed path, export-side only).
Direction
Two backends were weighed (full note below): sparse.COO under xarray (all-or-nothing; densifies at the first unsupported op) vs. a Polars long-format kernel (the form to_polars already emits). Recommendation: incremental long-format on the hot ops, behind the existing dense xarray API — convert back to dense _term only at the boundary, validate bit-for-bit, ship one op at a time:
A full tabular core or sparse.COO remain open later, but are larger bets (the core is gated on #744).
(Earlier siblings referred to "#740" as the kernel — that was a mis-reference to the polar-high benchmark; this issue is the kernel.)
Full investigation note
Sparsity support in linopy — investigation
Investigation note (Claude Code, prompted by @FBumann), 2026-06-04. Assesses
two candidate backends for sparse expression storage — sparse xarray
(sparse.COO duck-arrays) vs. a tabular/long-format (Polars) kernel — and
recommends an incremental path. Code anchors are against the docs/ai-contribution-guide working tree.
What the issue tracker already says
A tight, well-profiled cluster of open issues all trace to one root cause:
the dense _term rectangle in LinearExpression.
leaning toward dropping first-class MI — a prerequisite for a tabular core (see below)
There is already a partial sparse path: #630 / freeze_constraints=True → CSRConstraint (scipy CSR + label arrays, no dense Dataset; constraints.py:476). Measured at ~32% leaner constraint storage on a real
network. But it only engages at the export/storage boundary — it does not
touch the build phase where the peak actually lives.
The actual data model, and where density is forced
A LinearExpression is an xarray Dataset of three arrays over coord_dims × _term:
coeffs (float, fill 0), vars (int labels, fill -1 = absent), const (float, no _term).
The _term axis is a dense rectangle. Every operation that combines terms
re-rectangularizes:
groupby().sum() (expressions.py:262) → unstack(group_dim, fill_value=...) — pads to largest group.
merge (expressions.py:2476) → xr.concat with fill_value, aligning _term axes.
__matmul__ (expressions.py:1729) → literally (self * other).sum(common_dims) — never inspects whether other is zero.
One subtlety: even to_polars densifies first — to_polars
(common.py:194) does broadcast(ds)[0] then .values.reshape(-1)before filter_nulls_polars. So today's "tabular" output is a dense→long dump at the
very end; it does not relieve the build-phase peak (consistent with #749
placing the peak at merge, upstream of export). The sparse story must be
fixed in the build ops, not at export.
Backend A — sparse xarray (sparse.COO duck-arrays)
Idea: back coeffs/vars with pydata sparse.COO instead of numpy;
zero-coeff terms cost nothing, the public xarray API stays.
Attractive because: minimal API change — LinearExpression stays an xarray Dataset; in principle every op "just works" and all four issues dissolve at
once (#740's last comment makes exactly this point).
Riskier than it looks:
The ops linopy leans on hardest — stack/unstack (in _sum and
groupby-sum), reindex-with-fill and concat-with-fill (in merge), groupby — are precisely the weakest-supported operations on sparse.COO
under xarray. unstack on COO is notoriously unsupported/slow. You densify
at the first gap and get no win while paying COO overhead everywhere else.
Two different fill values.coeffs is sparse-around-0; vars is
sparse-around--1. COO supports a non-zero fill_value, but the two
arrays must stay co-indexed through every broadcast/align, and xarray
broadcasting across two COO arrays with differing fill values does not
compose cleanly.
Everything numpy-shaped breaks: .flat/.values, np.add.at in matrices.py, scipy interop, repr, simplify's bincount. Each needs a
COO branch.
Verdict: elegant on paper, invasive and fragile in reality. All-or-nothing —
the payoff requires every hot op to have a real COO path.
Backend B — tabular / long-format (Polars / COO triplets)
Idea: represent the term-carrying part of an expression as a long relation
— (…coord cols…, coeffs, vars) — i.e. exactly the COO/tidy form to_polars
already emits, and exactly polar-high's architecture (#740).
In this representation the pathological ops become cheap and natural:
groupby(k).sum() → group_by(k).agg — no padding, output is observed groups only.
merge → vertical concat — no _term alignment, no global-max padding.
sum(dim) → drop the dim from the key — no restack.
Attractive because: Polars is already a required dependency
(pyproject.toml:38) and already the IO kernel (to_polars on every object,
the LP writer streams Polars). The seam exists.
Limitations:
It's a second data model beside xarray. Two honest sub-options:
(B1) Op-local long kernels with dense↔long round-trips at the edges.
Cheap to land incrementally, but a round-trip only pays off if the op stays
long end-to-end; isolated conversions can cost more than they save.
(B2) Long-format as the primary store — effectively the polar-high
rewrite. You'd re-implement xarray's label-alignment, broadcasting, .sel/.where/.shift, masks, coord bookkeeping, and printing on a
columnar table. A large fraction of expressions.py / variables.py / alignment.py is exactly that machinery.
Polars has no n-D labelled array: dims become columns, and broadcasting a (snapshot) constant against (snapshot, branch) becomes an explicit join —
works, but you're reimplementing xarray.
Don't replace xarray, and don't go COO-under-xarray. Go incremental
long-format on the hot ops (B1), behind the existing dense API — the "only the
critical operations" instinct. In order of measured peak:
groupby().sum() (groupby-sum pads every group to the largest group size #745) — two separable wins: the transient
(FBumann/linopy#25 scatter-instead-of-unstack prototype: ~32× faster, fits
the current dense model as a drop-in) and the inherent padding (needs long
output). Do the transient first.
Each is independently shippable, validated bit-for-bit against the dense path,
and each stays long end-to-end if it feeds the next op or to_polars / the
matrix export (where it would also let you skip the densifying broadcast in to_polars). The CSRConstraint path (#630) is the proof that a compressed
representation can coexist with the dense one — extend that pattern leftward
into expression building.
Backend A and the full tabular core (B2) are both "one architecture fixes
everything" stories, but they are large, all-or-nothing bets. The incremental
long-format route recovers most of the measured memory op by op, each step
verifiable — and leaves the door open to B2 later (gated on #744 landing as
flat-dim).
Tracking
This issue is that long-format/sparse _term kernel tracker — linking #748 / #745 / #749 (+ #757) and the incremental plan above. (Earlier siblings pointed
at "#740" for the kernel; #740 is actually the polar-high benchmark.)
This issue collects information about how to handle sparsity in linopy, while keeping the convenience of xarray broadcasting and backwards compat.
Note
Generated with AI
Root cause
A
LinearExpressionstorescoeffs/varsover a densecoord_dims × _termrectangle. Every op that combines terms re-pads that axis — which is the peak allocation in real PyPSA builds. The same root cause surfaces at four entry points:@/dotagainst a sparse matrix densifies the result to full_term#748 —@/dotvs a sparse matrixgroupby().sum()pads to the largest groupmergeof ragged expressions is the peak allocation in PyPSA model builds #749 —mergeof ragged expressions (the build peak)groupby([names])densifies to a cartesian grid (the fast-path half, Make multi-key LinearExpression.groupby([names]) fast and flat (align with xarray's grouper API) #753, shipped in fix(groupby): group by non-dimension coordinate names; fast multi-key grouping by names (#750, #753) #751)Context: #740 (polar-high benchmark — the motivation), #712/#713 (absent/NaN semantics a sparse store must keep), #744 (flat dims — prerequisite for a tabular core), #630 (
CSRConstraint— existing compressed path, export-side only).Direction
Two backends were weighed (full note below):
sparse.COOunder xarray (all-or-nothing; densifies at the first unsupported op) vs. a Polars long-format kernel (the formto_polarsalready emits). Recommendation: incremental long-format on the hot ops, behind the existing dense xarray API — convert back to dense_termonly at the boundary, validate bit-for-bit, ship one op at a time:__matmul__/dot(@/dotagainst a sparse matrix densifies the result to full_term#748) — most self-contained, biggest single wingroupby().sum()(groupby-sum pads every group to the largest group size #745) — transient win first (FBumann/linopy#25), then the inherent paddingmerge(mergeof ragged expressions is the peak allocation in PyPSA model builds #749) — concat long, rectangularize onceA full tabular core or
sparse.COOremain open later, but are larger bets (the core is gated on #744).(Earlier siblings referred to "#740" as the kernel — that was a mis-reference to the polar-high benchmark; this issue is the kernel.)
Full investigation note
Sparsity support in linopy — investigation
What the issue tracker already says
A tight, well-profiled cluster of open issues all trace to one root cause:
the dense
_termrectangle inLinearExpression.@/dotvs sparse matrix_termvs 3 needed → 284× cells, the single largest alloc (132 MB) in a SciGRID-DE buildgroupby().sum()mergeof ragged exprs_termThere is already a partial sparse path: #630 /
freeze_constraints=True→CSRConstraint(scipy CSR + label arrays, no denseDataset;constraints.py:476). Measured at ~32% leaner constraint storage on a realnetwork. But it only engages at the export/storage boundary — it does not
touch the build phase where the peak actually lives.
The actual data model, and where density is forced
A
LinearExpressionis an xarrayDatasetof three arrays overcoord_dims × _term:coeffs(float, fill0),vars(int labels, fill-1= absent),const(float, no_term).The
_termaxis is a dense rectangle. Every operation that combines termsre-rectangularizes:
_sum(dim)(expressions.py:1488) →stack({_term: [_term]+dim})— dense product.groupby().sum()(expressions.py:262) →unstack(group_dim, fill_value=...)— pads to largest group.merge(expressions.py:2476) →xr.concatwithfill_value, aligning_termaxes.__matmul__(expressions.py:1729) → literally(self * other).sum(common_dims)— never inspects whetherotheris zero.One subtlety: even
to_polarsdensifies first —to_polars(
common.py:194) doesbroadcast(ds)[0]then.values.reshape(-1)beforefilter_nulls_polars. So today's "tabular" output is a dense→long dump at thevery end; it does not relieve the build-phase peak (consistent with #749
placing the peak at
merge, upstream of export). The sparse story must befixed in the build ops, not at export.
Backend A — sparse xarray (
sparse.COOduck-arrays)Idea: back
coeffs/varswith pydatasparse.COOinstead of numpy;zero-coeff terms cost nothing, the public xarray API stays.
Attractive because: minimal API change —
LinearExpressionstays an xarrayDataset; in principle every op "just works" and all four issues dissolve atonce (#740's last comment makes exactly this point).
Riskier than it looks:
stack/unstack(in_sumandgroupby-sum),
reindex-with-fill andconcat-with-fill (inmerge),groupby— are precisely the weakest-supported operations onsparse.COOunder xarray.
unstackon COO is notoriously unsupported/slow. You densifyat the first gap and get no win while paying COO overhead everywhere else.
coeffsis sparse-around-0;varsissparse-around-
-1. COO supports a non-zerofill_value, but the twoarrays must stay co-indexed through every broadcast/align, and xarray
broadcasting across two COO arrays with differing fill values does not
compose cleanly.
.flat/.values,np.add.atinmatrices.py, scipy interop,repr,simplify'sbincount. Each needs aCOO branch.
structural zero in COO, the opposite of what An absent variable is indistinguishable from a zero variable #712 wants (absent must stay
distinguishable from a real zero).
Verdict: elegant on paper, invasive and fragile in reality. All-or-nothing —
the payoff requires every hot op to have a real COO path.
Backend B — tabular / long-format (Polars / COO triplets)
Idea: represent the term-carrying part of an expression as a long relation
—
(…coord cols…, coeffs, vars)— i.e. exactly the COO/tidy formto_polarsalready emits, and exactly polar-high's architecture (#740).
In this representation the pathological ops become cheap and natural:
groupby(k).sum()→group_by(k).agg— no padding, output is observed groups only.expr @ sparse→ a join on the contracted dim over nonzeros only —@/dotagainst a sparse matrix densifies the result to full_term#748's bit-equivalent prototype is literally "explodeCto(branch, cycle, value)thengroupby(cycle).sum()".merge→ verticalconcat— no_termalignment, no global-max padding.sum(dim)→ drop the dim from the key — no restack.Attractive because: Polars is already a required dependency
(
pyproject.toml:38) and already the IO kernel (to_polarson every object,the LP writer streams Polars). The seam exists.
Limitations:
Cheap to land incrementally, but a round-trip only pays off if the op stays
long end-to-end; isolated conversions can cost more than they save.
rewrite. You'd re-implement xarray's label-alignment, broadcasting,
.sel/.where/.shift, masks, coord bookkeeping, and printing on acolumnar table. A large fraction of
expressions.py/variables.py/alignment.pyis exactly that machinery.(snapshot)constant against(snapshot, branch)becomes an explicit join —works, but you're reimplementing xarray.
table; first-class
pd.MultiIndexdoes not. The Decision before v1: first-class MultiIndex support vs. flat dims + auxiliary level coords #744 thread is alreadyleaning toward dropping MI, which conveniently de-risks an eventual tabular
core.
Feasibility & recommendation
Don't replace xarray, and don't go COO-under-xarray. Go incremental
long-format on the hot ops (B1), behind the existing dense API — the "only the
critical operations" instinct. In order of measured peak:
__matmul__/dotagainst a constant/sparse operand (@/dotagainst a sparse matrix densifies the result to full_term#748) — mostself-contained, biggest single win. Detect zeros in
other, build the COOproduct, group-sum the contracted dim in long form, return a compact-
_termexpression. No public API change;
@/dotagainst a sparse matrix densifies the result to full_term#748 already has a bit-equivalent reference.groupby().sum()(groupby-sum pads every group to the largest group size #745) — two separable wins: the transient(
FBumann/linopy#25scatter-instead-of-unstack prototype: ~32× faster, fitsthe current dense model as a drop-in) and the inherent padding (needs long
output). Do the transient first.
mergealong a non-_termdim (mergeof ragged expressions is the peak allocation in PyPSA model builds #749) — concat in long form,rectangularize once to per-result max instead of per-input-then-global.
Each is independently shippable, validated bit-for-bit against the dense path,
and each stays long end-to-end if it feeds the next op or
to_polars/ thematrix export (where it would also let you skip the densifying
broadcastinto_polars). TheCSRConstraintpath (#630) is the proof that a compressedrepresentation can coexist with the dense one — extend that pattern leftward
into expression building.
Backend A and the full tabular core (B2) are both "one architecture fixes
everything" stories, but they are large, all-or-nothing bets. The incremental
long-format route recovers most of the measured memory op by op, each step
verifiable — and leaves the door open to B2 later (gated on #744 landing as
flat-dim).
Tracking
This issue is that long-format/sparse
_termkernel tracker — linking #748 /#745 / #749 (+ #757) and the incremental plan above. (Earlier siblings pointed
at "#740" for the kernel; #740 is actually the polar-high benchmark.)