Skip to content

Umbrella: long-format / sparse _term kernel (dense-_term memory cluster) #756

@FBumann

Description

@FBumann

This issue collects information about how to handle sparsity in linopy, while keeping the convenience of xarray broadcasting and backwards compat.

Note

Generated with AI

Root cause

A LinearExpression stores coeffs/vars over a dense coord_dims × _term rectangle. Every op that combines terms re-pads that axis — which is the peak allocation in real PyPSA builds. The same root cause surfaces at four entry points:

Context: #740 (polar-high benchmark — the motivation), #712/#713 (absent/NaN semantics a sparse store must keep), #744 (flat dims — prerequisite for a tabular core), #630 (CSRConstraint — existing compressed path, export-side only).

Direction

Two backends were weighed (full note below): sparse.COO under xarray (all-or-nothing; densifies at the first unsupported op) vs. a Polars long-format kernel (the form to_polars already emits). Recommendation: incremental long-format on the hot ops, behind the existing dense xarray API — convert back to dense _term only at the boundary, validate bit-for-bit, ship one op at a time:

  1. __matmul__/dot (@/dot against a sparse matrix densifies the result to full _term #748) — most self-contained, biggest single win
  2. groupby().sum() (groupby-sum pads every group to the largest group size #745) — transient win first (FBumann/linopy#25), then the inherent padding
  3. merge (merge of ragged expressions is the peak allocation in PyPSA model builds #749) — concat long, rectangularize once

A full tabular core or sparse.COO remain open later, but are larger bets (the core is gated on #744).

(Earlier siblings referred to "#740" as the kernel — that was a mis-reference to the polar-high benchmark; this issue is the kernel.)

Full investigation note

Sparsity support in linopy — investigation

Investigation note (Claude Code, prompted by @FBumann), 2026-06-04. Assesses
two candidate backends for sparse expression storage — sparse xarray
(sparse.COO duck-arrays) vs. a tabular/long-format (Polars) kernel — and
recommends an incremental path. Code anchors are against the
docs/ai-contribution-guide working tree.

What the issue tracker already says

A tight, well-profiled cluster of open issues all trace to one root cause:
the dense _term rectangle in LinearExpression.

Issue Op Symptom
#748 @ / dot vs sparse matrix KVL constraint: 852 _term vs 3 needed → 284× cells, the single largest alloc (132 MB) in a SciGRID-DE build
#745 groupby().sum() pads every group to the largest group → ~12 GiB on zipf groups; PyPSA works around it by splitting nodal balance into meshed / weakly-meshed buckets
#749 merge of ragged exprs the actual build peak — concat pads each block to global-max _term
#740 (polar-high benchmark) external Polars-backed competitor, ~2.3–3.5× faster build, ~30% less memory; community + maintainers interested
#712 / #713 absent vs zero, NaN the data-model semantics a sparse store would have to preserve
#744 MultiIndex storage leaning toward dropping first-class MI — a prerequisite for a tabular core (see below)

There is already a partial sparse path: #630 / freeze_constraints=True
CSRConstraint (scipy CSR + label arrays, no dense Dataset;
constraints.py:476). Measured at ~32% leaner constraint storage on a real
network. But it only engages at the export/storage boundary — it does not
touch the build phase where the peak actually lives.

The actual data model, and where density is forced

A LinearExpression is an xarray Dataset of three arrays over
coord_dims × _term:

  • coeffs (float, fill 0), vars (int labels, fill -1 = absent),
    const (float, no _term).

The _term axis is a dense rectangle. Every operation that combines terms
re-rectangularizes:

  • _sum(dim) (expressions.py:1488) → stack({_term: [_term]+dim}) — dense product.
  • groupby().sum() (expressions.py:262) → unstack(group_dim, fill_value=...) — pads to largest group.
  • merge (expressions.py:2476) → xr.concat with fill_value, aligning _term axes.
  • __matmul__ (expressions.py:1729) → literally (self * other).sum(common_dims) — never inspects whether other is zero.

One subtlety: even to_polars densifies firstto_polars
(common.py:194) does broadcast(ds)[0] then .values.reshape(-1) before
filter_nulls_polars. So today's "tabular" output is a dense→long dump at the
very end; it does not relieve the build-phase peak (consistent with #749
placing the peak at merge, upstream of export). The sparse story must be
fixed in the build ops, not at export.


Backend A — sparse xarray (sparse.COO duck-arrays)

Idea: back coeffs/vars with pydata sparse.COO instead of numpy;
zero-coeff terms cost nothing, the public xarray API stays.

Attractive because: minimal API change — LinearExpression stays an xarray
Dataset; in principle every op "just works" and all four issues dissolve at
once (#740's last comment makes exactly this point).

Riskier than it looks:

  • The ops linopy leans on hardest — stack/unstack (in _sum and
    groupby-sum), reindex-with-fill and concat-with-fill (in merge),
    groupby — are precisely the weakest-supported operations on sparse.COO
    under xarray
    . unstack on COO is notoriously unsupported/slow. You densify
    at the first gap and get no win while paying COO overhead everywhere else.
  • Two different fill values. coeffs is sparse-around-0; vars is
    sparse-around--1. COO supports a non-zero fill_value, but the two
    arrays must stay co-indexed through every broadcast/align, and xarray
    broadcasting across two COO arrays with differing fill values does not
    compose cleanly.
  • Everything numpy-shaped breaks: .flat/.values, np.add.at in
    matrices.py, scipy interop, repr, simplify's bincount. Each needs a
    COO branch.
  • It collides with the absent/NaN semantics work (An absent variable is indistinguishable from a zero variable #712/NaN in user-supplied data is silently filled instead of surfaced #713) — "absent" is a
    structural zero in COO, the opposite of what An absent variable is indistinguishable from a zero variable #712 wants (absent must stay
    distinguishable from a real zero).

Verdict: elegant on paper, invasive and fragile in reality. All-or-nothing —
the payoff requires every hot op to have a real COO path.


Backend B — tabular / long-format (Polars / COO triplets)

Idea: represent the term-carrying part of an expression as a long relation
(…coord cols…, coeffs, vars) — i.e. exactly the COO/tidy form to_polars
already emits, and exactly polar-high's architecture (#740).

In this representation the pathological ops become cheap and natural:

  • groupby(k).sum()group_by(k).aggno padding, output is observed groups only.
  • expr @ sparse → a join on the contracted dim over nonzeros only — @/dot against a sparse matrix densifies the result to full _term #748's bit-equivalent prototype is literally "explode C to (branch, cycle, value) then groupby(cycle).sum()".
  • merge → vertical concat — no _term alignment, no global-max padding.
  • sum(dim) → drop the dim from the key — no restack.

Attractive because: Polars is already a required dependency
(pyproject.toml:38) and already the IO kernel (to_polars on every object,
the LP writer streams Polars). The seam exists.

Limitations:

  • It's a second data model beside xarray. Two honest sub-options:
    • (B1) Op-local long kernels with dense↔long round-trips at the edges.
      Cheap to land incrementally, but a round-trip only pays off if the op stays
      long end-to-end; isolated conversions can cost more than they save.
    • (B2) Long-format as the primary store — effectively the polar-high
      rewrite. You'd re-implement xarray's label-alignment, broadcasting,
      .sel/.where/.shift, masks, coord bookkeeping, and printing on a
      columnar table. A large fraction of expressions.py / variables.py /
      alignment.py is exactly that machinery.
  • Polars has no n-D labelled array: dims become columns, and broadcasting a
    (snapshot) constant against (snapshot, branch) becomes an explicit join —
    works, but you're reimplementing xarray.
  • Decision before v1: first-class MultiIndex support vs. flat dims + auxiliary level coords #744 is a real prerequisite for B2: a flat-dim model maps cleanly to a
    table; first-class pd.MultiIndex does not. The Decision before v1: first-class MultiIndex support vs. flat dims + auxiliary level coords #744 thread is already
    leaning toward dropping MI, which conveniently de-risks an eventual tabular
    core.

Feasibility & recommendation

Don't replace xarray, and don't go COO-under-xarray. Go incremental
long-format on the hot ops (B1), behind the existing dense API
— the "only the
critical operations" instinct. In order of measured peak:

  1. __matmul__/dot against a constant/sparse operand (@/dot against a sparse matrix densifies the result to full _term #748) — most
    self-contained, biggest single win. Detect zeros in other, build the COO
    product, group-sum the contracted dim in long form, return a compact-_term
    expression. No public API change; @/dot against a sparse matrix densifies the result to full _term #748 already has a bit-equivalent reference.
  2. groupby().sum() (groupby-sum pads every group to the largest group size #745) — two separable wins: the transient
    (FBumann/linopy#25 scatter-instead-of-unstack prototype: ~32× faster, fits
    the current dense model as a drop-in) and the inherent padding (needs long
    output). Do the transient first.
  3. merge along a non-_term dim (merge of ragged expressions is the peak allocation in PyPSA model builds #749) — concat in long form,
    rectangularize once to per-result max instead of per-input-then-global.

Each is independently shippable, validated bit-for-bit against the dense path,
and each stays long end-to-end if it feeds the next op or to_polars / the
matrix export (where it would also let you skip the densifying broadcast in
to_polars). The CSRConstraint path (#630) is the proof that a compressed
representation can coexist with the dense one — extend that pattern leftward
into expression building.

Backend A and the full tabular core (B2) are both "one architecture fixes
everything" stories, but they are large, all-or-nothing bets. The incremental
long-format route recovers most of the measured memory op by op, each step
verifiable — and leaves the door open to B2 later (gated on #744 landing as
flat-dim).

Tracking

This issue is that long-format/sparse _term kernel tracker — linking #748 /
#745 / #749 (+ #757) and the incremental plan above. (Earlier siblings pointed
at "#740" for the kernel; #740 is actually the polar-high benchmark.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    data-modelperformanceThis improves performance while not (meaningfully) altering behaviour for users

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions