Umbrella: long-format / sparse `_term` kernel (dense-`_term` memory cluster)

This issue collects information about how to handle sparsity in linopy, while keeping the convenience of xarray broadcasting and backwards compat.

> [!NOTE]
> Generated with AI

## Root cause

A `LinearExpression` stores `coeffs`/`vars` over a **dense `coord_dims × _term` rectangle**. Every op that combines terms re-pads that axis — which is the peak allocation in real PyPSA builds. The same root cause surfaces at four entry points:

- **#748** — `@`/`dot` vs a sparse matrix
- **#745** — `groupby().sum()` pads to the largest group
- **#749** — `merge` of ragged expressions (the build peak)
- **#757** — multi-key `groupby([names])` densifies to a cartesian grid (the fast-path half, #753, shipped in #751)

Context: **#740** (polar-high benchmark — the motivation), **#712/#713** (absent/NaN semantics a sparse store must keep), **#744** (flat dims — prerequisite for a tabular core), **#630** (`CSRConstraint` — existing compressed path, export-side only).

## Direction

Two backends were weighed (full note below): `sparse.COO` under xarray (all-or-nothing; densifies at the first unsupported op) vs. a Polars long-format kernel (the form `to_polars` already emits). **Recommendation: incremental long-format on the hot ops, behind the existing dense xarray API** — convert back to dense `_term` only at the boundary, validate bit-for-bit, ship one op at a time:

1. `__matmul__`/`dot` (#748) — most self-contained, biggest single win
2. `groupby().sum()` (#745) — transient win first (`FBumann/linopy#25`), then the inherent padding
3. `merge` (#749) — concat long, rectangularize once

A full tabular core or `sparse.COO` remain open later, but are larger bets (the core is gated on #744).

*(Earlier siblings referred to "#740" as the kernel — that was a mis-reference to the polar-high benchmark; this issue is the kernel.)*

<details>
<summary><b>Full investigation note</b></summary>

# Sparsity support in linopy — investigation

> Investigation note (Claude Code, prompted by @FBumann), 2026-06-04. Assesses
> two candidate backends for sparse expression storage — sparse xarray
> (`sparse.COO` duck-arrays) vs. a tabular/long-format (Polars) kernel — and
> recommends an incremental path. Code anchors are against the
> `docs/ai-contribution-guide` working tree.

## What the issue tracker already says

A tight, well-profiled cluster of open issues all trace to **one root cause**:
the dense `_term` rectangle in `LinearExpression`.

| Issue | Op | Symptom |
|---|---|---|
| **#748** | `@` / `dot` vs sparse matrix | KVL constraint: 852 `_term` vs 3 needed → **284× cells**, the single largest alloc (132 MB) in a SciGRID-DE build |
| **#745** | `groupby().sum()` | pads every group to the largest group → ~12 GiB on zipf groups; PyPSA works around it by splitting nodal balance into meshed / weakly-meshed buckets |
| **#749** | `merge` of ragged exprs | the actual **build peak** — concat pads each block to global-max `_term` |
| **#740** | (polar-high benchmark) | external Polars-backed competitor, ~2.3–3.5× faster build, ~30% less memory; community + maintainers interested |
| **#712 / #713** | absent vs zero, NaN | the data-model semantics a sparse store would have to preserve |
| **#744** | MultiIndex storage | leaning toward dropping first-class MI — a prerequisite for a tabular core (see below) |

There is already a **partial sparse path**: #630 / `freeze_constraints=True` →
`CSRConstraint` (scipy CSR + label arrays, no dense `Dataset`;
`constraints.py:476`). Measured at ~32% leaner constraint storage on a real
network. But it only engages at the **export/storage** boundary — it does not
touch the build phase where the peak actually lives.

## The actual data model, and where density is forced

A `LinearExpression` is an xarray `Dataset` of three arrays over
`coord_dims × _term`:

- `coeffs` (float, fill `0`), `vars` (int labels, fill **`-1`** = absent),
  `const` (float, no `_term`).

The `_term` axis is a **dense rectangle**. Every operation that combines terms
re-rectangularizes:

- `_sum(dim)` (`expressions.py:1488`) → `stack({_term: [_term]+dim})` — dense product.
- `groupby().sum()` (`expressions.py:262`) → `unstack(group_dim, fill_value=...)` — pads to largest group.
- `merge` (`expressions.py:2476`) → `xr.concat` with `fill_value`, aligning `_term` axes.
- `__matmul__` (`expressions.py:1729`) → literally `(self * other).sum(common_dims)` — never inspects whether `other` is zero.

One subtlety: even **`to_polars` densifies first** — `to_polars`
(`common.py:194`) does `broadcast(ds)[0]` then `.values.reshape(-1)` *before*
`filter_nulls_polars`. So today's "tabular" output is a dense→long dump at the
very end; it does **not** relieve the build-phase peak (consistent with #749
placing the peak at `merge`, upstream of export). The sparse story must be
fixed *in the build ops*, not at export.

---

## Backend A — sparse xarray (`sparse.COO` duck-arrays)

**Idea:** back `coeffs`/`vars` with pydata `sparse.COO` instead of numpy;
zero-coeff terms cost nothing, the public xarray API stays.

**Attractive because:** minimal API change — `LinearExpression` stays an xarray
`Dataset`; in principle every op "just works" and all four issues dissolve at
once (#740's last comment makes exactly this point).

**Riskier than it looks:**

- The ops linopy leans on hardest — `stack`/`unstack` (in `_sum` and
  groupby-sum), `reindex`-with-fill and `concat`-with-fill (in `merge`),
  `groupby` — are precisely the **weakest-supported operations on `sparse.COO`
  under xarray**. `unstack` on COO is notoriously unsupported/slow. You densify
  at the first gap and get *no* win while paying COO overhead everywhere else.
- **Two different fill values.** `coeffs` is sparse-around-`0`; `vars` is
  sparse-around-**`-1`**. COO supports a non-zero `fill_value`, but the two
  arrays must stay co-indexed through every broadcast/align, and xarray
  broadcasting across two COO arrays with differing fill values does not
  compose cleanly.
- Everything numpy-shaped breaks: `.flat`/`.values`, `np.add.at` in
  `matrices.py`, scipy interop, `repr`, `simplify`'s `bincount`. Each needs a
  COO branch.
- It collides with the absent/NaN semantics work (#712/#713) — "absent" is a
  structural zero in COO, the *opposite* of what #712 wants (absent must stay
  distinguishable from a real zero).

**Verdict:** elegant on paper, invasive and fragile in reality. All-or-nothing —
the payoff requires *every* hot op to have a real COO path.

---

## Backend B — tabular / long-format (Polars / COO triplets)

**Idea:** represent the term-carrying part of an expression as a long relation
— `(…coord cols…, coeffs, vars)` — i.e. exactly the COO/tidy form `to_polars`
already emits, and exactly polar-high's architecture (#740).

In this representation the pathological ops become *cheap and natural*:

- `groupby(k).sum()` → `group_by(k).agg` — **no padding**, output is observed groups only.
- `expr @ sparse` → a **join** on the contracted dim over nonzeros only — #748's bit-equivalent prototype is literally "explode `C` to `(branch, cycle, value)` then `groupby(cycle).sum()`".
- `merge` → vertical `concat` — no `_term` alignment, no global-max padding.
- `sum(dim)` → drop the dim from the key — no restack.

**Attractive because:** Polars is **already a required dependency**
(`pyproject.toml:38`) and already the IO kernel (`to_polars` on every object,
the LP writer streams Polars). The seam exists.

**Limitations:**

- It's a **second data model** beside xarray. Two honest sub-options:
  - **(B1) Op-local long kernels** with dense↔long round-trips at the edges.
    Cheap to land incrementally, but a round-trip only pays off if the op stays
    long *end-to-end*; isolated conversions can cost more than they save.
  - **(B2) Long-format as the primary store** — effectively the polar-high
    rewrite. You'd re-implement xarray's label-alignment, broadcasting,
    `.sel`/`.where`/`.shift`, masks, coord bookkeeping, and printing on a
    columnar table. A large fraction of `expressions.py` / `variables.py` /
    `alignment.py` is exactly that machinery.
- Polars has no n-D labelled array: dims become columns, and broadcasting a
  `(snapshot)` constant against `(snapshot, branch)` becomes an explicit join —
  works, but you're reimplementing xarray.
- **#744 is a real prerequisite for B2:** a flat-dim model maps cleanly to a
  table; first-class `pd.MultiIndex` does not. The #744 thread is already
  leaning toward dropping MI, which conveniently de-risks an eventual tabular
  core.

---

## Feasibility & recommendation

**Don't replace xarray, and don't go COO-under-xarray. Go incremental
long-format on the hot ops (B1), behind the existing dense API** — the "only the
critical operations" instinct. In order of measured peak:

1. **`__matmul__`/`dot` against a constant/sparse operand (#748)** — most
   self-contained, biggest single win. Detect zeros in `other`, build the COO
   product, group-sum the contracted dim in long form, return a compact-`_term`
   expression. No public API change; #748 already has a bit-equivalent reference.
2. **`groupby().sum()` (#745)** — two separable wins: the *transient*
   (`FBumann/linopy#25` scatter-instead-of-unstack prototype: ~32× faster, fits
   the current dense model as a drop-in) and the *inherent* padding (needs long
   output). Do the transient first.
3. **`merge` along a non-`_term` dim (#749)** — concat in long form,
   rectangularize **once** to per-result max instead of per-input-then-global.

Each is independently shippable, validated bit-for-bit against the dense path,
and each stays long end-to-end *if* it feeds the next op or `to_polars` / the
matrix export (where it would also let you skip the densifying `broadcast` in
`to_polars`). The `CSRConstraint` path (#630) is the proof that a compressed
representation can coexist with the dense one — extend that pattern leftward
into expression building.

Backend A and the full tabular core (B2) are both "one architecture fixes
everything" stories, but they are large, all-or-nothing bets. The incremental
long-format route recovers most of the measured memory op by op, each step
verifiable — and leaves the door open to B2 later (gated on #744 landing as
flat-dim).

## Tracking

This issue is that long-format/sparse `_term` kernel tracker — linking #748 /
#745 / #749 (+ #757) and the incremental plan above. (Earlier siblings pointed
at "#740" for the kernel; #740 is actually the polar-high benchmark.)

</details>



Issue	Op	Symptom
#748	`@` / `dot` vs sparse matrix	KVL constraint: 852 `_term` vs 3 needed → 284× cells, the single largest alloc (132 MB) in a SciGRID-DE build
#745	`groupby().sum()`	pads every group to the largest group → ~12 GiB on zipf groups; PyPSA works around it by splitting nodal balance into meshed / weakly-meshed buckets
#749	`merge` of ragged exprs	the actual build peak — concat pads each block to global-max `_term`
#740	(polar-high benchmark)	external Polars-backed competitor, ~2.3–3.5× faster build, ~30% less memory; community + maintainers interested
#712 / #713	absent vs zero, NaN	the data-model semantics a sparse store would have to preserve
#744	MultiIndex storage	leaning toward dropping first-class MI — a prerequisite for a tabular core (see below)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Umbrella: long-format / sparse `_term` kernel (dense-`_term` memory cluster) #756

Root cause

Direction

Sparsity support in linopy — investigation

What the issue tracker already says

The actual data model, and where density is forced

Backend A — sparse xarray (`sparse.COO` duck-arrays)

Backend B — tabular / long-format (Polars / COO triplets)

Feasibility & recommendation

Tracking

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Umbrella: long-format / sparse _term kernel (dense-_term memory cluster) #756

Description

Root cause

Direction

Sparsity support in linopy — investigation

What the issue tracker already says

The actual data model, and where density is forced

Backend A — sparse xarray (sparse.COO duck-arrays)

Backend B — tabular / long-format (Polars / COO triplets)

Feasibility & recommendation

Tracking

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Umbrella: long-format / sparse `_term` kernel (dense-`_term` memory cluster) #756

Backend A — sparse xarray (`sparse.COO` duck-arrays)