Decision before v1: first-class MultiIndex support vs. flat dims + auxiliary level coords #744

@FBumann

Note

AI-written analysis (Claude Code, prompted by @FBumann). This is a scope question for #717 — the decision window closes when v1 ships, because afterwards any change here is a breaking change with a deprecation cycle.

The question

Every difficult problem in the #732 → #737 → #742 → #717 chain traces back to one root: linopy supports stacked pd.MultiIndex dimensions as a first-class data model. Should v1 keep that, or replace it with a representation that needs none of the machinery?

This is the radical version of the "learn from the xarray community, which struggled a lot with MIs" argument that decided scenario B.

What MultiIndex support costs

| Complexity that exists only for MI | Size |
| --- | --- |
| _project_onto_multiindex_levels, _LevelProjection, _as_multiindex | ~120 lines |
| MI branches in _coords_to_dict, validate_alignment, _broadcast_to_coords (expand-via-template) | ~80 lines |
| assign_multiindex_safe (~20 call sites), get_dims_with_index_levels, MI serialization in coords_to_dataset_vars | ~100 lines |
| §11's stacked-MI paragraph, the scenario A/B design discussion, pydata/xarray#11368 workarounds | weeks |
| TestMultiIndexProjection + MI tests across 6 files | ~400 lines |

Plus a permanent tax: every future feature must answer "and what about MultiIndex?".

The alternative: flat dim + auxiliary level coords

The same information, no pd.MultiIndex anywhere — and §11 already governs auxiliary coords:

# instead of:  snapshot = MultiIndex[(2020,t1), (2020,t2), (2030,t1), (2030,t2)]
snapshots = pd.RangeIndex(4, name="snapshot")
period    = xr.DataArray([2020, 2020, 2030, 2030], dims="snapshot", ...)
timestep  = xr.DataArray(["t1", "t2", "t1", "t2"], dims="snapshot", ...)

Verified against the current #717 branch (all snippets run):

x = m.add_variables(coords=[snapshots], name="x")              # ✅ works
expr = (1 * x).assign_coords(period=period)                    # ✅ aux coords attach

# per-period weighting — same explicit recipe as the MI case:
w = xr.DataArray(weights[period.values].values, dims="snapshot", ...)
expr * w                                                       # ✅ works, no projection machinery involved

# groupby a level:
expr.drop_vars("period").groupby(period.rename("inv_period")).sum()   # ⚠️ works, but needs the
                                                                      #    drop/rename dance — naming
                                                                      #    conflict otherwise (fixable)

Sparse indexes (not every combination exists) are naturally representable — that's what a flat list is. The stacked/unstacked round-trip, the projection, the coverage-gap concept: none of them exist in this representation.
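As a minimal sketch of the flat representation outside linopy, the following uses plain pandas and xarray only. The `values` array is a stand-in for a linopy expression, and the weights and the sparse set of pairs are made-up illustration data, not taken from the #717 branch. Plain xarray does not show the groupby naming conflict noted above, so this illustrates the data layout rather than linopy's current behaviour.

```python
import pandas as pd
import xarray as xr

# Sparse set of (period, timestep) pairs: (2030, "t2") is deliberately absent.
pairs = [(2020, "t1"), (2020, "t2"), (2030, "t1")]

# Flat dimension plus one auxiliary coordinate per former MultiIndex level.
snapshots = pd.RangeIndex(len(pairs), name="snapshot")
period = xr.DataArray(
    [p for p, _ in pairs], dims="snapshot", coords={"snapshot": snapshots}
)
timestep = xr.DataArray(
    [t for _, t in pairs], dims="snapshot", coords={"snapshot": snapshots}
)

# Stand-in for a linopy object: a plain DataArray on the flat dimension.
values = xr.DataArray(
    [1.0, 2.0, 3.0], dims="snapshot", coords={"snapshot": snapshots}
)
values = values.assign_coords(period=period, timestep=timestep)

# Per-period weighting: look the weight up once per snapshot, then multiply.
weights = pd.Series({2020: 0.5, 2030: 2.0})  # illustrative numbers
w = xr.DataArray(
    weights.loc[period.values].to_numpy(),
    dims="snapshot",
    coords={"snapshot": snapshots},
)
weighted = values * w

# Aggregate over a level by grouping on the auxiliary coordinate.
per_period = weighted.groupby("period").sum()
```

The missing (2030, "t2") combination needs no special handling: it is simply not a row of the flat dimension.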

Three options

| Option | Internal complexity | PyPSA impact | Tuple .sel() / .unstack() |
| --- | --- | --- | --- |
| A. Disallow MI (TypeError → point to flat+aux) | deleted | n.snapshots API migration | gone |
| B. Convert at the boundary (MI accepted as input, stored flat+aux, re-stacked on output) | mostly deleted | thin adapter at model build + solution extraction | gone on linopy objects |
| C. Status quo (#717 as-is) | stays forever | none | works |
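For option A, the user-facing change could be as small as a guard at the point where coords enter the model. The sketch below is an assumption about how such a check might look; `reject_multiindex` is a hypothetical name and is not part of linopy.

```python
import pandas as pd


def reject_multiindex(coords):
    """Hypothetical guard for option A: refuse stacked MultiIndex coords."""
    for coord in coords:
        if isinstance(coord, pd.MultiIndex):
            raise TypeError(
                f"MultiIndex coords {list(coord.names)} are not supported; "
                "pass a flat dimension and attach the levels as auxiliary coords."
            )
    return coords


flat = pd.RangeIndex(4, name="snapshot")
stacked = pd.MultiIndex.from_product(
    [[2020, 2030], ["t1", "t2"]], names=["period", "timestep"]
)

reject_multiindex([flat])  # a flat index passes through unchanged

message = None
try:
    reject_multiindex([stacked])
except TypeError as err:
    message = str(err)  # the error text points the user to flat+aux
```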

What B would mean concretely

  • coords=[multiindex] still works — linopy decomposes it into flat dim + level aux-coords on entry.
  • The §11 stacked-MI paragraph reduces to one sentence ("MultiIndex coords are stored as a flat dimension with level coords; §11 governs the levels").
  • solution / dual come back flat-indexed with level coords as columns; PyPSA re-stacks for its users (one set_index(levels) call).
  • The groupby naming rough edge needs fixing (small).
  • The xarray limitation "cannot reindex onto a stacked MultiIndex via indexers, only reindex_like works" (pydata/xarray#11368) stops mattering to linopy entirely.
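The boundary conversion described in the list above can be sketched with pandas alone. `decompose` and `restack` are hypothetical helper names chosen for illustration, and the solution values are made up; this is one possible shape for the adapter, not linopy's or PyPSA's implementation.

```python
import pandas as pd


def decompose(index: pd.MultiIndex, dim: str):
    """Hypothetical entry adapter: flat dimension plus one column per level."""
    flat = pd.RangeIndex(len(index), name=dim)
    levels = index.to_frame(index=False)  # columns named after the levels
    levels.index = flat
    return flat, levels


def restack(values: pd.Series, levels: pd.DataFrame) -> pd.Series:
    """Hypothetical exit adapter: re-attach the levels with one set_index call."""
    frame = levels.assign(value=values)  # aligned on the flat dimension
    return frame.set_index(list(levels.columns))["value"]


snapshot = pd.MultiIndex.from_tuples(
    [(2020, "t1"), (2020, "t2"), (2030, "t1"), (2030, "t2")],
    names=["period", "timestep"],
)

# On entry: the model only ever sees `flat` and the level columns.
flat, levels = decompose(snapshot, dim="snapshot")

# On exit: a flat-indexed result (illustrative numbers) is re-stacked for users.
solution_flat = pd.Series([10.0, 11.0, 12.0, 13.0], index=flat, name="x")
solution_stacked = restack(solution_flat, levels)
```

Because the levels travel as ordinary columns, the round trip preserves row order and works equally for sparse indexes where not every combination exists.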

Why this needs deciding now

#717 currently implements C — including the machinery that the legacy-removal checklist says survives 1.0. If the answer is A or B, that machinery (and the spec section, and the tests) should not ship in v1 at all. After v1 ships with C, moving to A/B is a user-facing breaking change.

No position is taken here — the trade-off is real on both sides (PyPSA's n.snapshots API is the crux). But it should be a decision, not a default.
