Decision before v1: first-class MultiIndex support vs. flat dims + auxiliary level coords #744

> [!NOTE]
> AI-written analysis (Claude Code, prompted by @FBumann). This is a scope question for #717 — the decision window closes when v1 ships, because afterwards any change here is a breaking change with a deprecation cycle.

## The question

Every difficult problem in the #732 → #737 → #742 → #717 chain traces back to one root: **linopy supports stacked `pd.MultiIndex` dimensions as a first-class data model.** Should v1 keep that — or replace it with a representation that needs none of the machinery?

This is the radical version of the "learn from the xarray community, which struggled a lot with MIs" argument that decided scenario B.

## What MultiIndex support costs

| Complexity that exists *only* for MI | Approximate cost |
|---|---|
| `_project_onto_multiindex_levels`, `_LevelProjection`, `_as_multiindex` | ~120 lines |
| MI branches in `_coords_to_dict`, `validate_alignment`, `_broadcast_to_coords` (expand-via-template) | ~80 lines |
| `assign_multiindex_safe` (~20 call sites), `get_dims_with_index_levels`, MI serialization in `coords_to_dataset_vars` | ~100 lines |
| §11's stacked-MI paragraph, the scenario A/B design discussion, pydata/xarray#11368 workarounds | weeks |
| `TestMultiIndexProjection` + MI tests across 6 files | ~400 lines |

Plus a permanent tax: every future feature must answer "and what about MultiIndex?".

## The alternative: flat dim + auxiliary level coords

The same information, no `pd.MultiIndex` anywhere — and **§11 already governs auxiliary coords**:

```python
# instead of:  snapshot = MultiIndex[(2020,t1), (2020,t2), (2030,t1), (2030,t2)]
snapshots = pd.RangeIndex(4, name="snapshot")
period    = xr.DataArray([2020, 2020, 2030, 2030], dims="snapshot", ...)
timestep  = xr.DataArray(["t1", "t2", "t1", "t2"], dims="snapshot", ...)
```

**Verified against the current #717 branch** (all snippets run):

```python
x = m.add_variables(coords=[snapshots], name="x")              # ✅ works
expr = (1 * x).assign_coords(period=period)                    # ✅ aux coords attach

# per-period weighting — same explicit recipe as the MI case:
w = xr.DataArray(weights[period.values].values, dims="snapshot", ...)
expr * w                                                       # ✅ works, no projection machinery involved

# groupby a level:
expr.drop_vars("period").groupby(period.rename("inv_period")).sum()   # ⚠️ works, but needs the
                                                                      #    drop/rename dance — naming
                                                                      #    conflict otherwise (fixable)
```

Sparse indexes (not every combination exists) are *naturally* representable — that's what a flat list is. The stacked/unstacked round-trip, the projection, the coverage-gap concept: none of them exist in this representation.
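To make the sparse case concrete, here is a minimal sketch in plain xarray and pandas, without linopy. The data values, the weights, and the missing `(2030, "t2")` combination are made up for illustration; only the flat `snapshot` dimension with `period` / `timestep` level coords comes from the proposal above.

```python
import pandas as pd
import xarray as xr

# Sparse on purpose: the combination (2030, "t2") does not exist.
snapshots = pd.RangeIndex(3, name="snapshot")
data = xr.DataArray(
    [10.0, 20.0, 30.0],  # illustrative values
    dims="snapshot",
    coords={
        "snapshot": snapshots,
        # Level information lives in auxiliary (non-index) coords.
        "period": ("snapshot", [2020, 2020, 2030]),
        "timestep": ("snapshot", ["t1", "t2", "t1"]),
    },
    name="value",
)

# Per-period weights, gathered onto the flat dimension by a plain lookup.
weights = pd.Series({2020: 1.0, 2030: 0.5})  # illustrative weights
w = xr.DataArray(
    weights.loc[data["period"].values].to_numpy(),
    dims="snapshot",
    coords={"snapshot": snapshots},
)
weighted = data * w

# Grouping by a level is an ordinary groupby on an auxiliary coord.
per_period = weighted.groupby("period").sum()
```

Nothing here stacks, unstacks, or projects: the missing combination is simply a row that is not in the list.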

## Three options

| | Internal complexity | PyPSA impact | Tuple `.sel()` / `.unstack()` |
|---|---|---|---|
| **A. Disallow MI** (`TypeError` → point to flat+aux) | deleted | `n.snapshots` API migration | gone |
| **B. Convert at the boundary** (MI accepted as *input*, stored flat+aux, re-stacked on output) | mostly deleted | thin adapter at model build + solution extraction | gone on linopy objects |
| **C. Status quo** (#717 as-is) | stays forever | none | works |
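For a sense of how small option A is on the input side, a guard along these lines would be enough. This is a sketch only: `reject_multiindex` is a hypothetical name, not an existing linopy function, and the error wording is invented.

```python
import pandas as pd


def reject_multiindex(index: pd.Index) -> pd.Index:
    """Hypothetical option-A guard: refuse MultiIndex coords with a pointer to flat+aux."""
    if isinstance(index, pd.MultiIndex):
        raise TypeError(
            f"MultiIndex coordinates are not supported (levels: {list(index.names)}). "
            "Use a flat dimension and attach the levels as auxiliary coordinates."
        )
    return index
```

The cost of option A is therefore not in linopy but downstream, in the `n.snapshots` migration listed in the table.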

## What B would mean concretely

- `coords=[multiindex]` still works — linopy decomposes it into flat dim + level aux-coords on entry.
- The §11 stacked-MI paragraph reduces to one sentence ("MultiIndex coords are stored as a flat dimension with level coords; §11 governs the levels").
- `solution` / `dual` come back flat-indexed with level coords as columns; PyPSA re-stacks for its users (one `set_index(levels)` call).
- The groupby naming rough edge needs fixing (small).
- pydata/xarray#11368 stops mattering to linopy entirely.
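The boundary conversion in the first and third bullets can be sketched in pandas alone. `decompose` and `restack` are hypothetical helper names, the solution values are invented, and the sketch assumes every MultiIndex level has a name; it is not how linopy or PyPSA implement anything today.

```python
import pandas as pd


def decompose(mi: pd.MultiIndex, dim: str) -> tuple[pd.RangeIndex, pd.DataFrame]:
    """Hypothetical entry-side helper: flat index plus one column per level."""
    flat = pd.RangeIndex(len(mi), name=dim)
    levels = pd.DataFrame(
        {name: mi.get_level_values(name).to_numpy() for name in mi.names},
        index=flat,
    )
    return flat, levels


def restack(values: pd.Series, levels: pd.DataFrame) -> pd.Series:
    """Hypothetical exit-side helper: re-attach the levels and restore the MultiIndex."""
    frame = levels.assign(value=values)  # aligns on the flat index
    return frame.set_index(list(levels.columns))["value"]


mi = pd.MultiIndex.from_tuples(
    [(2020, "t1"), (2020, "t2"), (2030, "t1"), (2030, "t2")],
    names=["period", "timestep"],
)
flat, levels = decompose(mi, dim="snapshot")

# Stand-in for a flat-indexed solution coming back from the solver.
solution = pd.Series([1.0, 2.0, 3.0, 4.0], index=flat, name="x")
restacked = restack(solution, levels)
```

The re-stacking step is the single `set_index(levels)` call mentioned above; everything between entry and exit only ever sees the flat index.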

## Why this needs deciding now

#717 currently implements **C** — including the machinery that the legacy-removal checklist says survives 1.0. If the answer is A or B, that machinery (and the spec section, and the tests) should not ship in v1 at all. After v1 ships with C, moving to A/B is a user-facing breaking change.

No position is taken here — the trade-off is real on both sides (PyPSA's `n.snapshots` API is the crux). But it should be a decision, not a default.
