Skip to content

refactor(common): split DataArray conversion into a 3-rung strictness ladder#737

Merged
FabianHofmann merged 10 commits into
fix/bounds-coords-broadcastfrom
refactor/dataarray-strictness-ladder
Jun 2, 2026
Merged

refactor(common): split DataArray conversion into a 3-rung strictness ladder#737
FabianHofmann merged 10 commits into
fix/bounds-coords-broadcastfrom
refactor/dataarray-strictness-ladder

Conversation

@FBumann
Copy link
Copy Markdown
Collaborator

@FBumann FBumann commented Jun 1, 2026

Stacked on #732. Resolves the open questions from the #732 review thread by re-layering linopy.common's DataArray conversion. The design was developed in this PR's comment thread.

What

Two public entry points — replacing the as_dataarray-with-flag + private _as_dataarray_lax + align_to_coords trio:

as_dataarray(arr, coords, dims)                              convert: type dispatch + positional labeling
broadcast_to_coords(arr, coords, dims, *,                    + broadcast: project MI levels, reorder,
                    strict=True, label=...)                    expand missing dims, transpose to coords order

strict decides what happens to anything broadcasting alone cannot resolve (extra dims, disagreeing coord values, MI coverage gaps):

  • strict=True (default): raise, naming label in the error (label is required in this mode — enforced via overloads and at runtime). Forgetting the flag adds safety instead of silently dropping validation.
  • strict=False: pass through for downstream xarray alignment (the operator's join= owns reconciliation).

Backed by one private mechanics function (_broadcast_to_coords) that reports MultiIndex projections (_LevelProjection) instead of deciding what they mean; the public function applies policy per mode.

Caller profiles

Caller Call
__matmul__ (2 sites) — broadcasting would contract a=(time,name) @ b=(name,location) to (location) instead of (time,location) as_dataarray(other, coords=…, dims=…)
add_variables / add_constraints bounds + mask broadcast_to_coords(lower, coords, label="lower bound")
Expression arithmetic, to_linexpr, as_expression (constraint lhs/rhs setters) broadcast_to_coords(other, coords=…, strict=False)

MultiIndex policy (scenario B — decided in this thread)

Terminology: a stacked MultiIndex dim has levels (its component index names, e.g. period / timestep) and level combinations (its elements — one tuple per position, e.g. (2030, 't1')).

Implicit level projections are deprecated everywhere and will raise under the v1 convention — the MI check is the same in both modes:

MI situation strict=False (arithmetic) strict=True (bounds/mask)
input misses a whole level (per-period bounds) EvolvingAPIWarning EvolvingAPIWarning
coverage gap (some level combinations get no value) EvolvingAPIWarning ValueError — the error lists the missing combinations

The warning channel carries a TODO(#738): migrate EvolvingAPIWarningLinopySemanticsWarning once #717 lands.

This removes the enforce_level_coverage flag and the cross-module use of a private helper — the two things the #732 thread flagged.

Behavior changes

One, deliberate (scenario B): add_variables / add_constraints with inputs indexed by a subset of a MultiIndex's levels (e.g. PyPSA's per-period bounds) now emit the EvolvingAPIWarning deprecation warning. Everything else keeps the semantics it has on #732. (Also caught in review and fixed: the constraint lhs/rhs setters go through as_expression, which now uses strict=False — with regression tests.)

Tests

  • Strictness-contrast tests: as_dataarray doesn't expand / broadcast_to_coords(strict=False) passes mismatches through / strict=True rejects them with labeled errors, stays silent on partial-level bounds.
  • test_matmul_contracts_only_shared_dims: pins (dim_0,dim_1) @ (dim_1,location) → (dim_0,location).
  • Constraint rhs setter regressions: missing-dim broadcast + MultiIndex-level projection.
  • Full suite: 3200 passed, 36 skipped. mypy + pre-commit clean.

Out of scope (agreed in the thread)

🤖 Generated with Claude Code

FBumann and others added 2 commits June 1, 2026 14:47
… ladder

Replace the as_dataarray + _as_dataarray_lax pair (and the
enforce_level_coverage flag) with three public entry points, each
including the previous one:

- as_dataarray: convert only (the former _as_dataarray_lax). Used by
  __matmul__, where dims missing from the constant must not be
  broadcast in (they would be contracted away as common dims).
- broadcast_to_coords: convert + broadcast against coords (the former
  broadcasting as_dataarray). Used by expression arithmetic.
- align_to_coords: convert + broadcast + enforce the coords contract.
  Used by add_variables / add_constraints (unchanged signature).

The broadcasting mechanics live in one shared private core
(_broadcast_core) that reports MultiIndex-level projections instead of
applying policy. The entry points decide what a partial projection or
coverage gap means: broadcast_to_coords warns (arithmetic convention),
align_to_coords raises (coords contract). This removes the
enforce_level_coverage flag and keeps validation concerns out of the
broadcasting layer.

No behavior changes; all call sites keep their semantics. New tests pin
the ladder contrasts and the matmul dim-contraction rules.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 1, 2026

@FabianHofmann For me the whole broadcasting and alignment in linopy is at least as difficult to undnerstand as the arithmetics. And arithmetics also rely on this broacasting. SO i think getting this right, with clear methods that have distinct, understandable roles is a really important step.

Thats why im putting this much effort into it.

And thanks for your work aout Multiindex. Im not really that good in that area

@FBumann FBumann requested a review from FabianHofmann June 1, 2026 13:32
FBumann and others added 2 commits June 1, 2026 15:56
Private-twin convention: _broadcast_to_coords is the raw implementation
of broadcast_to_coords (returns projection events instead of applying
policy), shared with align_to_coords.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… rung

The constraint lhs/rhs setters call as_expression(value, model,
coords=self.coords, dims=self.coord_dims); forwarding those kwargs to
the convert-only as_dataarray dropped the broadcasting these setters
relied on (e.g. a MultiIndex-level-indexed rhs failed with an xarray
AlignmentError instead of being projected onto the stacked dim).

Use broadcast_to_coords instead. The other as_expression callers pass
only dims (no coords), for which both rungs behave identically.

Adds regression tests for the rhs setter: missing-dim broadcast and
MultiIndex-level projection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@FabianHofmann
Copy link
Copy Markdown
Collaborator

@FBumann thanks for taking another look, I read the code and the tests. it all makes sense. it is a complicated thing and the time spent here is definitely worth it. I am wondering whether we are already at the optimum.

still thinking about the arithmetics the generalization we have there is the join parameter. I wonder if we should/could have an equivalent for the coords alignment even though this is a asymetric case where the reference (coords) are immutable.

I would see that the final API could be

align_to_coords(arr, coords, join="exact")

equivalent to

align(expr, expr2, join="exact") # what we already have

most of the calls of as_dataarray + reindex/broadcase calls that we have could be replaced by one align_to_coords call. we could also make that a follow up, the only thing would be collapsing the broadcast/align function again. here is detailed plan from claude we we could add this here:

Details

Convergence target

as_dataarray(arr, coords, dims)                          # convert only — matmul
align_to_coords(arr, coords, *, join="defer",            # convert + broadcast + reconcile [+ validate]
                enforce_dims=False, fill_value=NA,
                dims=None, label=None)

Two public functions instead of three. broadcast_to_coords stops being a public name (or stays as a one-line deprecated alias for align_to_coords(join="defer")).

What to change in #737

1. Don't ship broadcast_to_coords as a public function. It's the thing we'd deprecate next PR. Instead, rename the private mechanics _broadcast_to_coords_align_to_coords, and make align_to_coords the single public reconcile entry point with a join parameter. The three #737 presets become arguments:

#737 public name becomes
broadcast_to_coords(arr, coords) align_to_coords(arr, coords, join="defer")
align_to_coords(arr, coords, label=…) align_to_coords(arr, coords, join="exact", enforce_dims="raise", label=…)
(new) align_to_coords(arr, coords, join="left", fill_value=…)

2. Put join in the one reconcile block — nothing else moves. The only mechanic that varies is the per-dim loop #737 left at common.py ~425–435 ("Same values, different order → reindex; different value sets are left alone"). Make that block switch on join:

for dim, coord_values in expected.items():
    ...
    if actual_idx.equals(expected_idx):
        continue
    same_set = len(actual_idx) == len(expected_idx) and set(actual_idx) == set(expected_idx)
    if join == "defer":
        if same_set:                      # today's behavior verbatim
            arr = arr.reindex({dim: expected_idx})
    elif join == "left":
        arr = arr.reindex({dim: expected_idx}, fill_value=fill_value)   # NEW
    elif join == "override":
        if len(actual_idx) == len(expected_idx):
            arr = arr.assign_coords({dim: expected_idx})
    # join == "exact": leave values; the contract check raises (see #3)

join="defer" reproduces #737 byte-for-byte, so the arithmetic path is unchanged. left is the new capability; exact defers the raising to validation.

3. Make the policy a parameter keyed off enforce_dims (tri-state), folding in the _LevelProjection report. This is where the considerations from earlier land. One policy block in align_to_coords replaces the two preset blocks:

enforce_dims: Literal[False, "warn", "raise"] = False
concern enforce_dims=False (arithmetic) "warn" (to_constraint rhs) "raise" (bounds/mask)
extra dims (arr.dims ⊄ coords) ignore warn raise (validate_alignment)
MI partial-level projection warn warn silent (documented bounds feature)
MI coverage gap warn warn raise
shared-dim value mismatch governed by join join join="exact" ⇒ raise

The "warn" rung is the must-have: to_constraint rhs (expressions.py:1109–1116) warns and proceeds on extra dims — a bare enforce_dims=True would wrongly raise there. The tri-state is what lets that warning move into the function instead of being hand-rolled at the call site.

4. Exclude HELPER_DIMS in validate_alignment's extra-dims check. #737's validate_alignment flags any dim not in coords. The constraint coeffs/vars setters carry _term, so without exclusion enforce_dims="raise" would reject them later. Mirror what align() already does (exclude=frozenset(...).union(HELPER_DIMS), common.py:1840). Zero behavior change today (bounds never carry _term); it just unblocks the setter migration.

5. Migrate call sites in #737 (atomically).

  • Arithmetic (_add_constant :586, _apply_constant_op :612, as_expression): broadcast_to_coordsalign_to_coords(..., join="defer"). These still feed _align_constant afterwards — leave that untouched (it owns the symmetric inner/outer join, which stays out of scope).
  • Strict sites must change in the same commit (model.py:777/778/791/1060): today's bare align_to_coords(value, coords, label=…) must become align_to_coords(..., join="exact", enforce_dims="raise", label=…). If the new default is join="defer", leaving them bare would silently stop validating — that's the one correctness trap, so it has to be atomic.
  • matmul: stays on as_dataarray (convert-only). The whole point — join can't express its no-broadcast need, and refactor(common): split DataArray conversion into a 3-rung strictness ladder #737 already got this right.

6. Demonstrate join="left" by collapsing two real dances now (behavior-preserving, since as_dataarray + reindex_like(fill) ≡ align_to_coords(join="left", fill_value=fill)):

  • variables.py:330-332 to_linexpr coeff: 3 lines → align_to_coords(coeff, self.coords, dims=self.dims, join="left", fill_value=0).
  • expressions.py:1107-1116 to_constraint rhs: → align_to_coords(rhs, self.coords, join="left", fill_value=NA, enforce_dims="warn").

This makes the join param used, not speculative — proving the design against the exact callers that motivated it.

What stays out of scope (and why — confirmed by the exploration)

  • merge / _align_constant's explicit joinsymmetric inner/outer, mutate the reference. Untouched.
  • The per-field fill_value dict in Constraints.reindex — Dataset-level, not single-array. Untouched.
  • The bare-DataArray(value).broadcast_like(...) setters (constraints.py, variables.py bounds) — candidates for a later join="override" migration, not refactor(common): split DataArray conversion into a 3-rung strictness ladder #737 (moving them to exact would tighten behavior).

Net effect on #737

Same diff size, same zero-behavior-change guarantee, same matmul/flag fixes — but the public surface lands as as_dataarray + align_to_coords(join=...) directly, so there's no introduce-then-deprecate of broadcast_to_coords, and the join="left" capability ships tested and in use. The PR's narrative shifts from "3-rung ladder" to "convert primitive + one reconcile function parameterized by join."

Two decisions for you

  1. Drop broadcast_to_coords entirely, or keep it as a thin deprecated alias for readability at arithmetic sites (broadcast_to_coords(x, c) reads better than align_to_coords(x, c, join="defer") — and "align … join=defer" is mildly self-contradictory)?
  2. join="left" migration in refactor(common): split DataArray conversion into a 3-rung strictness ladder #737, or as the immediate follow-up? Doing it in refactor(common): split DataArray conversion into a 3-rung strictness ladder #737 proves the param; deferring keeps refactor(common): split DataArray conversion into a 3-rung strictness ladder #737 purely structural.

Want me to check out refactor/dataarray-strictness-ladder and implement this?

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 1, 2026

@FabianHofmann I think the important thing is to get the mental model right and to know what is done where in the codebase. This will be achieved with this PR i think. After that, refactoring to the proposed design seems much simpler.

However, Im not sure of going from 3 back to 2 methods is desireable. Its 3 very distinct methods. A join parameter in general is a good idea I think, but if we add an ambigous option like "defer", which is not unknown for a join, this creates new ambiguity i think.

My proposal would be: Lets merge this PR, then look if we need a refactoring back to 2 mehods and a join parameter after all.

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 1, 2026

@FabianHofmann I thought about the join parameter. I think it's a nice addition — but not as align_to_coords(join=...).

The pattern: _like_to_coords

xarray's _like methods conform one already-built object to another object. The _to_coords family conforms raw user input (scalars, numpy, pandas, DataArrays) to explicitly passed coords — the thing add_variables / add_constraints / arithmetic must do before any xarray method can even apply, since the input isn't a DataArray yet and coords (a dict / Coordinates / list of indexes) isn't an alignable object.

Same operations, different reference type — and linopy already has most of the left column:

operation reference = another object reference = explicit coords
convert as_dataarray(arr, coords, dims)
broadcast — make dims agree Variable.broadcast_like(other) broadcast_to_coords(arr, coords)
reindex — make entries agree, fill gaps reindex_like(other) on expressions / constraints ✓ reindex_to_coords(arr, coords, fill_value=…)NEW (your join="left")
exact check — entries must agree, else raise align(a, b, join="exact") strict_broadcast_to_coords(arr, coords, label=…) (today's align_to_coords)
inner / outer join — both sides change align(a, b, join=…) ✓, operator join= impossible

Three things the matrix shows:

1. The family completes what linopy already half-has. Each _to_coords function is the coords-flavored sibling of an existing method. Nothing exotic.

2. Why there's no join= parameter in the right column. A join that changes both sides (inner / outer) cannot be completed there: coords is frozen, and the expression it came from isn't in the function's hands — it can't be grown to match. The only place holding both sides is the operator (.add / .mul / .le (..., join=)_align_constant), which is where inner / outer already live. What remains expressible against a frozen reference is exactly: check (exact) or conform (left) — two functions, not a parameter.

3. A purity note. xr.broadcast / broadcast_like quietly outer-align conflicting shared-dim entries before broadcasting (they can — they hold both objects). broadcast_to_coords deliberately doesn't: a half-completed join is worse than none, so entry conflicts pass through untouched to the operator's join=. This matches v1 §9, where broadcasting is dims-only and entry conflicts are §8's business.

The rename: align_to_coordsstrict_broadcast_to_coords

The strict rung never aligns anything — it checks and raises. "Align" is also exactly the word that invited the join= idea: aligns take joins, broadcasts don't. Naming it as what it is — the same broadcast with a strict failure mode — makes the no-join design self-enforcing, and puts the required label argument on the one function that needs it.

Follow ups

The join="left" capability — follow-up PR as reindex_to_coords, migrating the two broadcast + reindex_like call sites (to_linexpr coefficient, to_constraint rhs) so it ships with real users. #737 stays purely structural.


Refined with Claude Code.

The function never aligns anything — it broadcasts and raises on any
mismatch it cannot resolve by broadcasting alone. "Align" is also the
word that invites join= proposals (aligns take joins, broadcasts do
not), so the name now states what it is: the same broadcast as
broadcast_to_coords with a strict failure mode (zip(strict=True)
semantics).

Error messages keep the "could not be aligned to coords" wording so
tests in the base branch (#732) stay untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 1, 2026

I prototyped reindex_to_coords and I think we should NOT include it.

Reason 1:
Only 2 callers could need it, and they both can use arr.reindex_like() directly

Reason2:
arr.reindex() with multi-indexes is not properly supported by xarray.
Reported upstream with a reproducible example: pydata/xarray#11368

FBumann and others added 2 commits June 1, 2026 21:29
- Document the one non-obvious policy in strict_broadcast_to_coords:
  partial-level broadcasts are silent (bounds-broadcast feature), unlike
  the warning on the broadcast rung.
- Unify the first parameter name across the ladder (value -> arr).
- Un-invert the warning-policy loop in broadcast_to_coords.
- Rename the test whose name forced an awkward signature wrap to a
  behavior-oriented name (test_extra_dims_pass_broadcast_rung_fail_strict_rung).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parameter entries carry descriptions only — types live in the function
signatures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@FabianHofmann
Copy link
Copy Markdown
Collaborator

sounds good. we are moving in the right direction.

strict_broadcast and broadcast are differing in the MI checks. I would argue the latter should go to both. Second, and that is on me, the partial level coverage should be supported in future (I have to check compliance with v1 conventions again quickly).

that said, I still think the two should live in one function broadcast_to_coords with a flag strict: bool. they only differ in strictness checks and I find strict_broadcast_to_coords a bit too verbose as a function name.

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 2, 2026

sounds good. we are moving in the right direction.

strict_broadcast and broadcast are differing in the MI checks. I would argue the latter should go to both. Second, and that is on me, the partial level coverage should be supported in future (I have to check compliance with v1 conventions again quickly).

that said, I still think the two should live in one function broadcast_to_coords with a flag strict: bool. they only differ in strictness checks and I find strict_broadcast_to_coords a bit too verbose as a function name.

I would leave the MI stuff up to you. I never use MI and therefore dont know whats needed. If it can be the same/extracted, thats a win i think!

And if the MI stuff is equal, both methods could be one. I'd suggest the signature:

broadcast_to_coords(arr, coords=None, dims=None, *, strict=True, label=None, **kwargs)
  • bounds/mask: broadcast_to_coords(lower, coords, label="lower bound")
  • arithmetic: broadcast_to_coords(other, coords=…, strict=False) — explicit opt-out
  • Forgetting the flag adds safety instead of silently dropping validation.

That said, we can only unify the methods if we actually unify what strict means:

v1 says Then
v1 governs add_variables inputs too both modes warn on partial level → per-period bounds are deprecated usage
partial-level stays supported both modes silent on partial level → the #732 arithmetic warning gets removed

So I'd wait for your v1 check — once strict means one thing, the merge is mechanical.

EDIT

@FabianHofmann
Copy link
Copy Markdown
Collaborator

great, let's go

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 2, 2026

@FabianHofmann You where to fast. I edited it a bit. Did you check v1?

@FabianHofmann
Copy link
Copy Markdown
Collaborator

FabianHofmann commented Jun 2, 2026

@FBumann let me reconcile my thoughts on the partial coverage. you do the signature change, use MI handling of today's strict version and I make some research again how MI is supported in upcoming xarray versions (which could change the picture). so don't mind the MI handling (perhaps don't deleted the checks as well), merge this one as soon as you feel ready and I take another look at MI in #732

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 2, 2026

Note

Claude Code summary — structuring the open MI question so it's a one-table decision.

The strict flag and the v1 question are orthogonal:

  • strict governs raise vs. tolerate (extra dims, value mismatches, coverage gaps) — same in both scenarios below.
  • Your v1 check governs whether partial-level projection is a feature or deprecated — and that answer applies to both modes equally.

So the unified broadcast_to_coords(strict=...) is implementable either way; your check picks which table:

A — v1 allows partial-level projection (stays a feature)

MI situation strict=False (arithmetic) strict=True (bounds/mask)
partial level, full coverage silent silent
coverage gap warn (implicit absence creation, §4) raise

→ the #732 partial-level warning is removed; #717 needs an amendment legitimizing the projection (§9 extended to MI levels).

B — v1 forbids implicit projection (#717 as written, §8/§11)

MI situation strict=False strict=True
partial level warn (deprecated) warn (deprecated — also for bounds)
coverage gap warn raise

→ per-period bounds (PyPSA multi-investment) become deprecated usage with a migration path before v1; #717 stays as written.

Bottom line: pick A or B; the merge into one function is mechanical after that. A = amend the convention. B = deprecate what #732 just shipped.


@FabianHofmann this really helped me

@FabianHofmann
Copy link
Copy Markdown
Collaborator

we should learn from xarry community which struggled a lot with MI's it seems - let's pick B!

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 2, 2026

Note

Claude Code reference note — the MultiIndex representation that underlies every MI issue in this stack. Useful background for the #732 MI follow-up.

Click to expand

A pandas MultiIndex has two xarray representations, and the friction between them is the root of all the MI complexity here:

Stacked: 1 dim + level coords (what linopy / PyPSA use)

<xarray.DataArray (snapshot: 4)>
Coordinates:
  * snapshot  (snapshot) MultiIndex          ← THE dim, holds tuples (2020,'t1')…
  * period    (snapshot) 2020 2020 2030 2030 ← level coord — NOT a dim
  * timestep  (snapshot) 't1' 't2' 't1' 't2' ← level coord — NOT a dim

One dimension; each level is a non-dimension (auxiliary) coordinate attached to it. Can represent sparse indexes (only the combinations that exist).

Unstacked: 2 dims (what .unstack() / Dataset.from_dataframe() produce)

<xarray.DataArray (period: 2, timestep: 2)>
array([[1., 2.],
       [3., 4.]])

Real dimensions, always the full cartesian product — sparse indexes become NaN holes.

Why every MI issue in this stack is this distinction

Issue Cause
Partial-level broadcast (per-period bounds) input has period as a dim; target has period as a level coord. Same name, different role → the projection translates between the two representations
v1 §11 applies (aux-coord conflict) level coords are auxiliary coords — a level-indexed operand is exactly the §11 case
pydata/xarray#11368 (reindex fails) reindexing the stacked dim must also rewrite the level coords; raw indexers don't know they exist
expand_dims workaround in #732 expanding a missing MI dim must create dim + all level coords, not just the dim
Coverage gaps only exist when conforming a sparse stacked index to a full one — the unstacked form can't even express sparsity except as NaN

So for the #732 MI follow-up: the question "how is MI supported in upcoming xarray versions" is concretely "does xarray's indexer API learn to handle level coords" (#11368) — the stacked form itself is stable; it's every operation that crosses between the two forms that needs hand-holding.

@FabianHofmann
Copy link
Copy Markdown
Collaborator

FabianHofmann commented Jun 2, 2026

Note

Claude Code reference note — the MultiIndex representation that underlies every MI issue in this stack. Useful background for the #732 MI follow-up.

Click to expand

yes, that is all correct but no blocker right? atm we support partial level coverage ie. allowing levels as indexes, but this will change in future (warn now, raise later)

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 2, 2026

Note
Claude Code reference note — the MultiIndex representation that underlies every MI issue in this stack. Useful background for the #732 MI follow-up.
Click to expand

yes, that is all correct but no blocker right? atm we support partial level coverage ie. allowing levels as indexes, but this will change in future (warn now, raise later)

Yes, i just wanted to dump some context into this PR. Not blocking

FBumann and others added 2 commits June 2, 2026 11:00
…strict=...)

Per review discussion: one public function instead of two, with strict as
a keyword flag.

- strict=True (default): any mismatch with coords raises, naming label in
  the error — the former strict_broadcast_to_coords.
- strict=False: mismatches pass through for downstream xarray alignment —
  the former loose broadcast_to_coords, used by arithmetic.

Strict is the default so that forgetting the flag adds safety rather than
silently dropping validation. MI handling preserved exactly per mode
(strict: silent partial / raise on gap; non-strict: EvolvingAPIWarning) —
the scenario-B deprecation warnings land separately in #732.

Call sites: model.py bounds/mask drop the long name (strict is default);
arithmetic and as_expression pass strict=False explicitly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Restores the contract align_to_coords always had: strict-mode errors must
name their subject ("lower bound could not be aligned..." rather than
"Value could not be aligned..."). Enforced both statically (overloads:
strict=True requires label: str, strict=False forbids it) and at runtime
(TypeError).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ario B)

Per the #737 review discussion and Fabian's decision: implicit level
projection is deprecated and will raise under the v1 convention, so the
EvolvingAPIWarning now fires in both modes of broadcast_to_coords — the
MI check is the same for every use case:

- input missing a whole level: warn (strict and non-strict)
- coverage gap (level combinations without a value): warn (non-strict) /
  raise (strict — no downstream layer to defer the NaN to)

Warning emission lives in one helper, _warn_implicit_projections, with a
TODO(#738) to migrate to LinopySemanticsWarning once #717 lands.

Also clarifies the MultiIndex terminology everywhere: an MI dim has
*levels* and *level combinations* (one tuple per position). Docstrings
carry the glossary, the coverage-gap error names the missing
combinations explicitly, and "entry" is gone from messages.

User-facing: add_variables / add_constraints with per-period-style
bounds now emit the deprecation warning (PyPSA multi-investment).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented Jun 2, 2026

@FabianHofmann Ready

@FabianHofmann FabianHofmann merged commit ff9644a into fix/bounds-coords-broadcast Jun 2, 2026
3 checks passed
@FabianHofmann FabianHofmann deleted the refactor/dataarray-strictness-ladder branch June 2, 2026 10:14
FabianHofmann added a commit that referenced this pull request Jun 2, 2026
…ints (#732)

* fix(variables): broadcast and order pandas/DataArray bounds in coords

`add_variables` had two related bugs when `lower`/`upper` were arrays:

- pandas Series/DataFrame bounds missing a dimension in `coords` had
  the missing dimension silently dropped (#709), unlike DataArray
  bounds which were already broadcast.
- DataArray bounds missing a dimension were expanded with
  `DataArray.expand_dims`, which prepends new dimensions and produces
  a `coords`-mismatched dimension order in the resulting variable
  (#706). The order depended on the type of the bounds, so scalar
  bounds worked but two array bounds missing the same dimension did
  not.

Replace `_validate_dataarray_bounds` plus the downstream
`as_dataarray(..., coords)` call with a single helper
`_as_dataarray_in_coords`. It converts any input (pandas with named
axes via `to_xarray`, otherwise via `as_dataarray`), validates the
result against `coords`, expands missing dims, transposes to coords
order, and reconstructs the coord variables in that order.
`expand_dims` and `transpose` are no-ops when the array already
matches, so scalar / full-dim DataArray bounds keep their fast path.

Also fix `linopy.piecewise._broadcast_points`, which built the
`expand_dims` map from a `set`, producing a hash-randomized dimension
order across processes. Iterate expressions and dims in declaration
order instead.

Closes #706 and #709. Supersedes #710 and #719.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(variables): frame add_variables coords as source of truth

Restate #706/#709's fix as a single principle in the docstring,
release note, and `_as_dataarray_in_coords` helper docstring:
when `coords` is provided to `add_variables`, it is the source of
truth for dimensions, dimension order, and coordinate values, and
`lower` / `upper` are broadcast and aligned to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: frame bounds fix as extending 0.7.0's coords-as-truth fix

0.7.0 already shipped "add_variables no longer ignores coords when
lower / upper are DataArrays". Recast the new bullet as extending
that fix to the remaining gaps (pandas bounds; dim order across
bound types) so the continuity is visible from the release notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: reword as "extend and finalize", emphasize hardening

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: rephrase as "0.7.0 made ... this release closes the two remaining gaps"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: spell out dims/order/values in coords-as-truth bullet

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(variables): cover pandas MultiIndex bounds and dim reindex

- Parametrize test_bound_broadcast_missing_dim with three additional
  cases: Series with MultiIndex(time, colour), DataFrame with
  MultiIndex columns(space, colour), and DataFrame with MultiIndex
  index(time, space). Exercises the `while DataFrame: unstack()`
  loop and the MultiIndex branch of `_named_pandas_to_dataarray`.
- Add test_dataarray_coord_reorder for the same-values-different-order
  reindex branch (previously only the unequal-values raise was
  covered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: move as_dataarray_in_coords to common.py

Relocate `_as_dataarray_in_coords` and its helpers
(`_coords_to_dict`, `_named_pandas_to_dataarray`) from `model.py`
into `common.py`, alongside the existing `as_dataarray` they
parallel. Rename to `as_dataarray_in_coords` (no leading underscore)
since it is no longer file-local — other modules can import the
strict-coords variant when migrating call sites.

Pure relocation: no behavior change, no call-site changes beyond
`add_variables`'s import. Refs #723.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(common): simplify _named_pandas_to_dataarray + cover edge branches

Replace the unstack-while-loop / split named-check structure with a
single up-front "all axes named" check and a single
``DataFrame.stack(level=list(range(nlevels)), future_stack=True)``
call that collapses all column levels into the row MultiIndex in
one shot. Same observable behaviour, fewer moving parts, no
defensive unreachable branches.

Add tests covering the unnamed-axis fall-through path, the
empty-coords short-circuit in ``as_dataarray_in_coords``, and the
``MultiIndex``-on-a-dim ``continue`` in the validation loop.
Together with the restructure these bring the new helper code to
full patch coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(common): only accept string axis names in _named_pandas_to_dataarray

Pandas allows any hashable in ``pd.Index.names`` (tuples, ints,
etc.), but only strings map cleanly to xarray dim names. Reject
anything non-string up front so the pandas falls back to
``as_dataarray`` instead of producing a DataArray with an awkward
non-string dim name that downstream validation would reject with a
confusing "extra dimensions" error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(common): align positional inputs to coords, with clear shape errors

Inputs without their own meaningful labels — numpy arrays, polars
Series, pandas with unnamed axes — fell through ``as_dataarray_in_coords``
via a short-circuit return. That meant:

- The default ``dim_0`` / ``dim_1`` axis names from ``as_dataarray``
  leaked into the result, so a pandas Series without an index name
  combined with another bound carrying a named coord produced a
  spurious 2-D variable.
- Shape mismatches surfaced further downstream as confusing
  "coordinates do not match" errors against the auto-generated
  ``RangeIndex``.

The fall-through now: (a) defaults ``dims`` to coords' keys so axes
get labelled correctly; (b) runs the same validate / expand /
transpose path as labelled inputs; (c) re-assigns coords from
``expected`` on the resulting DataArray so positional inputs align
to coords by position. A shape mismatch surfaces as xarray's clear
``conflicting sizes`` from ``assign_coords``. MultiIndex coords are
left alone (re-assigning a PandasMultiIndex emits a FutureWarning).

Replaces the tautological ``test_pandas_bound_with_unnamed_axis_falls_through``
(which sneaked past by naming the coord ``"dim_0"`` to match the
auto-generated dim) with ``test_positional_bound_aligns_to_coords``
that asserts actual positional alignment across numpy / Series /
DataFrame, plus ``test_positional_bound_wrong_size_raises_clear_error``
for the shape-mismatch path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sos): use var.indexes[d] for reformulated bounds; widen _coords_to_dict

``reformulate_sos1`` / ``reformulate_sos2`` built the coords for
the indicator variable as ``[var.coords[d] for d in var.dims]``,
which is a list of ``xarray.DataArray`` coord objects. The rest
of linopy passes ``coords`` as a list of ``pd.Index``. The mix
slipped through under the old short-circuit fall-through but
broke once the helper started defaulting ``dims`` from
``_coords_to_dict(coords)`` — non-``pd.Index`` entries were
silently dropped, so ``len(dims) < len(coords)`` and xarray
raised ``different number of dimensions on data and dims: 2 vs 1``.

Use ``var.indexes[d]`` instead — it returns the actual
``pd.Index`` (regular or MultiIndex) for the dim and preserves
structure that ``pd.Index(coord.values, ...)`` would flatten.

Also widen ``_coords_to_dict`` to accept any entry with a
``.name`` (xarray DataArrays included) so a future caller passing
mixed types doesn't silently lose coords. The reformulator fix
removes the only known producer of mixed-type coords; this is
belt-and-suspenders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(common): tighten _coords_to_dict to raise on non-pd.Index entries

Replace the permissive ``getattr(c, "name", None)`` check with an
explicit allow-list: ``pd.Index`` (named or not — unnamed silently
skip as before) and unnamed sequences (``list`` / ``tuple`` /
``range`` / ``numpy.ndarray``). Any other type (notably
``xarray.DataArray``, but also ``pd.Series`` and friends) now
raises ``TypeError`` with a hint to pass ``variable.indexes[<dim>]``
instead. This would have caught the SOS-reformulator bug at the
source instead of letting it surface as a confusing xarray error
about mismatched dim counts ten frames down.

Drop ``DataArray`` from the matching ``coords`` type hints in
``model.py`` / ``expressions.py`` so the documented and runtime
type sets agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(common): proper MultiIndex support in coords helpers (#729)

- _coords_to_dict: explicitly handle pd.MultiIndex — register under
  .name if set, raise TypeError with guidance if .name is missing
- _named_pandas_to_dataarray: use DataArray(df) directly for
  single-level DataFrames; reserve stack() for MultiIndex axes
- as_dataarray_in_coords: validate MultiIndex dims with .equals()
  instead of silently skipping them
- Move MultiIndex tests into dedicated TestAddVariablesMultiIndexCoords
  class with shared fixture

* fix: apply coords-as-truth rule to mask in add_variables/add_constraints (#725)

* fix(model): apply coords-as-truth rule to mask in add_variables/add_constraints

Routes ``mask`` through ``as_dataarray_in_coords(mask, data.coords)``
instead of ``as_dataarray(...) + broadcast_mask(...)``, so pandas
``Series`` / ``DataFrame`` masks missing a dimension are broadcast
to the variable / constraint shape (parallel to the bounds fix in
the previous PR). The ``add_variables`` ``mask`` type hint widens
to ``MaskLike`` to match ``add_constraints``.

The deprecation announced via ``FutureWarning`` in ``broadcast_mask``
("Missing values will be filled with False ... In a future version,
this will raise an error") is now in effect: masks whose
coordinates are a sparse subset of the data's coordinates raise
``ValueError`` instead of silently filling missing entries.
Mask dims not in the data raise ``ValueError`` instead of
``AssertionError`` for consistency with the bounds path.

``broadcast_mask`` had no other callers and is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update doc/release_notes.rst

Co-authored-by: Fabian Hofmann <fab.hof@gmx.de>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Fabian Hofmann <fab.hof@gmx.de>

* refactor: unify as_dataarray; split broadcasting from coords validation (#726)

* fix(model): apply coords-as-truth rule to mask in add_variables/add_constraints

Routes ``mask`` through ``as_dataarray_in_coords(mask, data.coords)``
instead of ``as_dataarray(...) + broadcast_mask(...)``, so pandas
``Series`` / ``DataFrame`` masks missing a dimension are broadcast
to the variable / constraint shape (parallel to the bounds fix in
the previous PR). The ``add_variables`` ``mask`` type hint widens
to ``MaskLike`` to match ``add_constraints``.

The deprecation announced via ``FutureWarning`` in ``broadcast_mask``
("Missing values will be filled with False ... In a future version,
this will raise an error") is now in effect: masks whose
coordinates are a sparse subset of the data's coordinates raise
``ValueError`` instead of silently filling missing entries.
Mask dims not in the data raise ``ValueError`` instead of
``AssertionError`` for consistency with the bounds path.

``broadcast_mask`` had no other callers and is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: unify as_dataarray; split broadcasting from coords validation

Closes #723. Folds the body of `as_dataarray_in_coords` into `as_dataarray`
and extracts the contract checks into `assert_compatible_with_coords`, so
linopy now has one broadcasting primitive and one validation companion.

`as_dataarray(arr, coords)` aligns the result against `coords` for every
input type: labels positional inputs (numpy / unnamed pandas / scalar) by
position, reindexes same-values-different-order, expands missing dims,
and transposes to coords order. Extra dims and disagreeing value sets on
shared dims pass through unchanged, so xarray broadcasting in expression
arithmetic keeps working.

`assert_compatible_with_coords(arr, coords)` enforces the strict contract
(`arr.dims ⊆ coords.dims`, plus exact coord-value equality on shared
dims). `add_variables` and `add_constraints` now call it after
`as_dataarray` for `lower` / `upper` / `mask`, replacing the deleted
`as_dataarray_in_coords` helper.

`_coords_to_dict` filters MultiIndex level coords out of
`xarray.Coordinates` inputs so the new strict-by-default path treats
`station` (and not its derived `letter` / `num` levels) as the dim.

Test suite: 3698 passed (no regressions). Two existing tests were
updated to reflect the new "coords is source of truth" semantics:
`test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned`
(extra coord entries now broadcast in) and
`test_dataarray_extra_dims` (now triggers the subset check rather than
the value-mismatch check).

Microbenchmark in dev-scripts/benchmark_as_dataarray.py shows flat
timings vs the base branch on both add_variables-heavy and arithmetic-
heavy workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: dims= names unnamed coords; doctest the add_variables contract

Closes a silent-failure gap in the strict coords-as-truth path: when the
caller passed ``coords=[[1, 2, 3]], dims=["x"]`` to ``add_variables``,
``_coords_to_dict`` returned an empty mapping (unnamed sequences carry
no dim name), so the strict checks short-circuited and bounds with
extra dims or mismatched values flowed through unchecked, producing
variables with frankenstein outer-joined coord values.

``_coords_to_dict`` now accepts an optional ``dims`` argument that
names unnamed sequence entries by position. ``as_dataarray`` and
``assert_compatible_with_coords`` plumb it through; ``add_variables``
forwards ``kwargs.get("dims")`` to the assertions for ``lower`` and
``upper``. ``coords=[[1, 2, 3]], dims=["x"]`` now enforces the same
contract as ``coords={"x": [1, 2, 3]}`` or
``coords=[pd.Index([1, 2, 3], name="x")]``.

Docstring of ``add_variables.coords`` documents the contract
(subset-of-dims, dim order, value match with auto-reindex, missing-dim
broadcast) and includes four doctests pinning it: the extra-dim raise,
the value-mismatch raise, the same-values-different-order auto-reindex,
and the unnamed-coords-plus-dims opt-in.

Test suite: 3698 passed (parity with the previous commit on this
branch). ``pytest --doctest-modules linopy/model.py -k add_variables``
also green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add align_to_coords with semantic validation error messages

Introduce align_to_coords to wrap as_dataarray and assert_compatible_with_coords
with user-facing labels (lower bound, upper bound, mask). Errors now name the
argument and distinguish extra dimensions, coordinate mismatches, and conversion
failures. Extend mask validation to use coords+dims= when provided.

Co-authored-by: Cursor <cursoragent@cursor.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor(model): simplify mask align; preserve TypeError in align_to_coords

Three cleanups on top of align_to_coords:

- Drop the trailing ``.broadcast_like(data.labels)`` in ``add_variables``
  and ``add_constraints`` mask paths. ``as_dataarray`` already expands
  missing dims to ``coords`` shape, so the broadcast was a no-op.
- Stop overriding the caller's ``dims=`` in the ``add_variables`` mask
  path when ``coords is None``. The previous code stripped ``dims`` and
  forced ``dims=data.dims``; with ``data.coords`` being an xarray
  ``Coordinates`` with already-named dims, the user's ``dims`` is
  harmless to forward and the override was just hiding intent. Mask
  now goes through one ``align_to_coords`` call regardless of whether
  ``coords`` is supplied.
- Split the exception handler in ``align_to_coords``: ``TypeError`` from
  unsupported input types is re-raised as ``TypeError`` (still labeled),
  while ``ValueError`` / ``CoordinateValidationError`` stay
  ``ValueError``. Preserves the original type signature for callers
  that want to ``except TypeError``.

New test ``test_align_to_coords_preserves_type_errors`` pins the
TypeError pass-through. Suite: 3703 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: rename assert_compatible_with_coords to validate_alignment

Per PR review: align on the project's `validate_*` naming convention
and remove the implicit "AssertionError" connotation of `assert_*`.
Pairs naturally with `align_to_coords`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* test(repr): set .name on MultiIndex coord

#729 made `.name` required on `pd.MultiIndex` sequence-form coord entries
(xarray needs a single dim name for the flattened index). test_repr.py was
the only remaining call site missing the assignment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(types): widen _coords_to_dict to Hashable; sort with key=str

`xarray.Coordinates.dims` is typed `Hashable`, so the dict-comprehension
return and the `sorted()` calls in the validation message tripped mypy.
The function's other branches already accept `c.name` / `dim_names[i]`
(both Hashable), so widening the return type is the honest signature.

Also: drop `.data` from the add_variables doctest — use the public
`v.lower` property instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(common): clarify coords-entry rules and tighten error labels (#733)

* refactor(common): clarify coords-entry rules and tighten error labels

Stacks on top of #732. Three small follow-ups from PR review:

- Remove dead `broadcast_mask` (claimed removed in #732, was still present).
- `as_dataarray`: normalize bare-tuple coord entries to lists so
  `coords=[(0, 1, 2)]` behaves identically to `coords=[[0, 1, 2]]`
  (xarray reads `(a, b)` as `(dim_name, values)` and would otherwise
  raise a confusing error).
- `align_to_coords`: pre-validate coords via `_coords_to_dict` so
  TypeErrors from a bad `coords` argument propagate with their own
  message instead of being relabeled "<label> could not be aligned to
  coords: ...", which previously misdirected users to inspect the
  bound/mask.

Docs: replace the prose paragraph in `_coords_to_dict`'s docstring
with an explicit rules table covering every container form and
sequence-entry case (named/unnamed `pd.Index`, `pd.MultiIndex`, bare
sequences, with/without positional `dims=`).

Tests: new `TestCoordsToDictRules` class in `test_common.py` mirrors
the docstring table one-test-per-rule so the executable spec stays
visibly aligned with the documented contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(common): allow dims= to name an unnamed pd.MultiIndex

Mirrors the existing rule for unnamed pd.Index: an unnamed MultiIndex
paired with a positional dims=[i] entry now gets its flat .name set
to dims[i] on a shallow copy (caller's MultiIndex is not mutated).
Per-level names are preserved.

Removes the asymmetry between Index and MultiIndex in _coords_to_dict:
both can now be named either inline (.name) or by position (dims=[i]).
An unnamed MultiIndex with no positional dims still raises TypeError
since xarray requires a single flat name.

Adds one rule-table row and two tests
(test_unnamed_multiindex_with_dims_uses_dims,
test_unnamed_multiindex_without_dims_raises).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(common): scope tuple-normalize check to lists/tuples with tuple entries

The previous `not isinstance(coords, Coordinates | Mapping)` form was
broad and rebuilt `coords` as a fresh list on every call (even when no
tuple entries were present). Switch to a positive
`isinstance(coords, list | tuple)` guard with a short-circuit
`any(isinstance(c, tuple) for c in coords)` check, so the comprehension
only runs when there is actually a tuple to normalize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(release_notes): restore lost bullets, surface coords breaking changes

Two pre-existing Upcoming-Version bullets from master had been
dropped on this branch, most likely as merge-conflict casualties:

- ``LinearExpression.where`` doc + ``BaseExpression.variable_names`` entry
- Mosek basic/IPM solution-inspect fix

Restore both verbatim from master.

Also add an explicit Breaking Changes bullet for the coord-as-truth
behaviour changes that previously lived only under Bug Fixes: the
mask FutureWarning -> ValueError flip, the AssertionError -> ValueError
flip on extra mask dims, and the new TypeError on an unnamed
pd.MultiIndex without a positional dims=[i] entry. The Bug Fixes
entries still carry the migration detail; the Breaking Changes bullet
points there so readers scanning by section don't miss the rename of
warnings to errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(release_notes): condense coords-as-truth entries

- Merge the two Bug Fixes bullets (bounds + mask) into one. The
  separation read as "same fix, applied twice" without adding info;
  one bullet covers both with the same migration detail.
- Shorten the Breaking Changes bullet — it duplicated the v0.6.3
  ``FutureWarning`` and ``AssertionError`` parentheticals already
  in Bug Fixes; keep only the FutureWarning summary and the
  pd.MultiIndex addition.
- Collapse the Internal as_dataarray bullet from 8 wrapped lines to
  one, and drop the "Validation errors name the argument" UX
  detail — accurate but not structural enough for a release note.

No facts removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(common): preserve MultiIndex levels when broadcasting a missing dim

as_dataarray used expand_dims to add a coords dim absent from the input,
which silently dropped MultiIndex level coords and left a degenerate flat
index that failed to align downstream (PyPSA multi-investment). Broadcast
MultiIndex-backed dims against a Coordinates template instead, falling back
to expand_dims when the input already carries a level name as its own coord.

Also narrow CoordsLike to drop the DataArray sequence entry (rejected by
_coords_to_dict), and give align_to_coords an explicit dims parameter.

* feat(common): project pandas inputs onto stacked-MultiIndex coords dims

Map arr dims that name levels of a stacked-MultiIndex coords dim onto
that dim: a level subset broadcasts, the full set aligns element-wise.
Strict callers (add_variables/add_constraints) enforce full coverage;
arithmetic keeps NaN-filling. Fixes PyPSA multi-investment regressions.

* feat(common): warn on implicit MultiIndex-level projection in arithmetic

The level-projection result is already convention-shaped (levels stay as
aux coords on the MI dim). On the arithmetic path, flag the cases the v1
arithmetic convention will require to be explicit — subset-level broadcast
and NaN-fill of uncovered entries — with EvolvingAPIWarning. Full-level,
full-coverage alignment and the strict bounds path stay silent.

* use lax dataarray in matmul

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor(common): split DataArray conversion into a 3-rung strictness ladder (#737)

* refactor(common): split DataArray conversion into a 3-rung strictness ladder

Replace the as_dataarray + _as_dataarray_lax pair (and the
enforce_level_coverage flag) with three public entry points, each
including the previous one:

- as_dataarray: convert only (the former _as_dataarray_lax). Used by
  __matmul__, where dims missing from the constant must not be
  broadcast in (they would be contracted away as common dims).
- broadcast_to_coords: convert + broadcast against coords (the former
  broadcasting as_dataarray). Used by expression arithmetic.
- align_to_coords: convert + broadcast + enforce the coords contract.
  Used by add_variables / add_constraints (unchanged signature).

The broadcasting mechanics live in one shared private core
(_broadcast_core) that reports MultiIndex-level projections instead of
applying policy. The entry points decide what a partial projection or
coverage gap means: broadcast_to_coords warns (arithmetic convention),
align_to_coords raises (coords contract). This removes the
enforce_level_coverage flag and keeps validation concerns out of the
broadcasting layer.

No behavior changes; all call sites keep their semantics. New tests pin
the ladder contrasts and the matmul dim-contraction rules.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: shorten release-notes bullet on conversion helpers

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(common): rename _broadcast_core to _broadcast_to_coords

Private-twin convention: _broadcast_to_coords is the raw implementation
of broadcast_to_coords (returns projection events instead of applying
policy), shared with align_to_coords.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(expressions): as_expression converts constants with the broadcast rung

The constraint lhs/rhs setters call as_expression(value, model,
coords=self.coords, dims=self.coord_dims); forwarding those kwargs to
the convert-only as_dataarray dropped the broadcasting these setters
relied on (e.g. a MultiIndex-level-indexed rhs failed with an xarray
AlignmentError instead of being projected onto the stacked dim).

Use broadcast_to_coords instead. The other as_expression callers pass
only dims (no coords), for which both rungs behave identically.

Adds regression tests for the rhs setter: missing-dim broadcast and
MultiIndex-level projection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(common): rename align_to_coords to strict_broadcast_to_coords

The function never aligns anything — it broadcasts and raises on any
mismatch it cannot resolve by broadcasting alone. "Align" is also the
word that invites join= proposals (aligns take joins, broadcasts do
not), so the name now states what it is: the same broadcast as
broadcast_to_coords with a strict failure mode (zip(strict=True)
semantics).

Error messages keep the "could not be aligned to coords" wording so
tests in the base branch (#732) stay untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(common): apply review polish to the strictness ladder

- Document the one non-obvious policy in strict_broadcast_to_coords:
  partial-level broadcasts are silent (bounds-broadcast feature), unlike
  the warning on the broadcast rung.
- Unify the first parameter name across the ladder (value -> arr).
- Un-invert the warning-policy loop in broadcast_to_coords.
- Rename the test whose name forced an awkward signature wrap to a
  behavior-oriented name (test_extra_dims_pass_broadcast_rung_fail_strict_rung).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(common): add numpydoc Parameters/Returns to the three public rungs

Parameter entries carry descriptions only — types live in the function
signatures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(common): unify the broadcast rungs into broadcast_to_coords(strict=...)

Per review discussion: one public function instead of two, with strict as
a keyword flag.

- strict=True (default): any mismatch with coords raises, naming label in
  the error — the former strict_broadcast_to_coords.
- strict=False: mismatches pass through for downstream xarray alignment —
  the former loose broadcast_to_coords, used by arithmetic.

Strict is the default so that forgetting the flag adds safety rather than
silently dropping validation. MI handling preserved exactly per mode
(strict: silent partial / raise on gap; non-strict: EvolvingAPIWarning) —
the scenario-B deprecation warnings land separately in #732.

Call sites: model.py bounds/mask drop the long name (strict is default);
arithmetic and as_expression pass strict=False explicitly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(common): require label when broadcast_to_coords is strict

Restores the contract align_to_coords always had: strict-mode errors must
name their subject ("lower bound could not be aligned..." rather than
"Value could not be aligned..."). Enforced both statically (overloads:
strict=True requires label: str, strict=False forbids it) and at runtime
(TypeError).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(common): deprecate implicit MI-level projection everywhere (scenario B)

Per the #737 review discussion and Fabian's decision: implicit level
projection is deprecated and will raise under the v1 convention, so the
EvolvingAPIWarning now fires in both modes of broadcast_to_coords — the
MI check is the same for every use case:

- input missing a whole level: warn (strict and non-strict)
- coverage gap (level combinations without a value): warn (non-strict) /
  raise (strict — no downstream layer to defer the NaN to)

Warning emission lives in one helper, _warn_implicit_projections, with a
TODO(#738) to migrate to LinopySemanticsWarning once #717 lands.

Also clarifies the MultiIndex terminology everywhere: an MI dim has
*levels* and *level combinations* (one tuple per position). Docstrings
carry the glossary, the coverage-gap error names the missing
combinations explicitly, and "entry" is gone from messages.

User-facing: add_variables / add_constraints with per-period-style
bounds now emit the deprecation warning (PyPSA multi-investment).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(release_notes): surface the MI-projection deprecation and DataArray-coords breaking change

- The implicit MultiIndex-level projection deprecation (scenario B) now
  has its own entry under Deprecations, where PyPSA users scanning for
  upcoming warnings will find it.
- Breaking Changes gains the CoordsLike narrowing: DataArray entries in
  sequence-form coords raise TypeError (pass variable.indexes[dim]
  instead).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(common): reject unnamed-MultiIndex inputs in strict validation

validate_alignment unwrapped only bare pd.MultiIndex coord entries, so
Coordinates-backed (DataArray) MI dims read as non-MI and skipped the
equality check. Use _as_multiindex on both sides to catch mismatches
regardless of level names.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Fabian Hofmann <fab.hof@gmx.de>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
FBumann added a commit that referenced this pull request Jun 3, 2026
…cenario B v1)

Closes the two integration points between the alignment layer (#732/#742)
and the v1 semantics infrastructure:

- _warn_implicit_projections -> _enforce_implicit_projections: under
  legacy semantics, the implicit MultiIndex-level projection deprecation
  now goes through warn_legacy() / LinopySemanticsWarning (#738,
  replacing EvolvingAPIWarning, which stays piecewise-only); under the
  v1 convention it raises ValueError (sections 8 and 11) — the
  projection must be written explicitly (scenario B of the #732/#737
  discussion).

- as_expression no longer swallows the underlying conversion error:
  "Cannot convert to LinearExpression: <original message>" so the v1
  guidance reaches the user.

Tests: the MI-projection deprecation tests in test_alignment,
test_variable, test_constraint, and test_linear_expression are marked
@pytest.mark.legacy and assert LinopySemanticsWarning; each gains a
@pytest.mark.v1 counterpart asserting the v1 raise. Full suite under
both semantics: 6446 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FBumann added a commit that referenced this pull request Jun 3, 2026
The convention was silent on inputs indexed by levels of a stacked
MultiIndex dimension — the question resolved as scenario B in the
#732/#737 discussion. Now written into section 11:

- level coords are auxiliary coordinates, so a level-named operand dim
  is a section-11 conflict: it raises, with the explicit .sel()
  projection as the documented recipe;
- a full reconstruction of the MultiIndex is not a conflict (same
  coordinate spelled differently, aligns under section 8);
- legacy projects implicitly and warns; the projection is removed
  at 1.0 (added to legacy-removal.md).

Also: the #736 TODO no longer claims #732 is unmerged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FBumann added a commit that referenced this pull request Jun 3, 2026
…ts operand

The master merge updated _add_constant_legacy to
broadcast_to_coords(strict=False) (it was in the conflict region) but
_add_constant_v1 was added by this branch in a region master never
touched, so it kept calling as_dataarray — which since #737 is
convert-only and no longer broadcasts. A v1 addition of an array
operand needing dim expansion failed with "conflicting sizes".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants