Skip to content

feature/vector_filter_basis#1018

Draft
mcgibbon wants to merge 40 commits intofeature/disco_forkfrom
feature/vector_filter_basis
Draft

feature/vector_filter_basis#1018
mcgibbon wants to merge 40 commits intofeature/disco_forkfrom
feature/vector_filter_basis

Conversation

@mcgibbon
Copy link
Copy Markdown
Contributor

Opening this draft PR to discuss with Claude.

Changes:

  • symbol (e.g. fme.core.my_function) or script and concise description of changes or added feature

  • Can group multiple related symbols on a single bullet

  • Tests added

  • If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Resolves # (delete if none)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread vector_filter_basis.md Outdated

## Overview

This document specifies an approach for handling directional (vector-valued) data in DISCO convolution on the sphere. Rather than constructing a continuous directional filter basis (as in the cube-gauged approach), this design uses **isotropic radial filters** combined with a **frame-rotation matrix** that correctly transforms vector inputs from each input point's local meridian frame into the output point's meridian frame.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filters the design uses aren't isotropic, right? They will have an angular dependence. Or are they rotationally invariant, just they properly handle vector inputs and outputs?

I think it is a good idea that the physics be rotationally invariant, in the sense that for example if there is a convergence term for advection in x depending on u and the x-oriented filter, there must also be a corresponding convergence term in y depending on v and the y-oriented filter. Similarly for rotation-like filters. Is there a way to capture this? Describe how such filters would be represented and how it results in this desired behavior. Would it mean we store one set of filters that gets applied in some coherent way to both the x- and y- axis, using a rotation operator?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: You're right — the effective filters are not isotropic. The radial basis functions ψ_k(r) are isotropic (depend only on geodesic distance), but the full filter tensors include angular dependence through cos(γ) and sin(γ), where γ is the frame rotation angle. So the filter tensor ψ_k(r)·cos(γ) has a dipole-like angular pattern — it picks out the component of the input aligned with one direction, while ψ_k(r)·sin(γ) picks out the perpendicular component.

The rotational invariance comes from how these filters are used in matched pairs. Yes — we store one set of radial basis functions ψ_k(r), and from it derive two filter tensors (psi_cos and psi_sin) that are always applied together via the rotation matrix structure. For example:

  • Scalar→vector (gradient-like): Applying psi_cos to a scalar field s gives the u-component, and psi_sin gives the v-component. Because both use the same ψ_k(r) and the angular factors cos(γ)/sin(γ) are geometrically determined (not learned), this automatically ensures that d/dx and d/dy are locked together — you can't have convergence in x without the corresponding convergence in y.

  • Vector→vector (frame rotation): The same psi_cos/psi_sin pair forms the 2×2 rotation matrix [cos γ, −sin γ; sin γ, cos γ] that transforms input vectors into the output frame. Again, one set of filters handles both components coherently.

  • Vector→scalar (divergence/curl-like): The cos/sin filters contract vector components into rotationally invariant scalars (div-like: cos·u + sin·v, curl-like: cos·v − sin·u).

The learned weights are shared between the u and v pathways — the weight can't independently scale the x vs y contribution, so the network can't break the rotational symmetry. I'll revise the document to clarify this and stop calling the filters "isotropic."

Comment thread vector_filter_basis.md Outdated
Comment on lines +15 to +17
- **Scalar channels** `s(lat, lon)`: Frame-independent quantities (temperature, pressure, geopotential, etc.). No special treatment needed in convolution.
- **u channels** `u(lat, lon)`: East component of a vector in the local meridian frame (eastward wind, etc.).
- **v channels** `v(lat, lon)`: North component of a vector in the local meridian frame (northward wind, etc.).
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the number of u and v channels are the same, but the number of scalar channels is independent of the number of vector channels. The vectors should be stored as a single tensor with a final [2] dimension.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Agreed. Forward signature will be forward(x_scalar, x_vector) where x_scalar: (B, N_s, H, W) and x_vector: (B, N_v, H, W, 2) with the last dimension being (u, v). Output follows the same convention. Will update the document.

Comment thread vector_filter_basis.md Outdated
[v_rotated] = [sin γ cos γ] [v_in]
```

where `(u_in, v_in)` are the input vector components in the input's meridian frame, and `(u_rotated, v_rotated)` are the same vector expressed in the output's meridian frame.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this rotation is represented by being stored within the pre-computed filter itself.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Yes — cos(γ) and sin(γ) are multiplied into ψ_k(r) during precomputation to produce psi_cos_fft and psi_sin_fft. No runtime rotation needed.

Comment thread vector_filter_basis.md Outdated
Comment on lines +176 to +185
- **Vector norm → scalar:** `n = sqrt(u² + v²)` is frame-invariant and produces a true scalar.
- **Norm-gated vectors:** `σ(n) · (u, v) / n` applies a nonlinearity to the vector magnitude while preserving direction.
- **Scalar × vector → vector:** `s · (u, v)` multiplies a scalar channel with a vector pair to produce a new vector pair. This is how the network creates new vector features from scalar-vector interactions.
- **Vector dot product → scalar:** `u₁·u₂ + v₁·v₂` produces a frame-invariant scalar from two vector pairs.
- **2D cross product → scalar:** `u₁·v₂ − u₂·v₁` produces a pseudo-scalar (changes sign under reflection but is frame-rotation invariant).

### Unsafe Nonlinearities

- **Independent nonlinearities on u and v:** Applying `ReLU(u)`, `ReLU(v)` independently breaks rotational structure — the result depends on the frame orientation.
- **Adding scalar to vector component:** `u + s` breaks vector transformation rules.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's only allow nonlinearities to be applied to the scalars. Don't apply nonlinearities to the vector channels. This should still be fine, as the vectors can have nonlinear behavior by virtue of their dependence on the scalar channels, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Correct. Vectors gain nonlinear behavior through the scalar pathway: scalars get nonlinearities applied, then the next layer's scalar→vector filters (psi_cos/psi_sin) produce new vector features that depend nonlinearly on the original inputs. This avoids frame-dependent nonlinearity issues entirely. I'll replace the nonlinearity section with a note about this constraint.

Comment thread vector_filter_basis.md Outdated

This approach is different in character:

- **No angular filter modes.** The filters remain isotropic (radial only). Directional information enters through the vector channels, not through the filter shape.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this some more. The filters won't be purely isotropic, right? We need to take d/dx of scalar fields, it's just that this operation will be handled in a way that aligns it with the u component, while d/dy would be aligned with the v component of the vectors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Right — as discussed in the thread on line 5, the effective filters have angular dependence via cos(γ)/sin(γ). For scalar→vector specifically: psi_cos applied to a scalar gives the d/dx-like component (→ u), psi_sin gives d/dy-like (→ v). Same ψ_k(r) underlies both, geometrically locked together. I'll replace "isotropic" throughout.

Comment thread vector_filter_basis.md Outdated
- **N_s, N_v (channel counts):** Number of scalar and vector channel pairs in the hidden representation. The network architecture determines these. A reasonable starting point is to match the current total channel count, reserving some for vector pairs.
- **Nonlinearity design:** The choice of how to combine scalar and vector channels between layers (norm-gating, scalar-vector products, etc.) is an architectural decision. Simple norm-gating plus scalar×vector products provide the essential interactions.

## Summary of Implementation Steps
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also outline the testing plan for this new feature? Especially testing that various properties of certain operations are satisfied using user-story level tests of that behavior. It is hard to fully understand how the mathematical details lead to these properties, so it would help to have this also proven out by tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Proposed testing plan (user-story-level behavioral tests):

  1. Uniform vector field is preserved. Create a constant vector field (e.g., everywhere pointing east). After vector→vector convolution, the output should still be uniform and pointing east — frame rotations cancel for a constant field.

  2. Longitude shift equivariance. Shift scalar and vector inputs by N longitude points. Verify outputs shift by N points (both scalar and vector channels). Tests that the FFT-based implementation preserves translational symmetry.

  3. Scalar gradient produces correct vector. Apply scalar→vector to a scalar field with a known gradient (e.g., linear ramp in longitude → expect u output, zero v). Verify the output vector direction matches the gradient direction.

  4. Divergence recovery. Create a radially diverging vector field from a point. Apply vector→scalar with divergence-like weights. Verify the output scalar is positive at the source.

  5. Curl recovery. Create a rotating vector field. Apply vector→scalar with curl-like weights. Verify the result reflects the rotation.

  6. Rotational invariance. Rotate input fields (shift longitude), verify that scalar outputs are identical and vector outputs rotate consistently.

  7. Scalar-only consistency. When N_v=0, verify the module produces identical results to the existing DiscreteContinuousConvS2.

  8. Regression baselines. Save reference outputs from known inputs/weights, verify they don't change across refactors (using validate_tensor_dict).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your "longitude shift" and "rotational" tests seem to be the same test? Consider also doing a test that involves rotating the north pole somewhere else. The output field should be rotated the same way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: You're right, those were redundant. I'll merge them and add a proper full-sphere rotation test. Updated plan in the document:

  • Longitude shift equivariance: shift inputs by N longitude points, verify outputs shift identically.
  • Arbitrary rotation equivariance: rotate the entire input field (move the north pole elsewhere), verify the output rotates consistently — scalar outputs match under the rotation, vector outputs transform correctly into the new frame.

The arbitrary rotation test is stronger because longitude shifts are already guaranteed by the FFT structure, while a pole-moving rotation exercises the frame rotation geometry at every latitude.

Comment thread vector_filter_basis.md Outdated
Comment on lines +162 to +165
- **Scalar output from scalar input:** Standard — weight contracts over `(c_in, k)`.
- **Scalar output from vector input:** Weight contracts over `(c_in, k)` separately for `conv_u` and `conv_v`, producing a scalar from vector components (analogous to divergence or a directional projection).
- **Vector output from vector input:** Weight contracts over `(c_in, k)` for `conv_u` and `conv_v` to produce output `u` and `v` components. The weight can independently scale and mix the u/v components.
- **Vector output from scalar input:** The weight assigns a scalar intermediate feature to a vector output channel. This implicitly creates a preferred direction in the meridian frame, and is therefore not rotationally equivariant. But if `u_out` and `v_out` channels are created from the same scalar feature with appropriate weights, the network can learn latitude-dependent directional structures (e.g., patterns aligned with Coriolis deflection).
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this weight defined? It should correspond to rotationally-invariant (but not isotropic) physics.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: The weight has separate blocks per type interaction, with u/v sharing weights to enforce rotational invariance. Each interaction that involves vectors has two independent weight components (analogous to a complex weight: one "real" and one "imaginary" or equivalently one "stretch" and one "90°-rotation"):

  • W_ss (N_s_out, N_s_in, K): scalar→scalar via psi_scalar. Standard, 1 weight.
  • W_sv (N_v_out, N_s_in, K, 2): scalar→vector. Component 0 ("gradient"): u = w·contraction(psi_cos, s), v = w·contraction(psi_sin, s). Component 1 ("perp gradient"): u = w·(-contraction(psi_sin, s)), v = w·contraction(psi_cos, s). The perpendicular gradient is physically meaningful — e.g. geostrophic wind from pressure.
  • W_vs (N_s_out, N_v_in, K, 2): vector→scalar. Component 0 ("divergence"): s = w·(psi_cos·u + psi_sin·v). Component 1 ("curl"): s = w·(psi_cos·v − psi_sin·u).
  • W_vv (N_v_out, N_v_in, K, 2): vector→vector. Component 0 ("stretch"): same weight on both components of frame-rotated vector. Component 1 ("rotate"): applies 90° rotation (-conv_v, conv_u) with shared weight.

Key constraint: within each block, u and v pathways share the same scalar weight, so the network can't break rotational symmetry. The 2-component structure for vector interactions captures the full space of rotationally-invariant linear maps (divergence+curl, gradient+perp-gradient, stretch+rotation).

Comment thread vector_filter_basis.md Outdated
Comment on lines +162 to +165
- **Scalar output from scalar input:** Standard — weight contracts over `(c_in, k)`.
- **Scalar output from vector input:** Weight contracts over `(c_in, k)` separately for `conv_u` and `conv_v`, producing a scalar from vector components (analogous to divergence or a directional projection).
- **Vector output from vector input:** Weight contracts over `(c_in, k)` for `conv_u` and `conv_v` to produce output `u` and `v` components. The weight can independently scale and mix the u/v components.
- **Vector output from scalar input:** The weight assigns a scalar intermediate feature to a vector output channel. This implicitly creates a preferred direction in the meridian frame, and is therefore not rotationally equivariant. But if `u_out` and `v_out` channels are created from the same scalar feature with appropriate weights, the network can learn latitude-dependent directional structures (e.g., patterns aligned with Coriolis deflection).
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vectors should not be assigned values directly from scalar inputs, rather they should only be assigned values that correspond to specific filters (which may have an orientation or be isotropic) being applied to the scalars, and then that value being mapped into the vectors. I think this is what you mean, but it's not entirely clear. Requiring this mapping be rotationally invariant means that the isotropic filter would be assigned into the u and v components with exactly the same contribution - there's no directional dependence.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Agreed. In the updated design, scalar→vector exclusively uses the oriented filters: u = W_sv · contraction(psi_cos, s), v = W_sv · contraction(psi_sin, s). The directional structure comes from the geometry (cos γ, sin γ), not the weights. There is no path from psi_scalar to vector outputs — an isotropic filter applied to a scalar carries no directional information and can't meaningfully populate a vector channel.

mcgibbon and others added 15 commits March 26, 2026 19:36
Key changes to the design document:
- Replace "isotropic filter" framing: filters have angular dependence
  via cos(γ)/sin(γ), making them rotationally invariant but not isotropic
- Vector channels stored as (B, N_v, H, W, 2) tensor with (u,v) last dim
- Weight structure enforces rotational invariance: four separate blocks
  (W_ss, W_sv, W_vs, W_vv) with shared u/v weights; vector interactions
  have 2 components each (gradient/perp-grad, div/curl, stretch/rotate)
- No direct psi_scalar→vector path; scalar→vector uses oriented filters
- Nonlinearities only on scalar channels; vectors gain nonlinearity
  indirectly through the scalar pathway
- Add module interface (VectorDiscoConvS2) with explicit forward signature
- Add testing plan with 8 behavioral tests including arbitrary rotation
  equivariance
- Update implementation steps for new file organization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New file fme/core/disco/_vector_convolution.py with:
- _precompute_vector_convolution_tensor_s2: extends the existing
  precomputation to compute cos(γ) and sin(γ) (frame rotation angle)
  at each filter support point, producing three sets of filter values
  (scalar, cos-modulated, sin-modulated) with shared normalization
- build_vector_psi_fft: orchestrates the full pipeline from
  precomputation through banded FFT construction, reusing the scalar
  banding parameters for cos/sin tensors

Tests verify:
- Scalar values and indices match existing _precompute_convolution_tensor_s2
- Pythagorean identity cos²(γ) + sin²(γ) = 1 at all support points
- Zero frame rotation on the same meridian (Δlon = 0)
- sin(γ) antisymmetry under Δlon → -Δlon
- Scalar FFT tensor matches the existing pipeline end-to-end
- Correct shapes and compatibility with multiple filter basis types

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New nn.Module in fme/core/disco/_vector_convolution.py:
- Forward pass: 5 FFT contraction calls (3 on scalar inputs, 2 on
  vector inputs), assembles rotationally invariant intermediates
  (frame-rotated vector, divergence, curl), then applies 4 weight blocks
- Weight structure: W_ss (scalar→scalar), W_sv (scalar→vector with
  gradient + perpendicular gradient), W_vs (vector→scalar with
  divergence + curl), W_vv (vector→vector with stretch + 90° rotation)
- Each vector weight block has 2 components sharing the same scalar
  weight for u and v pathways, enforcing rotational invariance
- Bias on scalar outputs only (vector bias would break symmetry)
- Handles zero-channel edge cases (N_s=0 or N_v=0)
- Exported as VectorDiscoConvS2 from fme.core.disco

Tests added:
- Forward shape for various channel configurations
- Zero-channel edge cases (scalar-only, vector-only, mixed)
- Scalar-only output matches existing DiscreteContinuousConvS2
- Longitude shift equivariance for both scalar and vector channels
- R_x(pi) rotation equivariance for scalar→scalar and vector→vector
- Gradient flow through all four weight blocks
- Scalar gradient direction at equator
- Divergence and curl detection from vector fields
- Bias correctness (affects only scalar outputs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend the vector filter precomputation tests to verify that the two
components of the vector filter (cos(γ) and sin(γ)) are perpendicular
for every supported filter basis type:

- Pythagorean identity (cos²+sin²=1): parametrized over piecewise
  linear, morlet, and zernike bases. Verifies pointwise perpendicularity.
- Weighted orthogonality: for each (kernel, output latitude), verifies
  the inner product of cos(γ) and sin(γ) patterns over the filter
  support is near zero. Tests that the two components are orthogonal
  as functions, not just pointwise unit vectors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New abstraction that pairs two FilterBasis instances: one for isotropic
operations (scalar->scalar, vector->vector) and one for oriented
operations (scalar->vector, vector->scalar). The two bases can have
different kernel shapes, allowing independent radial resolution.

Includes factory function get_vector_filter_basis and exports from
fme.core.disco.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The module now uses independent scalar and vector filter bases with
potentially different kernel sizes (K_s, K_v):

- build_vector_psi_fft evaluates both bases, producing 5 FFT tensors
  (psi_scalar, psi_s_cos, psi_s_sin from scalar basis; psi_v_cos,
  psi_v_sin from vector basis) with separate gather indices
- Forward pass uses 7 contraction calls when K_s != K_v (collapses
  to 5 when bases are identical, via the precomputation cache)
- Weight shapes: W_ss (K_s), W_vv (K_s, 2), W_sv (K_v, 2), W_vs (K_v, 2)
- Weight init accounts for mixed K_s/K_v fan-in per output type
- Constructor accepts VectorFilterBasis or kernel_shape (backward compat)

Tests updated:
- build_vector_psi_fft tests use VectorFilterBasis
- New test for different K_s/K_v shapes and weight dimensions
- New test for FFT shapes with different bases
- All existing behavioral tests pass unchanged via kernel_shape compat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite the design document as a standalone reference for the
VectorDiscoConvS2 module:

- Lead with scalar vs vector filter type distinction and the
  input×filter→output type table
- Describe VectorFilterBasis with independent K_s and K_v
- Document all 5 precomputed FFT tensors (3 scalar-basis, 2 vector-basis)
  with separate banding and gather indices
- Document 7 contraction calls (collapsing to 5 when K_s = K_v)
- Document symmetry properties: exact longitude equivariance, R_x(pi)
  equivariance for ss/vv but not cross-type paths
- Update module interface to show VectorFilterBasis constructor
- Update testing plan to match implemented tests
- Remove references to "existing" code or previous document versions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iance

The cross-type paths (sv, vs) now use the bearing angle φ and frame
correction β instead of the frame rotation γ for their angular
modulation. This makes all four paths equivariant under R_x(π):

- sv path: uses cos(φ)/sin(φ) (bearing at output) instead of cos(γ)/sin(γ)
- vs path: uses cos(β)/sin(β), equivalent to frame-rotating the input
  vector to the output frame then projecting onto the bearing direction
- vv path: unchanged, still uses cos(γ)/sin(γ) for frame rotation
- ss path: unchanged, isotropic

The precomputation now stores all three angular pairs (φ, β, γ) at
each support point. At θ≈0 (input coincides with output), the bearing
direction is undefined, so cos(φ)/sin(φ) and cos(β)/sin(β) are zeroed
— the frame rotation γ remains valid there.

7 FFT tensors are precomputed (3 from scalar basis, 4 from vector
basis) instead of the previous 5.

The R_x(π) equivariance test now runs with all four paths active
in a single module, verifying full equivariance.

Design document updated with the three-angle framework (φ, β, γ=φ−β),
which angle is used where, and updated symmetry properties.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
logs/extract_transcript.py converts Claude Code JSONL session files
into readable transcripts. Takes a session ID and log folder name,
finds the source file under ~/.claude/projects/, and writes both
JSONL (one turn per line) and Markdown (GitHub-renderable with
blockquoted user messages and collapsed tool calls).

Output filenames include the session start timestamp and session ID
prefix for traceability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Session 9c55f925-6cb6-48f1-813b-a63ec61d5191, started 2026-03-26.
Covers design discussion, PR review responses, implementation of
VectorDiscoConvS2 with bearing-angle equivariance, and VectorFilterBasis
refactor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This session covers the evolution from cube-gauged directional filters
to the simpler vector-typed approach using meridian-frame convolution
with per-input-point frame rotation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ShallowWaterStepper uses a VectorDiscoConvS2 layer with fixed weights
to approximate gradient and divergence operators for the linearized
shallow water equations on the sphere:
  dh'/dt = -H0 div(V)
  dV/dt  = -g grad(h') - f x V

Uses nondimensional parameters (g=1, H0=1, omega=0.5) with RK4 time
integration. Coriolis forcing applied pointwise after convolution.

Tests verify:
- Resting state stability
- Mass conservation (<1% drift over 200 steps)
- Energy conservation (ratio ~1.0 over 500 steps)
- Perturbation generates motion
- No NaN blowup over 1000 steps
- Output shape preservation

Video script (scripts/run_shallow_water.py) generates GIF/MP4 of a
Gaussian height perturbation radiating gravity waves.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The bearing angle convention (φ=0 → south) means component 0 in
the weight matrix produces the perpendicular gradient (flow along
contours) rather than the true gradient. Switch to component 1
for both the sv (gradient) and vs (divergence) paths.

Also:
- Video script uses fme.core.device.get_device() for GPU acceleration
- Add --lon0 argument for bump center longitude (default 90°)
- Move default bump longitude to 90° for better visualization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ScalarVectorProduct performs pointwise scaling and 90-degree rotation
of vector features by scalar features. Weight shape is (N_s, N_v, 2)
where the last dimension selects scale vs rotate. This is the minimal
frame-consistent bilinear interaction between scalar and vector fields.

The shallow water stepper now uses this module for the Coriolis term
(f × V) instead of manual torch.stack operations.

9 tests for the module cover: shape, zeros, pure scale, pure rotation,
Coriolis pattern, multi-channel summation, magnitude preservation,
and gradient flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mcgibbon
Copy link
Copy Markdown
Contributor Author

The shallow water model produces plausible videos of evolution. For no-rotation they look exactly like what I'd expect. For with-rotation, I have much less intuition on what the solution should look like, but the solution looks plausible.

mcgibbon and others added 11 commits March 27, 2026 20:52
Add section to vector_filter_basis.md describing:
- ScalarVectorProduct motivation (convolution + scalar nonlinearity
  alone cannot represent pointwise scalar×vector products)
- Block structure: conv → activation → MLP → ScalarVectorProduct → residual
- Ordering rationale: MLP before ScalarVectorProduct so enriched
  scalar features drive vector modulation
- What each block can represent (gradient, divergence, Coriolis, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VectorDiscoBlock (block.py) is the repeating unit for scalar-vector
networks: conv → activation → residual MLP → residual ScalarVectorProduct
→ residual. Key design decisions:

- MLP is residual (s + MLP(s)) so zero weights = identity passthrough
- ScalarVectorProduct is residual (v + sv_product(s, v)) so conv
  vector output passes through and scalars add corrections
- activation="none" option for physics-based models

ScalarVectorProduct moved to its own file (scalar_vector_product.py),
test renamed accordingly. Old modules.py removed.

Stepper refactored to use a single VectorDiscoBlock with 2 scalar
channels (h, f) and 1 vector channel (V). Coriolis f is passed
through the conv via W_ss and rotates V via the ScalarVectorProduct.

Energy tests split: exact conservation without rotation (omega=0),
bounded drift with rotation (smoothed f introduces small energy
non-conservation since the Coriolis orthogonality is approximate).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Block changes:
- Add pointwise (delta function) scalar skip: 1x1 conv that adds
  unfiltered scalar input to conv output. Lets features like the
  Coriolis parameter pass through without spatial smoothing.
- Add PointwiseVectorTransform: per-channel scale+rotate for vectors,
  analogous to 1x1 conv for scalars.
- ScalarVectorProduct now acts on INPUT vectors (x_vector), not conv
  output. This ensures exact Coriolis no-work property (f x V · V = 0).

Stepper uses the pointwise scalar skip for f, giving exact energy
conservation with rotation (ratio = 1.000000 over 200 steps).

Tests sped up: smaller grid (16x32 vs 32x64), fewer steps. Suite
runs in ~15s vs ~60s. Energy test now expects exact conservation
with rotation (< 5% drift tolerance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Transcript of the session implementing HybridCoordinateStepper in hybrid
sigma-pressure coordinates, including the sigma_dot bug fix and all tests.

https://claude.ai/code/session_01PvzyiGXcVTKQAnjBVKa1ta
- 2026-03-29T0422-e5d70b83: 2 turns, 50 responses
- 2026-03-29T0422-f2dcec29: 4 turns, 114 responses
- 2026-03-29T1355-122a6ced: 17 turns, 258 responses

https://claude.ai/code/session_01PvzyiGXcVTKQAnjBVKa1ta
W_vs2 adds a diagonal scalar-grad-dot-vector → scalar operator to
VectorDiscoConvS2, enabling exact V·∇s computation without extra DISCO
passes. This is needed for temperature and humidity advection in the
multi-level primitive equations.

VectorDotProduct computes weighted |V|² → scalar (frame-invariant),
used for kinetic energy KE_k = ½|V_k|² in block steppers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the hydrostatic primitive equations on K fixed pressure
levels with frozen VectorDiscoConvS2 weights. Evolves velocity (via
Lamb-form momentum with hydrostatic PGF), temperature (full dry
adiabatic equation with omega vertical velocity), and humidity
(passive tracer advection). Uses RK4 time integration.

Mean-subtraction of geopotential before differentiation eliminates
the O(|phi|*eps) discretization error from grad(const).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Block

Encodes the multi-level primitive equations in a single VectorDiscoBlock
pass by treating levels as channels: scalars are [T, KE, zeta, delta]
per level plus Coriolis f; vectors are V per level. A separate adv_conv
using the W_vs2 pathway handles T and q horizontal advection.

Uses VectorDotProduct for frame-invariant KE computation and frozen
lower-triangular W_sv weights for the hydrostatic PGF.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the primitive equations to sigma coordinates (sigma = p/p_s)
with prognostic surface pressure. Adds p_s tendency from column-
integrated divergence, sigma-dot vertical velocity, and vertical
advection terms. Supports optional hyperdiffusion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the most general vertical coordinate system p_k = a_k +
b_k * p_s, encompassing both pure sigma and pure pressure as special
cases. Adds hybrid vertical velocity C_k, modified PGF correction,
and vertical advection with spatially-varying layer thicknesses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the block stepper approach to hybrid coordinates. Replaces
Python for-loops with frozen Conv2d layers for vertical operations
(hydrostatic recurrence, vertical velocity integration, vertical
finite differences). Only 2 DISCO passes total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mcgibbon and others added 13 commits April 15, 2026 17:12
Validates PrimitiveEquationsStepper with a classical warm-dome test:
a temperature anomaly at 40N creates a geopotential gradient that
drives anticyclonic circulation via Coriolis deflection. Demonstrates
hydrostatic vertical coupling (upper levels show stronger response).

Updates shallow_water __init__.py to export all new stepper classes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jablonowski-Williamson 2006 baroclinic instability test using the
SigmaCoordinateStepper. The analytical JW balance does not hold
exactly under the discrete DISCO operators, so the simulation shows
a gravity-wave adjustment phase rather than clean baroclinic wave
development. Kept as a reference for future work on iterative
initialization or spectral filtering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Describes the progression from single-level shallow water through
isobaric primitive equations, sigma coordinates, and hybrid
coordinates, along with the block stepper variants that encode
multi-level dynamics in a single VectorDiscoBlock pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate all horizontal operations (dynamics, advection, diffusion)
into one DISCO forward pass instead of three, bringing the block stepper
closer to a pure NN architecture.

- Add residual=False option to VectorDiscoBlock
- Expand block scalar channels from 4K+2 to 6K+2 (add T, q per level)
- Encode horizontal advection (-V·∇T, -V·∇q) via W_vs2 in the block
- Encode diffusion (ν∇²) via W_ss (scalars) and W_vv (vectors) using
  Laplacian weights derived from Legendre polynomial eigenvalues
- Remove standalone adv_conv, grad_conv, div_conv and diffusion helpers
- Remove diffusion_order parameter (only order-1 was used)
- Add regression and diffusion-comparison tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same pattern as the hybrid equations block: merge advection and diffusion
into the single VectorDiscoBlock pass.

- Expand from 4K+1 to 6K+1 scalar channels (add T, q per level)
- Move advection from adv_conv to W_vs2 in the block
- Add diffusion via W_ss/W_vv Laplacian weights (reuses helper)
- Use residual=False, remove adv_conv/grad_conv/div_conv/diffusion helpers
- Remove diffusion_order parameter
- Add regression and diffusion-comparison tests
- Update divergence extraction test for new channel layout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add isobaric flag to HybridCoordinateBlockStepper that skips vertical
advection of V/q, PGF correction, and surface pressure tendency.
PrimitiveEquationsBlockStepper now composes a HybridCoordinateBlockStepper
(isobaric=True) and wraps its interface to hide the constant p_s.

- Add isobaric_coefficients() to convert pressure levels to hybrid b=0
- Add isobaric branch in compute_tendencies (simplified omega from C_conv)
- Reduce primitive_equations_block.py from ~290 lines to ~100 lines
- All 78 shallow water tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Blocks

Architecture: scalar encoder (Conv2d) -> vector encoder
(PointwiseVectorTransform) -> N stacked VectorDiscoBlocks ->
scalar/vector decoders. The network is level-agnostic; levels are
packed into channels by the caller.

Zero-initializes decoder, sv_product, MLP output layers, and
pointwise_vv for well-behaved initial forward pass. With 2 residual
blocks, scalar variance stays within ~2x of input; vector variance
runs ~3x due to the DISCO bearing filter contractions not being
variance-preserving (will be addressed in a follow-up commit).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nuation

The DISCO basis is normalized to preserve the mean of constant fields
(basis_norm_mode="mean"), which is correct for encoding physics operators.
But spatial averaging inherently reduces variance (by Cauchy-Schwarz,
filter weights summing to 1 implies sum of squares < 1). The bearing
filters (psi*cos(phi), psi*sin(phi)) attenuate further because the
zero-mean angular modulation causes additional cancellation.

The old init assumed unit-variance filter outputs, under-scaling the
weights. The fix probes each filter contraction with white noise at
construction time to measure the actual attenuation (~0.12x for bearing,
~0.67x for isotropic), then incorporates these factors into the fan-in
computation so the weight magnitude compensates.

Raw conv output is now ~1.35x input variance (was asymmetric before).
Through 2 residual blocks: scalars ~3x, vectors ~6x (cross-talk from
scalar->vector path). Layer norm would address the residual accumulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply ConditionalLayerNorm to scalars after the residual connection in
each block. Only scalars are normalized — vectors pass through
unnormalized so their magnitudes remain physically meaningful. Post-norm
controls total scalar variance by normalizing the accumulated state
(initial + residual), keeping scalars at ~2x regardless of depth. Vector
variance grows linearly (~1x per block from W_sv), which will be
addressed by W_vv damping in a follow-up.

The norm supports conditioning on noise, positional embeddings, and
labels via ContextConfig, matching the existing ConditionalLayerNorm
infrastructure. Physics stepper blocks pass post_norm=False.

With 12 blocks: scalar ~2x, vector ~13x (linear, trainable).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of zero-initializing W_vv, set the diagonal to -0.2 on the
identity-like basis function (k=0, d=0 stretch). With the residual
connection this gives v_new = 0.8*v_old + W_sv_contribution, a
contractive map that bounds vector variance regardless of depth while
preserving ~7% of input signal through 12 blocks.

Combined with scalar post-norm, both paths are well-behaved through
deep stacks: scalars ~2x, vectors ~4x at 12 blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scale W_sv to 0.7x its fan-in-scaled value and initialize W_vv as a
mild diagonal contraction (alpha=0.1 on k=0, d=0 basis). With the
residual connection this gives v_new = 0.9*v_old + 0.7*W_sv_contribution,
bounding vector variance to ~4x at 12 blocks while retaining ~28% of
input signal (vs 7% at alpha=0.2). Scalar post-norm keeps scalars at ~2x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the StepConfigABC/StepABC interface wrapping VectorDiscoNetwork.
Winds are treated as vectors, normalized by a single speed scale (no bias)
derived from the normalizer's per-variable stds. Coriolis and KE are
pre-computed and injected as additional scalar inputs.

Key design:
- Wind variables identified by prefix match (eastward_wind_0, ...)
- wind_scale = mean_over_levels(sqrt(std_u² + std_v²))
- Winds always use residual prediction (input + network output)
- Prognostic scalars use residual when residual_prediction=True
- Diagnostic scalars are always full-field
- Registered as "vector_disco" in StepSelector

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants